I think this can also be used to make a smarter version of PrudentBot. If I remember right, PrudentBot is defined as
To me, the choice of DefectBot in particular as the thing to compare against seems a little hacky. What you're really interested in is seeing whether the other agent will defect against you, even if you defect against it. Thus, I propose SharkBot, which I label
I’m pretty sure this cooperates with itself and FairBot, but defects against CooperateBot and a larger share of stupid agents than PrudentBot does.
Moreover, the chatbot is typically not even trained to predict the user dialogue; in training there is usually a mask which zeroes out any gradients coming from those tokens.
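
To make the masking point concrete, here is a minimal sketch in plain Python of what such a loss mask might look like. Everything in it is my own illustration rather than any particular training framework's API: the function name `masked_cross_entropy`, the toy three-token vocabulary, and the convention that `loss_mask` is 1 on assistant tokens and 0 on user tokens are all assumptions.

```python
import math

def masked_cross_entropy(logits, targets, loss_mask):
    """Mean next-token cross-entropy over positions where loss_mask is 1.

    Hypothetical helper for illustration. Positions with loss_mask == 0
    (here: user tokens) are skipped entirely, so they contribute nothing
    to the loss and therefore nothing to its gradient.
    """
    total = 0.0
    count = 0
    for row, target, keep in zip(logits, targets, loss_mask):
        if not keep:
            continue  # user token: excluded from the loss
        m = max(row)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]  # -log softmax(row)[target]
        count += 1
    return total / count if count else 0.0

# Toy sequence: two user positions followed by two assistant positions.
logits = [
    [2.0, 0.5, -1.0],  # user
    [0.1, 0.2, 0.3],   # user
    [0.0, 0.0, 0.0],   # assistant
    [1.0, 1.0, 1.0],   # assistant
]
targets = [0, 2, 1, 2]
loss_mask = [0, 0, 1, 1]

loss = masked_cross_entropy(logits, targets, loss_mask)

# Changing the predictions at the user positions leaves the loss unchanged,
# which is another way of saying the gradient from those tokens is zero.
perturbed = [[9.0, -9.0, 0.0], [5.0, 5.0, -5.0]] + logits[2:]
same_loss = masked_cross_entropy(perturbed, targets, loss_mask)
```

The user tokens still sit in the context that the model conditions on; they just never appear as prediction targets in the loss.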