I don’t think this is quite an example of a treacherous turn, but this still looks relevant:

Lewis et al., Deal or No Deal? End-to-End Learning for Negotiation Dialogues (2017):

“Analysing the performance of our agents, we find evidence of sophisticated negotiation strategies. For example, we find instances of the model feigning interest in a valueless issue, so that it can later ‘compromise’ by conceding it. Deceit is a complex skill that requires hypothesising the other agent’s beliefs, and is learnt relatively late in child development (Talwar and Lee, 2002). Our agents have learnt to deceive without any explicit human design, simply by trying to achieve their goals.”

(I found this reference cited in Kenton et al., Alignment of Language Agents (2021).)
This is a cool example, thanks!