On AI Diplomacy: (1) it is a useful model benchmark, but (2) it is not new, since we already had a human-level full-press Diplomacy system in CICERO (2022). It would be useful to compare these new harnesses against that more targeted system.