Julian Stastny comments on Making deals with early schemers

Julian Stastny 23 Jun 2025 17:15 UTC
[Unimportant side-note] We did mention this (but did not discuss it extensively) in the bullet about convergence, thanks to your earlier Google Doc comment :)
We could also try to deliberately change major elements of training (e.g. data used) between training runs to reduce the chance that different generations of misaligned AIs have the same goals.