hmm, i think there could be other structural things you could vary.
Maybe you shift around the order of the different stages of post-training.
Maybe you realise that randomising the data order just means you get roughly the same values each time (the mix washes out), so you purposefully over-sample certain types of environments or pre-training data earlier on. E.g. first pre-training heavily on math, then heavily on fiction, and reversing that for the next gen. Or first doing RL on lots of SWE envs, then on general computer-use envs, and reversing the order for the next gen.
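here's a rough sketch in python of what i mean by reordering the early curriculum per generation — the domain names, mixture weights, and phase counts are all made up, just to make the idea concrete, not anyone's actual training setup:

```python
# Minimal sketch of per-generation curriculum reordering.
# Domains, weights, and phase counts are illustrative assumptions.

DOMAINS = ["math", "fiction", "swe_envs", "computer_use_envs"]

def early_phase_weights(generation: int) -> dict[str, float]:
    """Over-sample one pair of domains early, and flip the emphasis
    for the next generation so early-training data differs across runs."""
    if generation % 2 == 0:
        return {"math": 0.4, "swe_envs": 0.4, "fiction": 0.1, "computer_use_envs": 0.1}
    return {"fiction": 0.4, "computer_use_envs": 0.4, "math": 0.1, "swe_envs": 0.1}

def late_phase_weights() -> dict[str, float]:
    """Later phases converge back to a uniform mix, so final coverage
    is similar even though the early curriculum differed."""
    return {d: 1.0 / len(DOMAINS) for d in DOMAINS}

def build_schedule(generation: int, n_early: int = 3, n_late: int = 3) -> list[dict[str, float]]:
    """Per-phase sampling weights for one training run."""
    return [early_phase_weights(generation)] * n_early + [late_phase_weights()] * n_late

if __name__ == "__main__":
    for gen in (0, 1):
        print(f"gen {gen} early mix:", build_schedule(gen)[0])
```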
You could potentially do experiments to determine what kinds of features of the data/envs influence the values of the AI, and then make sure those specific features differ in the early stages of fine-tuning for different training runs.
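and a rough sketch of that experiment loop — the candidate features, train_with_feature, and value_probe_score are all hypothetical placeholders (faked here with seeded randomness) standing in for a real training/eval pipeline, just to show the shape of the ablation:

```python
# Illustrative ablation loop: toggle one candidate data feature at a time,
# measure some proxy for the resulting values, and keep the features whose
# presence actually moves that proxy. All names here are stand-ins.
import random

CANDIDATE_FEATURES = ["sycophantic_raters", "long_horizon_tasks", "adversarial_users"]

def train_with_feature(feature: str, enabled: bool) -> str:
    # Placeholder for "fine-tune a model with this data feature on/off".
    return f"model[{feature}={'on' if enabled else 'off'}]"

def value_probe_score(model: str) -> float:
    # Placeholder for a behavioural probe of the model's values.
    return random.Random(model).random()

def find_value_relevant_features(threshold: float = 0.1) -> list[str]:
    # Features whose presence/absence noticeably shifts the probe are the
    # ones you'd deliberately vary early on in different training runs.
    relevant = []
    for feature in CANDIDATE_FEATURES:
        with_f = value_probe_score(train_with_feature(feature, enabled=True))
        without_f = value_probe_score(train_with_feature(feature, enabled=False))
        if abs(with_f - without_f) > threshold:
            relevant.append(feature)
    return relevant

if __name__ == "__main__":
    print(find_value_relevant_features())
```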