Good question, which I should probably have clarified in the essay. On a similar compute budget, could e.g. an actor-critic in-sim approach reach superintelligence even more quickly? Yeah, probably. The point of this story isn’t that this configuration (i.e. SSL+IL+PG RL) is optimal along the (competitiveness, alignability-to-diamonds) axes. Rather, I claim that if this story goes through at all, it throws a rock through how we should be thinking about alignment: one of the simplest, “dumbest”, most quickly dismissed ideas (reward the agent for the good event) can work just fine to superhuman capability and beyond, in a predictable-to-us way which we can learn more about by looking at current ML.