As someone with limited knowledge of AI or alignment, I found this post accessible. There were times when I thought I vaguely knew what Nate meant but would not have been able to explain it, so I’m recording my confusions here to come back to when I’ve read up more. (If anyone wants to answer any of these r/NoStupidQuestions-style questions, that would be very helpful too.)
“Your first problem is that the recent capabilities gains made by the AGI might not have come from gradient descent”. This comes up in response to a few of the plans. Is the idea that, during training, a sufficiently advanced AI gains capabilities both from gradient descent and from processing input / interacting with the world? Or does the second part only happen after training has finished? What does that concretely look like in ML?
Is a lot of the disagreement about these plans simply because others find the idea of a “sharp left turn” less likely than Nate does? Or is there broad agreement about that idea, with the disagreement being about which proposals might give us a shot at solving it?
What might an ambitious interpretability agenda focused on the sharp left turn and the generalization problem look like besides just trying harder at interpretability?
Another explanation of the “sharp left turn” would also be really helpful to me. At the moment, it feels like I can only explain why that happens by using analogies to humans/apes rather than being able to give a clear explanation for why we should expect that by default, using ML/alignment language.
Interesting bet on AI progress (with actual money) made in 1968:
1968 – Scottish chess champion David Levy makes a 500 pound bet with AI pioneers John McCarthy and Donald Michie that no computer program would win a chess match against him within 10 years.
1978 – David Levy wins the bet made 10 years earlier, defeating Chess 4.7 in a six-game match by a score of 4½–1½. The computer’s victory in game four is the first defeat of a human master in a tournament.
In 1973, Levy wrote:
After winning the bet:
So it seems like he very much underestimated progress in chess despite winning the original bet.
https://en.wikipedia.org/wiki/David_Levy_(chess_player)