Charlie Steiner comments on Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Charlie Steiner 21 Dec 2023 0:58 UTC
Good paper—even if it shows that the problem is hard!
It sounds like it would be worth my time to understand the “confidence loss” and figure out what’s going on. There’s an obvious intuitive parallel to a human student going “Ah yes, I know what you’re doing” and rounding off the teacher to the nearest concept the student is confident in. But it’s not clear how good an intuition pump that is.
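To make the "rounding off" intuition concrete, here is a minimal sketch of what an auxiliary confidence loss of this kind might look like for a single binary example. It is a simplified illustration, not the paper's implementation: the function names, the fixed `threshold`, and the fixed `alpha` are all assumptions made for this sketch (as I understand it, the paper sets the threshold adaptively and varies the weighting during training).

```python
import math


def cross_entropy(target, pred, eps=1e-12):
    """Binary cross-entropy between a target probability and a predicted one."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def confidence_loss(student_prob, weak_label, alpha=0.5, threshold=0.5):
    """Hypothetical sketch: mix the weak-label loss with a self-confidence term.

    student_prob: student's predicted probability of the positive class.
    weak_label:   the weak teacher's label (0.0 or 1.0, or a soft label).
    alpha:        weight on the student's own hardened prediction (assumed fixed).
    threshold:    cutoff for hardening the student's prediction (assumed fixed).
    """
    # "Round off" the student's prediction to the nearest confident answer.
    hardened = 1.0 if student_prob > threshold else 0.0
    weak_term = cross_entropy(weak_label, student_prob)
    self_term = cross_entropy(hardened, student_prob)
    return (1.0 - alpha) * weak_term + alpha * self_term


# A student that is 90% confident in "1" while the teacher says "0":
plain = confidence_loss(0.9, 0.0, alpha=0.0)   # pure imitation of the teacher
mixed = confidence_loss(0.9, 0.0, alpha=0.5)   # half weight on its own belief
print(plain, mixed)
```

With `alpha=0` the student is penalized fully for disagreeing with the teacher; with `alpha>0` part of the loss instead rewards it for committing to its own confident answer, which is the sense in which the student "rounds off" the teacher.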
I agree with Roger that active learning seems super useful (especially for meta-preferences, my typical hobby-horse). It seems a lot easier for the AI to learn about how the teacher generalizes (and how it wants to generalize) if it gets to do experiments, rather than having to wait for natural evidence to crop up in the data. This definitely gets me excited about brainstorming experiments we could do along these lines in the near term.
Though there’s likely an uncanny valley here: if the teacher is making systematic mistakes that you’re wholly relying on the student’s inductive biases to correct, then the student being better at learning the teacher’s generalization behavior will make its performance worse! Maybe you get out of the valley on the other side when the student learns a model of how much to use its inductive biases that approximates what the teacher wants it to do.