Right. And in the context of these explainer videos, the particular evolution described has properties that make it near-equivalent to SGD, I’d say?
SGD does not have that slack by default. It acts directly on cognitive content (associations, reflexes, decision-weights), without slack or added indirection. If you control the training dataset/environment, you control what is rewarded and what is penalized, and if you are using SGD, then this lets you directly mold the circuits in the model’s “brain” as desired.
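As a toy illustration of the "directly mold the circuits" claim being debated: in plain SGD, whoever picks the dataset picks the gradients, and the gradients rewrite the weights with no layer of indirection in between. A minimal sketch (all numbers and the linear model are illustrative, not from any particular system):

```python
import numpy as np

# Toy SGD: the weights w are the model's "circuits". The dataset-designer
# decides what is rewarded -- here, targets follow y = 3*x0 - 2*x1, so
# SGD will mold w toward [3, -2] with nothing mediating the updates.

rng = np.random.default_rng(0)
w = np.zeros(2)

X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0])

lr = 0.1
for x_i, y_i in zip(X, y):
    err = w @ x_i - y_i      # gradient of squared-error loss (up to x_i)
    w -= lr * err * x_i      # direct, unmediated update to the weights

print(np.round(w, 2))        # should land near [ 3. -2.]
```

The point of contention, of course, is how far this "dataset controls circuits" picture transfers from a two-parameter linear model to a large network.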
Hmmm, this strikes me as much too strong (especially ‘this lets you directly mold the circuits’).
Remember also that with RLHF, we’re learning a reward model which is something like the more-hardcoded bits of brain-stuff, which is in turn providing updates to the actually-acting artefact, which is something like the more-flexibly-learned bits of brain-stuff.
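The two-layer structure described here can be sketched as code. This is a hedged toy version (the bandit setup, preference rule, and all constants are illustrative assumptions, not any production RLHF pipeline): stage 1 fits a reward model from pairwise preferences (the "more-hardcoded" layer), and stage 2 lets that reward model, not the raw human labels, supply the updates that shape the acting policy (the "more-flexibly-learned" layer).

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions = 4
true_pref = np.array([0.1, 0.9, 0.3, 0.2])   # hidden human preferences

# Stage 1: fit a crude reward model from pairwise comparisons
# (win-rate per action, Bradley-Terry-ish choice of winner).
reward_model = np.zeros(n_actions)
counts = np.zeros(n_actions)
for _ in range(2000):
    a, b = rng.choice(n_actions, size=2, replace=False)
    p_a_wins = true_pref[a] / (true_pref[a] + true_pref[b])
    winner = a if rng.random() < p_a_wins else b
    reward_model[winner] += 1
    counts[[a, b]] += 1
reward_model /= np.maximum(counts, 1)

# Stage 2: REINFORCE-style updates to the acting policy, graded by the
# reward model alone -- the humans are now one step removed.
logits = np.zeros(n_actions)
for _ in range(3000):
    p = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(n_actions, p=p)
    r = reward_model[a]          # the learned proxy supplies the signal
    grad = -p * r                # grad of r * log pi(a) w.r.t. logits
    grad[a] += r
    logits += 0.1 * grad

print(np.argmax(logits))         # the action the policy has come to favor
```

The indirection is visible in stage 2: the policy only ever sees the reward model's judgments, so any slack or misfit in the reward model is exactly the slack the original point was about.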
I also think there’s a fair alternative analogy to be drawn like
evolution of genome (including mostly-hard-coded brain-stuff) ~ SGD (perhaps +PBT, i.e. population-based training) of NN weights
within-lifetime-learning of organism ~ in-context something-something of NN
(this is one analogy I commonly drew before RLHF came along.)
So, look, the analogies are loose, but they aren’t baseless.