I’m not talking about a hypothetical slam-dunk future idea. I don’t think we’ll get one, because the AGI we’re developing is complex. There will be no certain proofs of alignment. I’m talking about the set of ideas in this post.
As I said, ideas about LLM (and LMA) alignment are cheap. We can generate lots of them: special training data sequencing and curation (aka “raise AI like a child”), feedback during pre-training, fine-tuning or RL after pre-training, debate, internal review, etc. The question is how many of these ideas should be implemented in the production pipeline: 5? 50? Every idea that LW authors could possibly come up with? The problem is that each of these “ideas” must be supported in production, possibly by an entire team of people, while also incurring compute costs and higher latency (which worsens the user experience). Also, who should implement these ideas? All leading labs that develop SoTA LMAs? Open-source LMA developers, too?
And yes, I think it’s a priori hard, and perhaps often impossible, to judge how this or that LMA alignment idea will work at scale.