But for now, this doesn’t lead to permanent improvements to the language model.
Fantastic post!
LMs like RETRO or WebGPT seem to be an answer to this. They maintain an external database that effectively extends the model's capabilities on a permanent basis. If you have a sufficiently powerful LM that can surf the web and extract relevant new information, you get continuous capability gains without ever retraining the model with gradient descent. I would personally call that permanent.
Note that the database doesn't have to be a curated or external source: it can also be the model's own high-quality prompts and interactions with users, stored as they happen and reused when relevant.
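To make the idea concrete, here is a minimal sketch of that interaction-reuse loop: past (prompt, response) pairs are embedded and stored, and the nearest ones are prepended to new prompts so a frozen model can draw on them. This is prompt-level retrieval, not RETRO's actual mechanism (which retrieves chunks via frozen BERT embeddings and attends to them with chunked cross-attention); the `embed` function, `InteractionStore`, and `augmented_prompt` names are all illustrative, and the toy hashed bag-of-words encoder stands in for a real embedding model.

```python
# Sketch: store high-quality interactions, retrieve the most relevant
# ones at inference time, and prepend them to the new prompt.
import numpy as np

VOCAB_DIM = 512

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real encoder: hashed bag-of-words, L2-normalized."""
    v = np.zeros(VOCAB_DIM)
    for token in text.lower().split():
        v[hash(token) % VOCAB_DIM] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

class InteractionStore:
    """Append-only store of (prompt, response) pairs with embeddings."""
    def __init__(self):
        self.records = []   # list of (prompt, response)
        self.vectors = []   # parallel list of prompt+response embeddings

    def add(self, prompt: str, response: str):
        self.records.append((prompt, response))
        self.vectors.append(embed(prompt + " " + response))

    def retrieve(self, query: str, k: int = 2):
        """Return the k stored interactions most similar to the query."""
        if not self.records:
            return []
        sims = np.stack(self.vectors) @ embed(query)  # cosine similarity
        top = np.argsort(sims)[::-1][:k]
        return [self.records[i] for i in top]

def augmented_prompt(store: InteractionStore, query: str) -> str:
    """Prepend retrieved interactions so a frozen LM can reuse them."""
    context = "\n".join(f"Q: {p}\nA: {r}" for p, r in store.retrieve(query))
    return f"{context}\n\nQ: {query}\nA:"

store = InteractionStore()
store.add("What is RETRO?",
          "A retrieval-augmented LM conditioned on nearest neighbors from a database.")
store.add("How do I read a file in Python?",
          "Use open() inside a with statement.")
print(augmented_prompt(store, "How does retrieval augmentation work?"))
```

The point of the sketch is just that the "weights" never change: every capability gain lives in the growing store, which is why it survives across sessions without any gradient step.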