Max H comments on “Aligned” foundation models don’t imply aligned systems

Max H 14 Apr 2023 13:27 UTC
I agree the zeitgeist has changed, but I think some people (or at least Nate and Eliezer in particular) have always been more concerned about more agent-like systems, along the lines of Mu Zero. For example, in the 2021 MIRI conversations here:
I do not quite think that gradient descent on Stack More Layers alone—as used by OpenAI for GPT-3, say, and as opposed to Deepmind which builds more complex artifacts like Mu Zero or AlphaFold 2 - is liable to be the first path taken to AGI.
and here:
Okay. I don’t think the pure-text versions of GPT-5 are being very good at designing nanosystems while Living Zero is ending the world.
and here:
There may be different cognitive technology that could follow a path like that. Gradient descent follows a path a bit relatively more in that direction along that axis—providing that you deal in systems that are giant layer cakes of transformers and that’s your whole input-output relationship; matters are different if we’re talking about Mu Zero instead of GPT-3.
Deep Deceptiveness is more recent, but it’s another example of a carefully non-specific argument that doesn’t factor through any current DL-paradigm methods, and is consistent with the kind of thing Nate and Eliezer have always been saying.
I think recent developments with LLMs have caused some other people to update towards LLMs alone being dangerous. That might be true, but even if it is, it doesn’t imply that more complex systems are not even more dangerous.