That makes me think of superhuman engineers that are given a design spec and produce a design. Then human engineers look over the design, maybe build prototypes, realize what was missing from the spec, and go back and improve the spec to give to the AI engineer, just as I sometimes ask an LLM a question and realize from the answer that my question was not specific enough.
With that story of how we apply AI tech, there's some adverse selection for designs that, when built, trick the humans into thinking that they got the output they wanted, when actually they didn't. But there's not strong optimization pressure for that set of outcomes. The AI is just engineering to a prompt / spec; it isn't optimizing for outcomes in the world.
My model of Eliezer, at least as of the 2021 MIRI dialogues, holds that this kind of system, one that can do superhuman engineering in a single forward pass, without a bunch of reflection and exploratory design (e.g. trying some idea, seeing how it fails, in your mind or in reality, and iterating), is implausible, or at least not the first and most natural way to solve those problems on the tech tree. Indeed, the real Eliezer almost says that outright at 18:05 here.
That model says that you need those S2-style reflection and iteration faculties to do engineering, and that employing those faculties is an application of long-term planning. That is, tinkering with a design has the same fundamental structure as "reality throwing wrenches into your plans, and you pivoting to get the result you wanted anyway".
However, the more sophisticated LLM reasoning gets, the less plausible it seems that you need reflection, etc., to do superhuman engineering work. A big enough neural net with enough training data can grow into something like an S1 that is capable enough to do the work humans generally require an S2 for.