Why might we see this sort of “wanting” arise in tandem with the ability to solve long-horizon problems and perform long-horizon tasks?
Because these “long-horizon” tasks involve maneuvering the complicated real world into particular tricky outcome-states, despite whatever surprises and unknown-unknowns and obstacles it encounters along the way. Succeeding at such problems just seems pretty likely to involve skill at figuring out what the world is, figuring out how to navigate it, and figuring out how to surmount obstacles and then reorient in some stable direction.
I think maybe I buy it for planning tasks, which entail responding to surprising events that the world throws at you and getting back on track towards a goal. I’m not sure that I buy it for “design” tasks, like designing a rocket ship or a nanofactory. Those tasks seem like they can maybe be solved in one sweep, the way current LLMs (often) answer my question in one single forward pass through the network.
That makes me think of superhuman engineers that are given a design spec, and produce a design. And then human engineers look over the design and maybe build prototypes, and realize what was missing from the spec, and then go back and improve the spec to give to the AI engineer, just as I sometimes ask an LLM a question, and I realize from the answer that my question was not specific enough.
With that story of how we apply AI tech, there’s some adverse selection for designs that, when built, trick the humans into thinking that they got the output they wanted, when actually they didn’t. But there’s not strong optimization pressure for that set of outcomes. The AI is just engineering to a prompt / spec, it isn’t
My model of Eliezer, at least as of the 2021 MIRI dialogs, thinks that this kind of system, one that can do superhuman engineering in one forward pass, without a bunch of reflection and exploratory design (e.g. trying some idea, seeing how it fails, in your mind or in reality, and iterating), is implausible, or at least not the first and most natural way to solve those problems on the tech tree. Indeed, real Eliezer almost says that outright at 18:05 here.
That model says that you need those S2-style reflection and iteration faculties to do engineering, and that employing those faculties is an application of long-term planning. That is, tinkering with a design has the same fundamental structure as “reality throwing wrenches into your plans and you pivoting to get the result you wanted anyway”.
However, the more sophisticated the reasoning of LLMs gets, the less plausible it seems that you need reflection, etc. in order to do superhuman engineering work. A big enough neural net with enough training data can grow into something like an S1 that is capable enough to do the work that humans generally require an S2 for.