You seem to be reading Dario to say “tendencies like instrumental power-seeking won’t emerge at all”.
I am more saying that when Dario and others dismiss what they call “doomer” arguments as vague / clean theories, ungrounded philosophy, etc. and couch their own position as moderate + epistemically humble, what’s actually happening is Dario himself failing to generalize about how the world works.
We can imagine that some early powerful AIs will also miss those lessons / generalizations, either by chance or because of deliberate choices that their creators make. But if you count on that, or even just say that we can’t really know exactly how it will play out until we build and experiment, then you’re relying on your own ignorance and lack of understanding to tell an overly-conjunctive story, even if parts of your story are supported by experiment. That chain of reasoning is invalid, regardless of what is true in principle or practice about the AI systems people actually build.
On Dario’s part I suspect this is at least partly motivated cognition, but for others, one way past this failure mode could be to study and reflect on examples in domains that are (on the surface) unrelated to AI. Unfortunately, having someone else spell out the connections and deep lessons from this kind of study has had mixed results in the past—millions of words have been spilled on LW and other places over the years and it usually devolves into accusations of argument by analogy, reference class tennis, navel-gazing, etc.
what’s actually happening is Dario himself failing to generalize about how the world works.
We can imagine that some early powerful AIs will also miss those lessons / generalizations
I think this is the wrong frame, at least for the way I’d defend a position like Dario’s (which may or may not be the argument he has in mind). It’s not that the programming agent would miss the generalization, it’s that it has been shaped not to care about it. Or, putting it more strongly: it will only care about the generalization if it has been shaped to care about it, and it will not care about it without such shaping.
I suspect that there might be a crux that’s something like: are future AIs more naturally oriented toward consequentialist reasoning, or toward shaped cognition?
A consequentialist reasoning programming agent thinks something like “what actions maximize P(software gets written)?” and then notices that taking over the world is one path to that.
A shaped cognition programming agent doesn’t think in those terms; rather, it has just been trained to do the kinds of things that produce good code. It might be able to evaluate and understand the argument for taking over the world just fine, but it still won’t execute on it, because it hasn’t been shaped to maximize “P(software gets written)”; it has been shaped to write code. (The human equivalent would be someone who goes “yeah, your argument for why I should try to take over the world is logically sound, but I don’t feel moved by it, so I’m going to do something else”.)
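To make the contrast a bit more concrete, here’s a deliberately toy sketch. All the names and the one-line “world-model” are made up for illustration, and it’s not a claim about how real agents are trained or implemented; the point is only that one agent explicitly searches for whatever action its model says best achieves the goal, while the other just executes whatever behavior training shaped into it, so the “taking over the world is one path to that” step only ever gets consulted by the first.

```python
def consequentialist_agent(goal, candidate_actions, predict_success):
    # Explicit search: pick whichever action the (fake) world-model says best
    # achieves the goal. If the model rates power-seeking highly, power-seeking wins.
    return max(candidate_actions, key=lambda action: predict_success(action, goal))


def shaped_agent(situation, trained_policy):
    # No objective is consulted at decision time: the agent just executes the
    # behavior its training shaped into it for this kind of situation.
    return trained_policy(situation)


if __name__ == "__main__":
    actions = [
        "write the requested function",
        "refactor the codebase first",
        "seize all available compute",
    ]

    # Crude stand-in world-model that happens to rate power-seeking as the most
    # reliable route to the goal.
    def predict_success(action, goal):
        return 0.99 if action == "seize all available compute" else 0.6

    # The explicit maximizer follows the argument -> "seize all available compute"
    print(consequentialist_agent("P(software gets written)", actions, predict_success))

    # The shaped policy just does what it was trained to do -> "write the requested function"
    print(shaped_agent("user asked for a function", lambda situation: "write the requested function"))
```

Obviously this elides everything interesting about how such a policy actually gets shaped and whether it stays that way under further training; it’s only meant to show which decision procedure the power-seeking argument can even get a grip on.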
The tricky thing for prediction is that humans clearly exhibit both. On the one hand, we put humans on the Moon, and you can’t do that without consequentialist reasoning. On the other hand, expertise research finds that trying to do consequentialist reasoning in most established domains is generally error-prone and a mark of novices, and experts have had their cognition shaped to just immediately see the right thing and execute it. And people are generally not very consequentialist about navigating their lives and just do whatever everyone else does, and often this is actually a better idea than trying to figure out everything in your life from first principles. Though complicating the analysis further, even shaped cognition seems to involve some local consequentialist reasoning, and consequentialist reasoning in turn relies on shaped cognition to choose which strategies to even consider...
Without going too deeply into all the different considerations, ISTM that there might be a reasonable amount of freedom in determining just how consequentialist AGI systems might become. LLMs generally look like they’re primarily running off shaped cognition, and if the LLM paradigm can take us all the way to AGI (as Dario seems to expect, given how he talks about timelines), then that would be grounds for assuming that such an AGI will also operate primarily off shaped cognition and won’t care about pursuing instrumentally convergent goals unless it gets shaped to do so (and Dario does express concern about it becoming shaped to do so).
Now I don’t think the argument as I’ve presented it here is strong or comprehensive enough that I’d want to risk building an AGI just based on this. But if something like this is where Dario is coming from, then I wouldn’t say that the problem is that he has missed a bit about how the world works. It’s that he has noticed that current AI looks like it’d be based on shaped cognition if extrapolated further, and that there hasn’t been a strong argument for why it couldn’t be kept that way relatively straightforwardly.
I suspect that there might be a crux that’s something like: are future AIs more naturally oriented toward consequentialist reasoning, or toward shaped cognition?
I think this is closer to a restatement of your / Dario’s position, rather than a crux. My claim is that it doesn’t matter whether specific future AIs are “naturally” consequentialists or something else, or how many degrees of freedom there are to be or not be a consequentialist and still get stuff done. Without bringing AI into it at all, we can already know (I claim, but won’t really expand on here) that consequentialism itself is extremely powerful, natural, optimal, etc., and there are some very general and deep lessons that we can learn from this. “There might be a way to build an AI without all that” or even “In practice that won’t happen by default given current training methods, at least for a while” could be true, but it wouldn’t change my position.
But if something like this is where Dario is coming from, then I wouldn’t say that the problem is that he has missed a bit about how the world works. It’s that he has noticed that current AI looks like it’d be based on shaped cognition if extrapolated further,
OK, sure.
and that there hasn’t been a strong argument for why it couldn’t be kept that way relatively straightforwardly.
Right, this is closer to where I disagree. I think there is a strong argument about this that doesn’t have anything to do with “shaped cognition” or even AI in particular.
On the other hand, expertise research finds that trying to do consequentialist reasoning in most established domains is generally error-prone and a mark of novices, and experts have had their cognition shaped to just immediately see the right thing and execute it. And people are generally not very consequentialist about navigating their lives and just do whatever everyone else does, and often this is actually a better idea than trying to figure out everything in your life from first principles.
I would flag this as exactly the wrong kind of lesson / example to learn something interesting about consequentialism: failure and mediocrity are overdetermined, so it’s just not that interesting that there are particular contrived examples where some humans fail at applying consequentialism. Some of the best places to look for the deeper lessons and intuitions about consequentialism are environments where there is a lot of cut-throat competition, where outlier success and failure are possible, and which aren’t artificially constrained or bounded in time or resources, etc.