I’ll do my best to explain his position as I see it.
I think that’s a great exercise to start with, so it’s easier to spot if you’re passing his ITT / if there are points of miscommunication.
My understanding of his worldview is that we won’t have anything resembling true intelligence (other than parlor tricks) until we do, and on that day we’ll have an AGI which rapidly self-improves until it is superhuman.
“Parlor tricks” sounds a bit like a straw-man (is GPT-3 a parlor trick?), and Eliezer doesn’t think recursive self-improvement is a necessary part of the story. E.g., the first AGIs may be superhuman just because humans suck at science, math, etc.; it’s not as though calculators needed recursive self-improvement in order to vastly surpass humanity’s arithmetic abilities, or AlphaZero needed recursive self-improvement in order to vastly surpass humanity’s Go abilities.
The way I would put it is that, on Eliezer’s model, calculators, theorem-provers, AlphaGo, AGI, etc. are doing different kinds of cognitive work. AlphaGo may be very impressive, but it’s not performing the task ‘modeling the physical universe in all its (decision-relevant) complexity’, any more than it’s performing the ‘theorem-proving’ task.
Nor is AlphaGo a scaled-down version of an AGI, doing the same things you do to navigate messy physical environments, just in a simpler toy setting. Rather, Go just isn’t similar enough to the physical world to capture all of the kinds of cognitive problem you have to solve in order to succeed in the world at large.
This doesn’t mean that AI progress like AlphaGo is necessarily irrelevant to AGI; there is an enormous range of possibilities in between the extremes of ‘zero relevance to AGI’ and ‘exactly like AGI but scaled-down’. But if there are different kinds of AI doing different kinds of work, and the relevant kinds of work for ‘modeling and steering the real world’ haven’t been invented yet, then my reply to Adele is relevant: invention is a 0-to-1 event, and AGI is the sort of thing that can be invented at a particular time and place, rather than just being ‘AlphaGo+’ or ‘theorem-prover+’. (Or, to name an example that’s more popular on LW, ‘GPT-3+’.)
Then, because intelligence is really powerful, it will immediately follow whatever goal function it was given.
I don’t understand this part. What does being powerful have to do with following your goal function?
Since we don’t have any time to experiment and find goal functions that produce reasonable, scope-bound behavior, the AI applies the goal function in a global way, takes over the world, and turns us all into paperclips.
This sounds sort of correct, except that in the context of ML we don’t know how to robustly instill real-world goals into AGI systems at all. E.g., we don’t know how to provide a training signal that will robustly cause an AGI to maximize the amount of diamond in the universe—a very simple goal that doesn’t require us to figure out corrigibility, low-impact behavior, ‘reasonableness’, etc.
It seems to me that everything is a step function in EY’s thinking.
That seems like a fine description, if we’re contrasting it with e.g. Paul Christiano’s view. In reality, both sides think the universe is a mix of continuous and discontinuous phenomena; but yes, it’s tempting for one side to say ‘you see everything as continuous!’ and for the other to say ‘you see everything as discontinuous!’.
In my view, everything in real life is continuous, and I think various attempts in this community to find historical discontinuities have demonstrated that discontinuities just don’t happen.
Isn’t this immediately falsified by human beings? … And isn’t it a bit concerning if your alleged generalization breaks down hardest on the most relevant data point we have for trying to predict the impact of automating general intelligence?
Like, ‘discontinuities just don’t happen’ is just obviously false, and I’m not sure what sort of figurative interpretation you want applied here. But whatever the interpretation, it seems very strange to me to deny that something like a human being could ever exist, on the grounds that e.g. people tended to mostly build smaller boats before they built bigger boats.
If humans sometimes built way bigger boats before building intermediate-sized boats, why would that provide Bayesian evidence about how problem-solving ability scales with compute? Or about the cognitive similarity between GPT-3 and AGI, or about… anything AGI-related at all? Why does the universe care about boat sizes, in deciding whether or not to ‘allow’ future inventions to ever be as high-impact as a human, a nuke, etc.? If that’s what you’re basing your confidence on, it just seems like a bizarre argument to me.
It also seems to me that EY believes that intelligence is all-powerful.
Calculators aren’t “all-powerful” at arithmetic, and AlphaZero isn’t “all-powerful” at Go. It’s a very parochial model of intelligence that thinks you have to be “all-powerful” in order to spectacularly overshoot what’s needed to blow humans out of the water.
Strong upvote; agree with most / all of what you wrote. Having said that:
Isn’t this immediately falsified by human beings? … And isn’t it a bit concerning if your alleged generalization breaks down hardest on the most relevant data point we have for trying to predict the impact of automating general intelligence?
I’m not sure how Conor would reply to this, but my models of Paul Christiano and Robin Hanson have some things to say in response. My Paul model says:
Humans were preceded on the evolutionary tree by a number of ancestors, each of which was only slightly worse along the relevant dimensions. It’s true that humans crossed something like a supercriticality threshold, which is why they managed to take over the world while e.g. the Neanderthals did not, but the underlying progress curve humans emerged from was in fact highly continuous with humanity’s evolutionary predecessors. Thus, humans do not represent a discontinuity in the relevant sense.
To this my Robin model adds:
In fact, even calling it a “supercriticality threshold” connotes too much; the actual thing that enabled humans to succeed where their ancestors did not was not their improved (individual) intelligence relative to said ancestors, but their ability to transmit discoveries from one generation to the next. This ability, “cultural evolution”, permits faster iteration on successful strategies than does the mutation-and-selection procedure employed by natural selection, and thus explains the success of early humans—but it does not allow for a new-and-improved AGI to come along and obsolete humans in the blink of an eye.
Of course, I have to give my Eliezer model (who I agree with more than either of the above) a chance to reply:
Paul: It’s all well and good to look back in hindsight and note that some seemingly discontinuous outcome emerged from a continuous underlying process, but this does not weaken the point—if anything, it strengthens it. The fact that a small, continuous change to underlying genetic parameters resulted in a massive increase in fitness shows that the function mapping design space to outcome space is extremely jumpy, which means that roughly continuous progress in design space does not imply a similarly continuous rate of change in real-world impact; and the latter is what matters for AGI.
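To make my Eliezer model’s “jumpiness” point concrete, here’s a minimal toy sketch (entirely made-up numbers, not anyone’s actual model of intelligence): the underlying design parameter varies smoothly, but impact is negligible below a criticality threshold and takes off above it.

```python
# Toy illustration only: a smooth design parameter feeding a thresholded
# impact function. All numbers are made up for illustration.

def impact(design_parameter: float, threshold: float = 1.0) -> float:
    """Capability is continuous in the design parameter; impact is not."""
    if design_parameter < threshold:
        return 0.01 * design_parameter                    # subcritical: negligible
    return 1.0 + 100.0 * (design_parameter - threshold)   # supercritical: takes off

# A 2% step in design space that happens to straddle the threshold:
print(impact(0.99))  # 0.0099
print(impact(1.01))  # ~2.0, i.e. roughly a 200x jump in impact
```

Continuous progress in the input produces a discontinuous-looking jump in the output; that’s the whole point of the design-space/outcome-space distinction.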
Robin: From an empirical standpoint, AlphaGo Zero is already quite a strong mark against the “cultural evolution” hypothesis. But from a more theoretical standpoint, note that (according to your own explanation!) the reason “cultural evolution” outcompetes natural selection is that the former iterates more quickly than the latter; this means that it is speed of iteration that is the real underlying driver of progress. Then, if there exists a process that permits yet faster iteration, it stands to reason that that process would outcompete “cultural evolution” in precisely the same way. Thinking about “cultural evolution” gives you no evidence either way as to whether such a faster process exists, which essentially means the “cultural evolution” hypothesis tells you nothing about whether / how quickly AGI can surpass the sum total of humanity’s ability / knowledge after being created.
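Similarly, the “speed of iteration” point can be put in back-of-the-envelope form (again, purely illustrative numbers): if every process gains the same small factor per iteration, a process that iterates N times faster traverses the same growth curve N times sooner.

```python
import math

# Illustrative numbers only: identical per-iteration gains, different speeds.
per_iteration_gain = 1.001  # each iteration improves capability by 0.1%
iterations_to_double = math.log(2) / math.log(per_iteration_gain)  # ~693 iterations

for name, iterations_per_year in [("mutation-and-selection", 1),
                                  ("cultural evolution", 100),
                                  ("a faster automated loop", 10_000)]:
    years = iterations_to_double / iterations_per_year
    print(f"{name}: capability doubles in ~{years:.2f} years")
```

The per-iteration gain cancels out of the comparison; only the ratio of iteration speeds matters, which is exactly why a yet-faster process would leave “cultural evolution” behind in the same way cultural evolution left natural selection behind.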
Great comment; you said it better than I could. I do want to say:
The existence of a supercriticality threshold at all already falsifies Conor’s ‘discontinuities just don’t happen’ model. Once the physical world allows discontinuities, you need to add some new assumption to the world that makes the AGI case avoid this physical feature of the territory.
And all of the options involve sticking your neck out to make at least some speculative claims about CS facts, the nature of intelligence, etc.; none of the options let you stop at boat-size comparisons. And if boat-size comparisons were your crux, it’s odd at best if you immediately discover a new theory of intelligence that lets you preserve your old conclusion about AI progress curves, the very moment your old reason for believing that goes away.
Human beings entered into a world without intelligence, but machine intelligence will be entering a world where humans, corporations, governments and societies will be doing everything they can to control and monitor the AI. It took human beings hundreds of thousands of years to go from knowing nothing other than how to get food to knowing enough and being powerful enough to really start threatening e.g. the biosphere. AIs could go from pretty dumb to super intelligent very quickly, sure, but how long will it take to go from powerless to world domination? With humans doing our best to control AIs and resist?
a world where humans, corporations, governments and societies will be doing everything they can to control and monitor the AI
I think I have to claim this as wishful thinking. Were humans doing everything they could to control and monitor the coronavirus? No, and to say such a thing is to be telling fairy-tale stories, not describing the current human world.
I think you and I have a very different impression of the pandemic then. No pandemic in history was more closely monitored, and we did more to try to control it than any previous health event. Also, humanity didn’t literally go extinct, as is claimed would happen in the alignment case.
Here is a quick drawing I did to communicate my point, and in particular to show the chasm between “best pandemic-response so far” and “doing everything we can”.
I believe with quite high confidence that with a bit of test & trace and some challenge trials, this pandemic could’ve been over in 3-5 months, instead of dragging on for over 2 years. Every part of this seems simple to me (the mRNA vaccines were invented in <48 hours in January, challenge trials require only a few hundred people for ~2 weeks to get confident results, I think a company with the logistical competence of Amazon could’ve gotten a country vaccinated in just a couple of months, etc.). So it looks to me like we’re very, very far from “Doing Everything We Can”; even if, as you say, we did better than ever before, we still didn’t get a passing grade according to me.
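To spell out the arithmetic behind my 3-5 month figure (the design, trial, and vaccination numbers are the ones above; the scale-up/approval term is an extra padding assumption I’m adding here):

```python
# Back-of-the-envelope for the "3-5 months" claim. The first, second, and
# fourth figures come from the paragraph above; the scale-up/approval figure
# is an assumed padding term not specified there.
design_days      = 2    # mRNA vaccine designed in <48 hours
challenge_trial  = 14   # ~2 weeks of challenge trials
scale_up_days    = 30   # assumed: manufacturing ramp-up + expedited sign-off
mass_vaccination = 60   # "a couple of months" to vaccinate a country

total_days = design_days + challenge_trial + scale_up_days + mass_vaccination
print(f"~{total_days} days, i.e. ~{total_days / 30:.1f} months")  # ~3.5 months
```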
Recap: I’m making this point because you said you’re expecting a world where we’re doing everything we can; I gave Covid as a counterexample regarding our collective competence, and you said you thought we did better than in any other pandemic. This isn’t a crux for me, because there were lots of major easy wins we could’ve had which we did not. Our response looks to me like not getting a passing grade with the resources we had: we didn’t really use most of those resources, and we scored lots of own-goals in the process.