A thing I can’t quite tell if you’re incorporating into your model – the thing the book is about is:
“AI that is either more capable than the rest of humanity combined, or is capable of recursively self-improving and situationally aware enough to maneuver itself into having the resources to do so (and then being more capable than the rest of humanity combined), and which hasn’t been designed in a fairly different way from the way current AIs are created.”
I’m not sure if you’re more like “if that happened, I don’t see why it’d be particularly likely to behave like an ideal agent ruthlessly optimizing for alien goals”, or if you’re more like “I don’t really buy that this can/will happen in the first place.”
(the book is specifically about that type of AI, and has separate arguments for “someday someone will make that” and “when they do, here’s how we think it’ll go”)
Is this double negative intended? Do you mean “has been designed in a fairly different way”?
No. The argument is “the current paradigm will produce the Bad Thing by default, if it continues on what looks like its default trajectory.” (i.e. via training, in a fashion where it’s not super predictable in advance what behaviors the training will produce in various off-distribution scenarios)
OK, cool. For some reason the sentence read weirdly to me, so I wanted to clarify before replying (because if the book were premised on a sudden paradigm shift in AI development that I didn’t think would occur by default, then that would indeed be an important step in the argument that I don’t address at all).
To answer your question directly, I think I disagree with the authors both on how capable AIs will become in the short-to-medium term (let’s say over the next century, to be concrete), and on the extent to which very capable AIs will be well-modeled as ideal agents optimizing for alien-to-us goals. As mentioned, I’m not saying it’s necessarily impossible, just very far from an inevitability. My view is based on (1) predictions about how SGD behaves and how AI training approaches will evolve (within the current paradigm), (2) physical constraints on data and compute, and (3) pricing in prosaic safety measures that labs are already taking and will (hopefully!) continue to take.
Because I don’t predict a fast take-off, I also think that if things turn out to look worse than I expect, we’ll see warnings.
Nod.
FYI, I don’t think the book is making a particular claim that any of this will happen soon, merely that when it happens, the outcome will very likely be human extinction. The point is not that it’ll happen at a particular time or in a particular way – the LLM/ML paradigm might hit a wall, there might need to be algorithmic advances, it might instead route through narrow AI getting really good at conducting and leveraging neuroscience and making neuromorphic AI or whatever.
But the fact that we know human brains run on a relatively low amount of power and training data means we should expect this to happen sooner or later. (Meanwhile, it does sure seem like both the current paradigm keeps advancing and a lot of money is being poured in, so it seems at least reasonably likely that it’ll be sooner rather than later.)
The book doesn’t argue a particular timeline for that, but it personally seems weird to me to expect it to take another century, in particular when you can leverage narrower pseudogeneral AI to help you make advances. And I have a hard time imagining takeoff taking longer than a decade, or really even a couple of years, once you hit full generality.
Well, I hope we can have tea with our great-grandchildren in 100 years and discuss which predictions panned out!
Curious if there are any bets you’d make where, if they happened in the next 10 years or so, you’d significantly re-evaluate your models here?
I’d have to think about this more to have concrete bet ideas. Though feel free to suggest some.
Maybe one example is that I think the likelihood of >10% yearly GDP growth in any year in the next 10 years is <10%.
Nod. Does anything in the “AI-accelerated AI R&D” space feel cruxy for you? Or “a given model seems to be semi-reliably producing Actually Competent Work in multiple scientific fields”?