> 3. At some point, some set of AI agents will be such that:
> they will all be able to coordinate with each other to try to kill all humans and take over the world; and
> if they choose to do this, their takeover attempt will succeed.[13]
There are far too many assumptions about what “AI” is baked into this. Suppose you went back 70 years and told people: “in the year 2024, everyone will have an AI agent built into their phone that they rely on for critical everyday tasks (such as finding directions to the grocery store).”
The 1950s observer would probably say something like “that sounds like a dangerous AI system that could easily take control of the world.” But in fact, no one worries about Siri “coordinating” to suddenly give us all wrong directions to the grocery store, because that’s not remotely how assistants work.
Trying to reason about what future AI agents will look like is basically equally fraught.
> Second: for any failure you don’t want to ever happen, you always need to avoid that failure on the first try (and the second, the third, etc.).
I think this is the crux of my concern. Obviously if AI kills us all, there will have been some moment when that outcome became inevitable, but merely stating that fact doesn’t add any information. I think any attempt to predict what future AI agents will do from “pure reasoning,” as opposed to careful empirical study of the capabilities of existing AI models, is basically doomed to failure.
>Like, we can make reasonable prediction of climate in 2100, even if we can’t predict weather two month ahead.
This is a strange claim to make in a thread about AGI destroying the world. Obviously if AGI destroys the world, we cannot predict the climate in 2100.
Predicting the climate in 2100 requires you to make a number of detailed claims about the years between now and 2100 (for example, the carbon emissions per year), and it is precisely the lack of these claims that @Matthew Barnett is talking about.