If you believe “there’ll probably be warning shots”, that’s an argument against “someone will get to build It”, but not an argument against “if someone built It, everyone would die” (where “It” specifically means “an AI smart enough to confidently outmaneuver all humanity, built by methods similar to today’s, where AIs are ‘organically grown’ in hard-to-predict ways”).
It’s a bit of both.
Suppose there are no warning shots. A hypothetical AI that’s a bit weaker than humanity but still awfully impressive doesn’t do anything at all that manifests an intent to harm us. That could mean:
1. The next, somewhat more capable version of this AI will not have any intent to harm us, because through either luck or design we’ve ended up with a non-threatening AI.
2. This version of the AI is biding its time to strike and is sufficiently good at deception that we miss that fact.
3. This AI is fine, but making it a little smarter/more capable will somehow lead to the emergence of malign intent.
I take Yudkowsky and Soares to put all the weight on #2 and #3 (with, based on their scenario, perhaps more of it on #2).
I don’t think that’s right. I think if we have reached the point where an AI really could plausibly start and win a war with us and it doesn’t do anything nasty, there’s a fairly good chance we’re in #1. We may not even really understand how we got into #1, but sometimes things just work out.
I’m not saying this is some kind of great strategy for dealing with the risk; the scenario I’m describing is one where there’s a real chance we all die and I don’t think you get a strong signal until you get into the range where the AI might win, which is a bad range. But it’s still very different than imagining the AI will inherently wait to strike until it has ironclad advantages.
Are these actually costly actions to any meaningful degree? In the context of the amount of money sloshing around the AI space, hiring even “lots” of safety researchers seems like a rounding error.
I may misunderstand the commitments you’re referring to, but I think these are all purely internal? And thus not really commitments at all.
This seems to presume that I have some well-formed views on how AI labs compare, and I don’t have those. All I really know about Meta is that they’re behind and doing open source. I wouldn’t even know where to start an analysis of their relative level of moral integrity. So far as it goes (and, again, this is just the view of someone who reads what breaks through in mainstream news coverage), I have a very clear sense that OpenAI is run by compulsive liars, but not much more to go on beyond that, other than a general sense that people in the industry do a lot of hype.
I’m deliberately not looking this up, and just telling you my impression of this phenomenon. I’m coming up with three cases of it (my recollection may be garbled) that broke through into my media universe:
My understanding is that Anthropic was formed by people who broke away from OpenAI based on “safety” concerns. But then they just founded another company doing the same thing? And they got very rich doing it. So that all has roughly zero credibility.
There was an engineer at one of the big tech companies (Google? Microsoft?) who got a lot of attention for claiming that AI had achieved sentience and deserved personhood and either quit or got fired. The universal take seemed to be that he was insane.
One of the people involved in AI 2027 had quit or gotten fired from OpenAI(?) and refused to sign an NDA that would have come with a big payday so that he could go public with criticism. That seems pretty sincere and credible so far as it goes, but it’s also one person. And then AI 2027 was so overwrought that I couldn’t take it seriously.
And then, beyond that, you seem to have a lot of people signing these open letters with no cost attached. For something like this to break through, it needs to be (in my estimation at least) large numbers of people acting in a coordinated way and leaving the industry entirely.
I’d analogize it to politics. In any given presidential administration, you have one or two people who get really worked up and resign angrily and then go on TV attacking their former bosses. That’s just to be expected and doesn’t really reflect anything beyond the fact that sometimes people have strong reactions or particularized grievances or whatever. The thing that (should) wake you up is when this is happening at scale.
Only steps that carry meaningful financial consequences. I agree that any individual researcher can send a credible signal by quitting and giving up their stock, at least to the extent they don’t just immediately go into a similarly compensated position. But you’re always left with the counter-signal from all the other researchers not doing that.
On a more institutional level, it would have to be something that actually threatens the valuation of the companies.