https://twitter.com/DavidSKrueger
https://www.davidscottkrueger.com/
https://therealartificialintelligence.substack.com/p/the-real-ai-deploys-itself
David Scott Krueger
Well, I did say “Naively”… but yes I agree the analysis was too naive, and I will edit the post. You make a good point that it can be improved by considering that harms from AI (especially large-scale ones like x-risk) are overdetermined when there are multiple developers. The naive analysis is more accurate when the risk is smaller.
As a side note, if the risk from a single project is so large, then the first project is probably disincentivized at the individual level (would you really want to take an 80% risk of extinction?), and it’s a “pure” coordination problem, like a stag hunt, rather than an incentive problem (like a prisoner’s dilemma).
Another way the “naive” calculation can be wrong (which is the main one I had in mind) is if the risks of different projects are correlated, which they are, e.g. because they are all using similar technology.
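To make the difference concrete, here is a minimal sketch of how the marginal risk of an additional project behaves under independence versus correlation. This is my own illustration: the per-project risk number and the helper `total_risk_independent` are made up for the example, not from the original exchange.

```python
# Illustrative only: how "marginal risk" of one more AI project depends on assumptions.

def total_risk_independent(p_per_project: float, n_projects: int) -> float:
    """Total catastrophe risk if each project independently causes it with probability p."""
    return 1 - (1 - p_per_project) ** n_projects

p = 0.2  # assumed per-project risk (purely illustrative)

for n in [1, 2, 5]:
    total = total_risk_independent(p, n)
    marginal = total - total_risk_independent(p, n - 1)
    print(f"{n} projects: total risk {total:.2f}, marginal risk of the nth project {marginal:.2f}")

# Under independence, the marginal risk of each additional project shrinks as more
# projects exist (the harm becomes "overdetermined"). If the risks are instead highly
# correlated (e.g. all projects use similar technology), total risk stays near p no
# matter how many projects there are, so the naive per-project calculation misleads.
```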
I’m not going after particular people’s justifications for their work; I’m going after the institutionalization of “marginal risk” as a relevant concept and the way it justifies unacceptable risk-taking.
I think it’s helpful to disaggregate things sometimes, and e.g. look at what trends might underlie this general trend we observe.
greater wealth hasn’t changed the picture tremendously
I don’t think I made that claim anywhere in my piece.
I think we disagree on the likelihood of x-risk advocates having such a precise level of impact and power. If AI x-risk advocates don’t have much sway over the political demands of an anti-AI movement, then I don’t think we have much to worry about.
Your argument seems to treat “pause/stop AI” as almost like an info-hazard, but it’s really quite an obvious idea. If “pause/stop AI” becomes a core idea in an anti-AI movement where x-risk advocates lack power, I’d expect that happens because it’s memetically fit and other people were saying it as well, so it probably would’ve happened anyways.
In my mind, the worlds where there is political will for a pause are mostly the ones where there is a broad understanding that if we don’t stop building AI, it is going to replace humanity, and this motivates the need for an international pause. Similarly, I think a pause that happens without the USG having been AGI-pilled seems really unlikely, and it’s also very hard for me to imagine the current administration doing a unilateral pause.
Overall, I think if “pause/stop AI” succeeds at all, it will probably succeed substantively; the slice of worlds where this doesn’t happen seems very narrow, because it’s a big ask.
I don’t mean to put regulation and stopping in opposition. My point is that stopping is likely a precondition for any form of regulation that would significantly slow down development or deployment. Like you, I am trying to argue against framings that put “we need a global treaty to stop AI risks” in opposition to “domestic regulation is the only realistic path.”
I think stopping unlocks a lot of ability for countries to regulate in line with their values and priorities that otherwise might not be possible because of race dynamics.
I’ve tried to edit my post to make that clearer, please let me know if you have any specific suggestions on that front.
Yeah, this is a good point. The way I’ve put it before is: when you are thinking about what should happen, you’re basically imagining you have some sort of magic wand that makes it happen. But how powerful is the magic wand? I haven’t thought this through to my satisfaction, so for now I’m just going based on intuitive notions of what is actually realistically achievable.
But one way of trying to define the limits of the “magic wand” here would be: You get to magically choose a policy to be adopted, but you don’t get to magically control people’s behavior afterwards. So if you want to get people to limit AI uses, your policy needs to deal with their potential incentives to do otherwise.
This means, IIUC, that the answer to your final question is “yes”. But it’s more a matter of perceived incentives here, IMO, see: https://therealartificialintelligence.substack.com/p/following-the-incentives
> If someone believes that it will be hard to make international agreements to stop AI because countries will have incentives against this, does that mean that those considerations now fall under “incentives” and thus count for purpose of determining whether stopping is “hard”?
There’s not a lot of demand for human cloning. See https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:start
Good point RE deskilling of alignment researchers.
Right, so the response would be “just don’t worry about getting re-elected and try to get some shit done in your term”.
Thanks for sharing your thoughts.
So your condition is “Severe or willful violation of our RSP, or misleading the public about it”.
My guess is that most people understood the RSP, or at least the part about not releasing dangerous systems, as a COMMITMENT in the sense of “we won’t do this” not a commitment in the sense of “we won’t do this… unless we publicly change our mind first”. I do think it’s hard to get good data on this, but I wonder if you disagree with my guess? It seems like there was at least substantial confusion around this point within the AI safety community (who I’d consider part of “the public”), confusion which mostly could’ve been easily remedied by Anthropic—the failure to do so seems like at least “letting a significant fraction of the public be misled”, which I think counts as “misleading the public”.
Unless, of course, the RSP ought to have been interpreted as a COMMITMENT all along, in which case this update seems like a violation of an implicit “meta-commitment” to honor the COMMITMENT in perpetuity.
If you agree with the thrust of my argument, it seems like you’d have to either 1) agree that your condition is met, 2) argue that it was clear to the public that the commitment was not a COMMITMENT, or 3) argue that there is no such implicit meta-commitment.
I’d appreciate if you would clarify where exactly our disagreement lies.
What happens if you merge the bash and the audit tool, just giving the AI a single bash tool from which it can do both?
For now, such evidence is not really relevant to takeover risk because models are weak and can’t execute on complex world domination plans, but I can imagine such arguments becoming more directly relevant in the future.
Maybe a nit RE phrasing, but the reasoning here doesn’t make sense. It’s relevant to takeover risk even if the model is known to be weak.
Thanks for the pointers! I think there should probably be more, but I’m glad to know there’s more than I was aware of.
My new organization https://evitable.com/ is fund-raising. I’m a long-time AI safety researcher and AI professor and initiated the one-sentence Statement on AI Risk.
Evitable’s mission is to inform and organize the public to confront societal-scale risks of AI, and put an end to the reckless race to develop superintelligence.
Our vision is that in ten years’ time, people will look back at the current race to build superintelligence as unthinkably terrible and wrongheaded, similar to how people view things like slavery.
You can donate at https://www.every.org/evitable or https://manifund.org/projects/evitable-a-new-public-facing-ai-risk-nonprofit-a1ll15pvkcb.
In general I think you should be a little suspicious of all lab self-reports about data usage, partly because they have a strong incentive to slightly fudge the category boundaries. In this case, they had a top-level category for “self-expression” which included “relationships and personal reflection” as well as “games and role-play”. Make of that what you will. But overall I think this kind of work is extremely valuable, and I’m very glad they did it.
Another reason I heard is that they don’t include enterprise use here, e.g. because of privacy agreements with companies. That data may also look more “job replace-y” than “complementary”.
agreed—I’m suggesting they’ll be blending together, and that moving towards AI-generated videos as the primary means of generating content on social media will help companies automate content creators.
Huge thanks to all the lab employees who stated their support for an AI moratorium in this thread!
Can we make this louder and more public? This is really important for the public to understand.
why not?
In a word: InkHaven.
But seriously, I’m still working full-time on Evitable.com and so am trying to churn out my daily blog posts FAST. There are topics I know I have things to say about, and I try to get them down in words in ~1-2 hours tops. In this case, the motivation is something like: “It’s annoying when people make behaviorist arguments about how AIs are more aligned/trustworthy than people”.