Yeah, it looks like maybe the same argument, just expressed very differently? Like, I think the “coherence implies goal-directedness” argument basically goes through if you just consider computational complexity, but I’m still not sure if you agree? (Maybe I’m being way too vague.)
Or maybe I want a stronger conclusion? I’d like to say something like “REAL, GENERAL intelligence” REQUIRES goal-directed behavior (given the physical limitations of the real world). It seems like maybe our disagreement (if there is one) is around how much departure from goal-directedness is feasible/desirable, and/or how much we expect such departures to affect performance (the trade-off also gets worse for more intelligent systems).
It seems likely the AI’s beliefs would be logically coherent whenever the corresponding human beliefs are logically coherent. This seems quite different from arguing that the AI has a goal.
Yeah, it’s definitely only an *analogy* (in my mind), but I find it pretty compelling *shrug*.
But really my answer is “there are lots of ways you can get confidence in a thing that are not proofs”.
Totally agree; it’s an under-appreciated point!
Here’s my counter-argument: we have no idea what epistemological principles explain this empirical observation. Therefore we don’t actually know that the confidence we achieve in these ways is justified. So we may just be wrong to be confident in our ability to successfully board flights (etc.).
The epistemic/aleatory distinction is relevant here. Taking an expectation over both kinds of uncertainty, we can achieve a high level of subjective confidence in such things / via such means. However, we may be badly mistaken, and thus still be extremely likely, objectively speaking, to be wrong.
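To make the point concrete, here’s a toy calculation (the numbers and model names are mine, purely for illustration): we mix over two hypotheses about how reliable our informal reasoning is, and the expectation over both kinds of uncertainty yields high subjective confidence even though, under one of the hypotheses, we’re objectively likely to be wrong.

```python
# Toy illustration: subjective confidence under a mixture of models
# vs. the objective chance conditional on the true model.

# Two hypotheses about how reliable our informal reasoning is:
#   "sound":  informal arguments track truth 99% of the time
#   "broken": they track truth only 20% of the time
p_model = {"sound": 0.9, "broken": 0.1}                   # epistemic (prior) weights
p_correct_given_model = {"sound": 0.99, "broken": 0.20}   # aleatory part

# Expectation over both kinds of uncertainty -> high subjective confidence:
subjective = sum(p_model[m] * p_correct_given_model[m] for m in p_model)
print(subjective)  # ≈ 0.911

# But if "broken" happens to be the true model, we are objectively
# likely to be wrong despite that subjective confidence:
print(p_correct_given_model["broken"])  # 0.2
```

So the two numbers can come apart arbitrarily far; the subjective expectation hides how badly we’d do under the unfavorable hypothesis.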
This also probably explains a lot of the disagreement, since different people probably just have very different prior beliefs about how likely this kind of informal reasoning is to give us true beliefs about advanced AI systems.
I’m personally quite uncertain about that question ATM. I tend to think we can get pretty far with this kind of informal reasoning in the “early days” of (proto-)AGI development, but we become increasingly likely to fuck up as we start having to deal with vastly super-human intelligences. I’d like to see more work in epistemology aimed at addressing this (and other Xrisk-relevant concerns, e.g.: what principles of “social epistemology” would allow the human community to effectively manage collective knowledge that is far beyond what any individual can grasp? I’d argue we’re in the process of failing catastrophically at that).
RE the title, a quick list:
FHI (and associated orgs)
I think a lot of orgs that are more focused on social issues which can or do arise from present day AI / ADM (automated decision making) technology should be thinking more about global coordination, but seem focused on national (or subnational, or EU) level policy. It seems valuable to make the most compelling case for stronger international coordination efforts to these actors. Examples of this kind of org that I have in mind are AINow and Montreal AI ethics institute (MAIEI).
As mentioned in other comments, there are many private conversations among people concerned about AI-Xrisk, and (IMO, legitimate) info-hazards / unilateralist curse concerns loom large. It seems prudent to make progress on those meta-level issues (i.e. how to engage the public and policymakers on AI(-Xrisk) coordination efforts) as a community as quickly as possible, because:
Getting effective AI governance in place seems like it will be challenging and take a long time.
There are a rapidly growing number of organizations seeking to shape AI policy, who may have objectives that are counter-productive from the point of view of AI-Xrisk. And there may be a significant first-mover advantage (e.g. via setting important legal or cultural precedents, and framing the issue for the public and policymakers).
There is massive untapped potential for people who are not currently involved in reducing AI-Xrisk to contribute (consider the raw number of people who haven’t been exposed to serious thought on the subject).
Info-hazard-y ideas are becoming public knowledge anyways, on the timescale of years. There may be a significant advantage to getting ahead of the “natural” diffusion of these memes and seeking to control the framing / narrative.
My answers to your 6 questions:
1. Hopefully the effect will be transient and minimal.
2. I strongly disagree. I think we (ultimately) need much better coordination.
3. Good question. As an incomplete answer, I think personal connections and trust play a significant (possibly indispensable) role.
4. I don’t know. Speculating/musing/rambling: the kinds of coordination where IT has made a big difference (recently, i.e. starting with the internet) are primarily economic and consumer-facing. For international coordination, the stakes are higher; it’s geopolitics, not economics, and you need effective international institutions to provide enforcement mechanisms.
5. Yes, but this doesn’t seem like a crucial consideration (for the most part). Do you have specific examples in mind?
6. Social science and economics seem really valuable to me: game theory, mechanism design, behavioral game theory. I imagine there’s probably a lot of really valuable work on how people/orgs make collective decisions that the stakeholders are satisfied with in some other fields as well (psychology? sociology? anthropology?). We need experts in these fields (especially the softer fields, which I think are underrepresented) to inform the AI-Xrisk community about existing findings and to create research agendas.
BoMAI is in this vein, as well ( https://arxiv.org/pdf/1905.12186.pdf )
I don’t understand how this answers the question.
As a clarification, I’m considering the case where we consider the state space to be the set of all “possible” histories (including counter-logical ones), like the standard “general RL” (i.e. AIXI-style) set-up.
I don’t know how Deep Blue worked. My impression is that it didn’t use learning, so the answer would be no.
A starting point for Tom and Stuart’s works: https://scholar.google.com/scholar?rlz=1C1CHBF_enCA818CA819&um=1&ie=UTF-8&lr&cites=1927115341710450492
Naturally (as an author on that paper), I agree to some extent with this argument.
I think it’s worth pointing out one technical ‘caveat’: the agent should get utility 0 *on all future timesteps* as soon as it takes an action other than the one specified by the policy. We say the agent gets reward 1 “if and only if its history is an element of the set H”, *not* iff “the policy would take action a given history h”. Without this caveat, I think the agent might take other actions in order to capture more future utility (e.g. to avoid terminal states). [Side-note (SN): this relates to a question I asked ~10 days ago about whether decision theories and/or policies need to specify actions for impossible histories.]
My main point, however, is that I think you could do some steelmanning here and recover most of the arguments you are criticizing (based on complexity arguments). TBC, I think the thesis (i.e. the title) is a correct and HIGHLY valuable point! But I think there are still good arguments for intelligence strongly suggesting some level of “goal-directed behavior”. E.g., it’s probably physically impossible to implement policies (over histories) that are effectively random, since implementing them would require look-up tables larger than the physical universe. So when we build AIs, we are building things that aren’t at that extreme end of the spectrum. Eliezer has a nice analogy in a comment on one of Paul’s posts (I think), about an agent that behaves like it understands math, except that it thinks 2+2=5. You don’t have to believe the extreme version of this view to believe that it’s harder to build agents that aren’t coherent *in a more intuitively meaningful sense* (i.e. closer to caring about states, which is (I think, e.g. see Hutter’s work on state aggregation) equivalent to putting some sort of equivalence relation on histories).
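To make the construction under discussion concrete, here’s a minimal sketch (the policy, horizon, and names are my own toy choices, not from the paper): an arbitrary deterministic policy over histories, and the indicator utility function — reward 1 iff the full history is in the set H of histories the policy itself would produce — for which that policy is trivially optimal.

```python
# Sketch: ANY policy over histories is optimal for the utility function
# that pays 1 iff the full history is consistent with the policy
# (and 0 on all future timesteps after any deviation).
from itertools import product

ACTIONS = ["a", "b"]
HORIZON = 3

def some_policy(history):
    # An arbitrary, even "incoherent"-looking, policy over histories.
    return "a" if len(history) % 2 == 0 else "b"

# H = the set of full histories consistent with the policy.
consistent = set()
def build(history=()):
    if len(history) == HORIZON:
        consistent.add(history)
        return
    build(history + (some_policy(history),))
build()

def utility(history):
    return 1 if tuple(history) in consistent else 0

# Check optimality: the policy's own trajectory gets utility 1,
# and every deviating trajectory gets utility 0.
for traj in product(ACTIONS, repeat=HORIZON):
    follows = all(traj[i] == some_policy(traj[:i]) for i in range(HORIZON))
    assert utility(traj) == (1 if follows else 0)
print("policy is optimal for its indicator utility")
```

Note how the caveat above shows up here: utility is a function of the *full* history, so any deviation forfeits all future utility, and the agent gains nothing by steering toward or away from terminal states.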
I also want to mention Laurent Orseau’s paper: “Agents and Devices: A Relative Definition of Agency”, which can be viewed as attempting to distinguish “real” agents from things that merely satisfy coherence via the construction in our paper.
I strongly agree.
I should’ve been more clear.
I think this is a situation where our intuition is likely wrong.
This sort of thing is why I say “I’m not satisfied with my current understanding”.
The two examples here seem to not have alarming/obvious enough Ws. It seems like you are arguing against a straw-man who makes bad predictions, based on something like a typical mind fallacy.
my concern that decision theory research (as done by humans in the foreseeable future) can’t solve decision theory in a definitive enough way that would obviate the need to make sure that any potentially superintelligent AI can find and fix decision theoretic flaws in itself
So you’re saying we need to solve decision theory at the meta-level, instead of the object-level. But can’t we view any meta-level solution as also (trivially) an object level solution?
In other words, “[making] sure that any potentially superintelligent AI can find and fix decision theoretic flaws in itself” sounds like a special case of “[solving] decision theory in a definitive enough way”.
I’ll start by objecting to your (6): that seems like an important goal, to my mind. And if we *aren’t* able to verify that an AI is free from decision-theoretic flaws, then how can we trust it to self-modify to be free of such flaws?
Your perspective still makes sense to me if you say: “this AI (call it ALICE) is exploitable, but it’ll fix that within 100 years, so if it doesn’t get exploited in the meanwhile, then we’ll be OK”. And OFC in principle, making an agent that will have no flaws within X years of when it is created is easier than the special case of X=0.
In reality, it seems plausible to me that we can build an agent like ALICE and have a decent chance that ALICE won’t get exploited within 100 years.
But I still don’t see why you dismiss the goal of (6); I don’t think we have anything like definitive evidence that it is an (effectively) impossible goal.
we haven’t seen any examples of them trying to e.g. kill other processes on your computer so they can have more computational resources and play a better game.
It’s a good point, but… we won’t see examples like this if the algorithms that produce this kind of behavior take longer to produce the behavior than the amount of time we’ve let them run.
I think there are good reasons to view the effective horizon of different agents as part of their utility function. Then I think a lot of the risk we incur is because humans act as if we have short effective horizons. But I don’t think we *actually* do have such short horizons. In other words, our revealed preferences are more myopic than our considered preferences.
Now, one can say that this actually means we don’t care that much about the long-term future, but I don’t agree with that conclusion; I think we *do* care (at least, I do), but aren’t very good at acting as if we(/I) do.
Anyways, if you buy this line of argument about effective horizons, then you should be worried that we will easily be outcompeted by some process/entity that behaves as if it has a much longer effective horizon, so long as it also finds a way to make a “positive-sum” trade with us (e.g. “I take everything after 2200 A.D., and in the meanwhile, I give you whatever you want”).
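Here are some toy numbers (mine, purely illustrative) for why such a trade looks “positive-sum” to both parties: an agent that discounts the future much less than we behaviorally do will happily give up the near term in exchange for everything after the cutoff.

```python
# Toy model of the "effective horizon" trade: a behaviorally myopic human
# and a patient agent split a reward stream at a cutoff date.

def discounted_value(rewards, gamma):
    """Sum of rewards discounted geometrically by gamma per period."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# 1 unit of reward per period for 10 periods; periods 0-2 are "the
# meanwhile", periods 3-9 are "everything after 2200 A.D.".
near = [1.0, 1.0, 1.0] + [0.0] * 7   # keep only the near term
far  = [0.0, 0.0, 0.0] + [1.0] * 7   # keep only the far future

myopic_gamma, patient_gamma = 0.5, 0.99

# A behaviorally myopic human prefers the near-term share...
assert discounted_value(near, myopic_gamma) > discounted_value(far, myopic_gamma)
# ...while a patient agent prefers the far-future share:
assert discounted_value(far, patient_gamma) > discounted_value(near, patient_gamma)
print("both sides prefer their side of the trade")
```

The worry, of course, is that if our *considered* discount rate is closer to the patient one, we end up regretting a trade our revealed preferences happily accepted.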
I view the chess-playing algorithm as either *not* fully goal-directed, or somehow fundamentally limited in its understanding of the world, or level of rationality. Intuitively, it seems easy to make agents that are ignorant or indifferent (/“irrational”) in such a way that they will only seek to optimize things within the ontology we’ve provided (in this case, of the chess game), instead of outside it (i.e. seizing additional compute). However, our understanding of such things doesn’t seem mature… at least I’m not satisfied with my current understanding. I think Stuart Armstrong and Tom Everitt are the main people who’ve done work in this area, and their work on this stuff seems quite under-appreciated.
Yeah, I think it totally does! (and that’s a very interesting / “trippy” line of thought :D)
However, it does seem somewhat unlikely to me, since it would require fairly advanced intelligence, and I don’t think evolution is likely to have produced such advanced intelligence with us remaining totally unaware of it, whereas I think something about the way we train AI is more strongly selecting for “savant-like” intelligence, which is sort of what I’m imagining here. I can’t think of why I have that intuition OTTMH.
So I don’t take EY’s post as about AI researchers’ competence, as much as their incentives and levels of rationality and paranoia. It does include significant competitive pressures, which seems realistic to me.
I don’t think I’m underestimating AI researchers, either, but for a different reason… let me elaborate a bit: I think there are waaaaaay too many skills for us to hope to have a reasonable sense of what an AI is actually good at. By skills I’m imagining something more like options, or having accurate generalized value functions (GVFs), than tasks.
Regarding long-term planning, I’d factor this into 2 components:
1) having a good planning algorithm
2) having a good world model
I think the way long-term planning works is that you do short-term planning in a good hierarchical world model. I think AIs will have vastly superhuman planning algorithms (arguably, they already do), so the real bottleneck is the world-model.
I don’t think it’s necessary to have a very “complete” world-model (i.e. enough knowledge to look smart to a person) in order to find “steganographic” long-term strategies like the ones I’m imagining.
I also don’t think it’s even necessary to have anything that looks very much like a world-model. The AI can just have a few good GVFs… (i.e. be some sort of savant).
I’m not sure I have much more than the standard MIRI-style arguments about convergent rationality and fragility of human values, at least nothing is jumping to mind ATM. I do think we probably disagree about how strong those arguments are. I’m actually more interested in hearing your take on those lines of argument than saying mine ATM :P
Not a direct response: It’s been argued (e.g. I think Paul said this in his 2nd 80k podcast interview?) that this isn’t very realistic, because the low-hanging fruit (of easy to attack systems) is already being picked by slightly less advanced AI systems. This wouldn’t apply if you’re *already* in a discontinuous regime (but then it becomes circular).
Also not a direct response: It seems likely that some AIs will be much more/less cautious than humans, because they (e.g. implicitly) have very different discount rates. So AIs might take very risky gambles, which means both that we might get more sinister stumbles (good thing), but also that they might readily risk the earth (bad thing).
I do think this is an overly optimistic picture. The amount of traction an argument gets seems to be something like a product of how good the argument is, how credible those making the argument are, and how easy it is to process the argument.
Also, regarding this:
But the credibility system is good enough that the top credible people are really pretty smart, so to an extent can be swayed by good arguments presented well.
It’s not just intelligence that determines if people will be swayed; I think other factors (like “rationality”, “open-mindedness”, and other personality factors) play a very big role.
Oops, missed that, sry.
I think a potentially more interesting question is not about running a single AI system, but rather the overall impact of AI technology (in a world where we don’t have proofs of things like beneficence). It would be easier to hold the analogue of the empirical claim there.
Yep, good catch ;)
I *do* put a non-trivial weight on models where the empirical claim is true, and not just out of epistemic humility. But overall, I’m epistemically humble enough these days to think it’s not reasonable to say “nearly inevitable” if you integrate out epistemic uncertainty.
But maybe it’s enough to have reasons for putting non-trivial weight on the empirical claim to be able to answer the other questions meaningfully?
Or are you just trying to see if anyone can defeat the epistemic humility “trump card”?