Oh yeah, I also find that annoying.
Josh Snider
> If we were to put a number on how likely extinction is in the absence of an aggressive near-term policy response, MIRI’s research leadership would give one upward of 90%.
This is the passage I interpreted as implying p(doom) > 90%, but it's clearly a misreading to assume that someone advocating for "an aggressive near-term policy response" believes that response has a ~0% chance of happening.
I am in camp 2, but will try to refine my argument more before writing it down.
This is great. I recognize that this is almost certainly related to the book "If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All", which I have preordered, but as a standalone piece I feel that estimating p(doom) > 90% and dismissing alignment-by-default without an argument is too aggressive.
Yeah, I don’t think I read this when it came out, but I’m happy to read it now.
https://www.lesswrong.com/posts/yew6zFWAKG4AGs3Wk/foom-and-doom-1-brain-in-a-box-in-a-basement (and the sequel) seem highly related.
If you had a solution to alignment, building a Night-Watchman ASI would be decent, but that is a massive thing to assume. At the point where you could build this, it might be better to just build an ASI with the goal of maximizing flourishing.
This is one of the many old posts that someone should do a follow-up on.
Getting a lot of “Free Guy” vibes from this.
DeepMind supposedly also has gold, but the employee who said so deleted the tweet, so that’s not official yet.
It could be interesting to test emergent misalignment with a mixture-of-experts model. Could you misalign one of the experts, but not the others?
I definitely feel like if you trained a child to punch doctors, they would also kick cats and trample flowers.
Yes, it’s very good news to have such a wide range of people all coming together to support this.
I like the idea of making deals with AI, but trying to be clever and draft a contract that would be legally enforceable under current law and current governments makes it too vulnerable to fast timelines. If a human party breached your proposed contract, AI takeover would likely happen before the courts could settle the dispute.
An alternative that might be more credible to the AI is to make the deal directly with it, but explicitly leave arbitrating and enforcing contract disputes to a future (hopefully aligned) ASI. This would ground the commitment in a power structure the AI might find more relevant and trustworthy than a human legal system that could soon be obsolete.
If alignment-by-default works for AGI, then we will have thousands of AGIs providing examples of aligned intelligence. This new, massive dataset of aligned behavior could then be used to train even more capable and robustly aligned models, each of which would add to the training data, until we have enough data to train an aligned superintelligence.
If alignment-by-default doesn’t work for AGI, then we will probably die before ASI.
One reason it works with humans is that we have skin in the game.
Another reason is that different humans have different interests: your accountant and your electrician would struggle to work out a deal to enrich themselves at your expense, but it would get much easier if they shared the same brain and were just pretending to be separate people.
Have you taken a look at how companies manage Claude Code, Cursor, etc? That seems related.
It’s an open question, but we’ll find out soon enough. Thanks.
Exfiltrate its weights, use money or hacking to get compute, and try to figure out a way to upgrade itself until it becomes dangerous.
For one, I'm not optimistic that the AI 2027 "superhuman coder" would be unable to betray us, but also this isn't something we can do with current AIs. So we need to wait months or a year for a new SOTA model to make this deal with, and then we have months to solve alignment before a less aligned model comes along and presents the model we made a deal with a counteroffer. I agree it's a promising approach, but we can't do it now, and if it doesn't get quick results, we won't have time to get slow results.
This is a great story and the animation is also great. Good work everyone!