If you assign a different meaning to the word, then you’re talking about a different thing, and the point changes accordingly.
Alignment seems quite similar to the problem of imbuing AIs with artistic taste. Morality and taste are both hard to verify and subjective (or inter-subjective). Alignment has in practice the further difficulty that deception may play a role. I.e., even after managing to train moral principles into an AI system, you have to make sure they actually act as a guide for action.
That said, my very subjective impression is that AI is far ahead in terms of ethical taste compared to artistic taste. Perhaps this is thanks to the fact that alignment has been considered a core AI problem for a much longer time.
I saw a snippet of an interview with Greg Brockman in which he said something like “I think of Spud as a new pre-train”. That “I think of...” made me kind of suspicious. Either Brockman phrased things badly, or Spud isn’t actually a new pre-train and that’s why he used that phrasing. If it’s not actually a new pre-train, my guess would be that they did a massive amount of RLVR. If the amount of RLVR is at pre-training scale, that would justify Brockman’s phrasing.
Honestly, I wasn’t picturing their reasons. Say more?
If we’re in a history simulation, I don’t think it’s unlikely that the simulators will just set us free in their reality. I’m expecting a more enlightened humanity to consider simulated humans moral patients.
There’s no real difference between a simulated human and a human. A simulated human is causally-interacting signals in a computer; native base-reality beings are also just interacting signals. If you inhabit a simulation, you automatically inhabit base reality just as much as they do. It’s just that you’re embedded in some other software and they’re not, but that’s not fundamental at all.
Indeed, the scariest reaction known so far has been a comments section on Instagram (click only if you must), a place about as far removed from AI and AI safety spaces of any kind as one can get.
The comment section on r/technology under a post about the attack is also disquieting. Here, if you want to see.
AI poses unacceptable risks to all of us. This is simply a fact
Hmm this seems false to me. “Unacceptable” is subjective, not really a fact about the situation that sits outside of human minds.
Additionally, please consider: dangerous ideologies also have core beliefs they consider “facts”. Not being a dangerous ideology is downstream of actually not being dangerous, not of believing that what you believe is true. I don’t think AI pausers are a “dangerous ideology” at the moment, but you’re defending your case badly and being too glib. This should be a warning shot for you! You can’t just make up bad arguments about why everything is fine and call it a day.
Yeah I would call that “gaslighting”. It looks like my initial interpretation of what you meant by it is closer than Zack’s. I think Scott isn’t doing that. I’m inclined to believe you when you say other people have behaved this way.
I really don’t think Scott is gaslighting you. I think Scott is being honest here, but you should model him as having somewhat snapped. Pause AI and MIRI-adjacent people on X have been extremely adversarial and have been contributing to very bad discourse (even arguments-wise). I think Scott saw Rob’s post as very strawmannish and needlessly adversarial, and he more or less correctly lumped it in with this rising tide of terribleness, even if MIRI itself is definitely not as guilty. I might well be wrong about the specifics, but Scott Alexander isn’t the kind of person who tends to gaslight.
Scott Alexander left an important reply to Rob Bensinger on X. I happen to agree with Scott. Here’s the original post by Rob:
In response to “What did EAs do re AI risk that is bad?”:
Aside from the obvious ‘being a major early funder and a major early talent source for two of the leading AI companies burning the commons’, I think EAs en masse have tended to bring a toxic combination of heuristics/leanings/memes into the AI risk space. I’m especially thinking of some combination of:
‘be extremely strategic and game-playing about how you spin the things you say, rather than just straightforwardly reporting on your impressions of things’
plus ‘opportunistically use Modest Epistemology to dismiss unpalatable views and strategies, and to try to win PR battles’.
Normally, I’m at least a little skeptical of the counterfactual impact of people who have worsened the AI race, because if they hadn’t done it, someone else might have done it in their place. But this is a bit harder to justify with EAs, because EAs legitimately have a pretty unusual combination of traits and views.
Dario and a cluster of Open-Phil-ish people seem to have a very strange and perverse set of views (at least insofar as their public statements to date represent their actual view of the situation):
---
1. AI is going to become vastly superhuman in the near future; but being a good scientist means refusing to speculate about the potential novel risks this may pose. Instead, we should only expect risks that we can clearly see today, and that seem difficult to address today.
If there is some argument for why a problem P might only show up at a higher capability level, or some argument for why a solution S that works well today will likely stop working in the future… well, those are just arguments. Arguments have a terrible track record in AI; the field is full of surprises. So we should stick to only worrying about things when the data mandates it. This is especially important to do insofar as it will help us look more credible and thereby increase our political power and influence.
2. When it comes to technical solutions to AI, the burden of proof is on the skeptic: in the absence of proof that alignment is intractable, we should behave as though we’ve got everything under control. At the same time, when it comes to international coordination on AI, we will treat the burden of proof as being on the non-skeptic. Absent proof that governments can coordinate on AI, we should assume that they can’t coordinate. And since they can’t coordinate, there’s no harm in us doing a lot of things to make coordination even harder, to make our lives a bit more convenient as we work on the technical problems.
3. In general, people worried about AI risk should coordinate as much as possible to play down our concerns, so as not to look like alarmists. This is very important in order to build allies and accumulate political influence, so that we’re well-positioned to act if and when an important opportunity arises.
If you’re claiming that now is an important opportunity, and that we should be speaking out loudly about this issue today… well, that sounds risky and downright immodest. Many things are possible, and the future is hard to predict! Taking political risks means sacrificing enormous option value. The humble and safe thing to do is to generally not make too much of a fuss, and just make sure we’re powerful later in case the need arises.
---
1-3 really does seem like an unusually toxic set of heuristics to propagate, potentially worse than replacement.
- In an engineering context, the normal mindset is to place the burden of proof on the engineer to establish safety. There’s no mature engineering discipline that accepts “you can’t prove this is going to kill a ton of people” as a valid argument.
The standard engineering mindset sounds almost more virtue-ethics-y or deontological rather than EA-ish—less “ehh it’s totally fine for me to put billions of lives at risk as long as my back-of-the-envelope cost-benefit analysis says the benefits are even greater!”, more “I have a sacred responsibility and duty to not build things that will bring others to harm.”
Certainly the casualness about p(doom) and about gambling with billions of people’s lives is something that has no counterpart in any normal scientific discipline.
- Likewise, I suspect that the typical scientist or academic that would have replaced EAs / Open Phil would have been at least somewhat more inclined to just state their actual concerns about AI, and somewhat less inclined to dissemble and play political games.
Scientists are often bad at such games, they often know they’re bad at such games, and they often don’t like those games. EAs’ fusion of “we’re playing the role of a wonkish Expert community” with “we’re 100% into playing political games” is plausibly a fair bit worse than the normal situation with experts.
- And EAs’ attempts to play eleven-dimensional chess with the Overton window are plausibly worse than how scientists, the general public, and policymakers normally react to any technology under the sun that sounds remotely scary or concerning or creepy: “Ban it!”
Governments are incredibly trigger-happy about banning things. There’s a long history of governments successfully coordinating to ban things dramatically less dangerous than superintelligent AI. And in fact, when my colleagues and I have gone out and talked to most populations about AI risk, people mostly have much more sensible and natural responses than EAs to this issue.
A way of summarizing the issue, I think, is that society depends on people blurting out their views pretty regularly, or on people having pretty simple and understandable agendas (e.g., “I want to make money” or “I want the Democrats to win”).
Society’s ability to do sense-making is eroded when a large fraction of the “specialists” talking about an issue are visibly dissembling and stretching the truth on the basis of agendas that are legitimately complicated and hard to understand.
Better would be to either exit the conversation, or contribute your actual pretty-full object-level thoughts to the conversation. Your sense of what’s in the Overton window, and what people will listen to, has failed you a thousand times over in recent years. Stop pretending at mastery of these tricky social issues, and instead do your duty as an expert and inform people about what’s happening.

The reply by Scott Alexander:
I disagree with all of this on the epistemic level of “it’s not true”, and additionally disagree with your comms strategy of undermining EAs.
On the epistemic level—I haven’t seen EAs (other than SBF) do a lot of lying, equivocating, or even being particularly shy about their beliefs. I don’t know exactly who you’re talking about, but Holden made a personal blog post saying that his p(doom) was 50%, and said:
>>> “I constantly tell people, I think this is a terrifying situation. If everyone thought the way I do, we would probably just pause AI development and start in a regime where you have to make a really strong safety case before you move forward with it.”
Dario said there’s a 25% chance “things go really, really badly”, and in terms of a pause:
>>> “I wish we had 5 to 10 years [before AGI]. The reason we can’t [slow down and] do that is because we have geopolitical adversaries building the same technology at a similar pace. It’s very hard to have an enforceable agreement where they slow down and we slow down. [But] if we can just not sell the chips to China, then this isn’t a question of competition between the U.S. and China. This is a question between me and Demis—which I am very confident we can work out.”
This is basically my position—I would add “we should try to negotiate with China, but keep this as a backup plan if it fails”, but my guess is Dario would also add this and just isn’t optimistic. I agree he’s written some other things (especially in Adolescence of Technology) that sound weirdly schizophrenic, and more on this later, but I give him a lot of credit for paragraphs like:
>>> “I think it would be absurd to shrug and say, “Nothing to worry about here!” But, faced with rapid AI progress, that seems to be the view of many US policymakers, some of whom deny the existence of any AI risks, when they are not distracted entirely by the usual tired old hot-button issues. Humanity needs to wake up, and this essay is an attempt—a possibly futile one, but it’s worth trying—to jolt people awake.”
Meanwhile, you seem to be treating all these people as basically equivalent to Gary Marcus. I think if you don’t mean these people in particular, you should specify who you’re talking about, and what things that they’ve said strike you in this way.
Absent that, I think this “debate” isn’t about OpenPhil or Anthropic failing to say they’re extremely worried, failing to say that catastrophe is a very plausible outcome, or failing to say that they think slowing down AI would be good if possible. It’s about OpenPhil in particular being pretty careful how they phrase things for public consumption. And I think any attempt to attack them for this should start with an acknowledgement that MIRI is directly responsible for all of our current problems by doing things like introducing DeepMind to its funders, getting Sam Altman and Elon Musk into AI, and building up excitement around “superintelligence” in Silicon Valley. I think if 2010-MIRI had slightly more strategicness and willingness to ask itself “hey, is this PR strategy likely to backfire?”, you might not have told a bunch of the worst people in the world that AI was going to be super-powerful and that whoever invested in it would be ahead in a race that might make them hundreds of billions of dollars (and yes, you did add “and then destroy the world”—but if you had been more strategic, you might have considered that investors wouldn’t hear that last part as loudly).
(you could argue that you’re not against strategicness in general, just talking about this one issue of saying cleanly that AI is very dangerous. But my impression is that Holden and Dario have said this, many times—see examples above. What they haven’t said is “the situation is totally hopeless and every strategy except pausing has literally no chance of working”, but that isn’t a comms problem, that’s because they genuinely believe something different from you. And also, I frequently encounter people who say things like “Scott, I’m glad you wrote about X in way Y—it made me take AI risk seriously, after I’d previously been turned off of it by encountering MIRI”. I think a substantial reason that Dario’s writing sometimes seems schizophrenic when talking about AI risks is that he’s trying to convey that they’re serious while also trying to signal “I swear I’m not one of those MIRI people” so that his writing can reach some of the people you’ve driven away. I don’t think you drive them away because you’re “honest”, I think it’s just about normal issues around framing and theory-of-mind for your audience.)
I don’t actually want to re-open the “MIRI helped start DeepMind and OpenAI!!!” war or the “MIRI is arrogant and alienating!!!” war—we’ve both been through both of these a million times—but I increasingly feel like a chump trying to cooperate while you’re defecting. This is the foundation of my comms worry. Your claim that “governments are incredibly trigger-happy about banning things...there’s a long history of governments successfully coordinating to ban things dramatically less dangerous than superintelligent AI” is too glib—I don’t think there’s ever been a ban on building something as economically-valuable and far-along as AI, executed competently enough that it would work if applied cookie-cutter to the AI situation. You’re trying to do a really difficult thing here. I respect this—all of our options are bad and unlikely to work, the situation is desperate, and I have no plan better than playing a portfolio of all the different desperate hard strategies in the hopes that one of them works. But my impression is that the rest of the field is executing this portfolio plan admirably, while MIRI and a few other PauseAI people are trying to sabotage every other strategy in the portfolio in the hope of forcing people into theirs.
(I think if you guys had your way, Anthropic would never have been founded, no safety-minded people would ever have joined labs, and the current world would be a race between XAI, Meta, and OpenAI, all of which would have a Yann LeCun style approach to safety, and none of which would have alignment teams beyond the don’t-say-bad-words level. We wouldn’t have the head of the leading AI lab writing letters to policymakers begging them to “jolt awake”, we wouldn’t have a substantial fraction of world compute going to Jan Leike’s alignment efforts, we wouldn’t have Ilya sitting on $50 billion for some super-secret alignment project—just Mark Zuckerberg stomping on a human face forever. In exchange, we would have won a couple more years of timeline, which would have been pointless, because timeline isn’t measured in distance from the year 1 AD, it’s measured in distance between some level of woken-up-ness and some point of danger, and the woken-up-ness would be pushed forward at the same rate the danger was.)
I support your fight-for-a-pause strategy in theory, and I would like to support it with praxis, but right now I feel very conflicted about this, because I worry that any support or oxygen you guys get will be spent knifing other safety advocates, while Sam Altman happily builds AGI regardless.
Rationalists and Pause AI people on X are accusing Davidad of suffering from AI psychosis. I think it’s them who have lost the plot, actually, not Davidad. The move here looks political rather than truth-tracking: “Davidad is now my political opponent, so I’m accusing him of being crazy.” This happened to Emmett Shear too at some point.
I also strongly believe AI psychosis to be a far more limited phenomenon than people here seem to think. I think you’re treating it as a good soldier in your army of arguments rather than honestly investigating what it actually is.
So far, we have documented cases of Generative AI being used to subvert elections in Romania (actually causing an annulment).
AFAIK that was not because of Gen AI, though the broader point of your comment does stand.
Previously, I said:
People are very worried about a future in which a lot of the Internet is AI-generated. I’m kinda not. So far, AIs are more truth-tracking and kinder than humans. I think the default (conditional on OK alignment) is that an Internet that includes a much higher population of AIs is a much better experience for humans than the current Internet, which is full of bullying and lies.
All such discussions hinge on AI being relatively aligned, though. Of course, an Internet full of misaligned AIs would be bad for humans, but the reason is human disempowerment, not any of the usual reasons people say such an Internet would be terrible.

I feel good about this prediction so far. Instagram and TikTok now have a significant amount of AI-generated videos (though they haven’t overrun these platforms by any means). The categories I’ve seen so far are:
- Low-brow animated stories.
- Fantasy or sci-fi scenarios with music.
- Colorful AI-generated art.
- Cute meme animals.
The greatest sin of this content is that it’s often low quality. But that’s not really that great a sin. I think, all things considered, AI slop is above-average content. Other content often contains bullying, meanness, and lies; AI-generated content rarely does.
Also, so far, this is mostly thanks to humans and to AI guardrails, not really due to the character of AIs, as I had expected in my initial quick take. It looks like humans are using this tech in mostly good-spirited ways so far.
Hmm but humans are not ruthless consequentialists, despite being consequentialist enough to be able to do all kinds of tasks and build civilization. So I don’t see how the Optimist’s argument is addressed.
We’re still in the part of AI 2027 that was easy to predict. They point this out themselves.
Sure but he hasn’t laid out the argument. “something something simulation acausal trade” isn’t a motivation.
I’d like to know what your motivations are for doing what you’re doing! In the first podcast you hinted at “weird reasons”, but in the end you didn’t state them explicitly. I’m thinking about this quote:
Yeah, maybe a general question here is: I engage in recruiting sometimes and sometimes people are like, “So why should I work at Redwood Research, Buck?” And I’m like, “Well, I think it’s good for reducing AI takeover risk and perhaps making some other things go better.” And I feel a little weird about the fact that actually my motivation is in some sense a pretty weird other thing.
We love Claude, Claude is frankly a more responsible, ethical, wise agent than we are at this point, plus we have to worry that a human is secretly scheming whereas with Claude we are pretty sure it isn’t; therefore, we aren’t even trying to hide the fact that Claude is basically telling us all what to do and we are willingly obeying—in fact, we are proud of it.
My best guess is that this would be OK.
My own felt sense, as an outsider, is that the pessimists look more ideological/political and fervent than the relatively normal-looking labs. According to the frame of the essay, the “catastrophe brought about with good intent” could easily be the prevention of further AI progress, together with the political means used to bring that about.
I keep seeing absolutely terrible epistemics from like 50% of AI Safety. From people who previously seemed reasonable. This quick take was prompted by an example I just saw, from Connor Leahy: https://x.com/JoshWalkos/status/2021087240126976511