I find it hard to trust that AI safety people really care about AI safety.
DeepMind, OpenAI, Anthropic, and SSI were all founded in the name of safety. Instead they have greatly increased danger. And at least OpenAI and Anthropic have been caught lying about their motivations:
OpenAI: claiming concern about hardware overhang and then trying to massively scale up hardware; promising compute to the superalignment team and then not giving it; telling the board that a model had passed safety testing when it hadn’t; too many more to list.
Anthropic: promising (in a mealy-mouthed technically-not-lying sort of way) not to push the frontier, and then pushing the frontier; trying (and succeeding) to weaken SB-1047; lying about their connection to EA (that’s not related to x-risk but it’s related to trustworthiness).
For whatever reason, I had the general impression that Epoch is about reducing x-risk (and I was not the only one with that impression) but:
Epoch is not about reducing x-risk, and they were explicit about this but I didn’t learn it until this week
its FrontierMath benchmark was funded by OpenAI and OpenAI allegedly has access to the benchmark (see comment on why this is bad)
some of their researchers left to start another build-AGI startup (I’m not sure how badly this reflects on Epoch as an org but at minimum it means donors were funding people who would go on to work on capabilities)
Director Jaime Sevilla believes “violent AI takeover” is not a serious concern, and “I also selfishly care about AI development happening fast enough that my parents, friends and myself could benefit from it, and I am willing to accept a certain but not unbounded amount of risk from speeding up development”, and “on net I support faster development of AI, so we can benefit earlier from it” which is a very hard position to justify (unjustified even on P(doom) = 1e-6, unless you assign ~zero value to people who are not yet born)
I feel bad picking on Epoch/Jaime because they are being unusually forthcoming about their motivations, in a way that exposes them to criticism. This is noble of them; I expect most orgs not to be this noble.
When some other org does something that looks an awful lot like it’s accelerating capabilities, and they make some argument about how it’s good for safety, I can’t help but wonder if they secretly believe the same things as Epoch and are not being forthright about their motivations
My rough guess is for every transparent org like Epoch, there are 3+ orgs that are pretending to care about x-risk but actually don’t
Whenever some new report comes out about AI capabilities, like the METR task duration projection, people talk about how “exciting” it is[1]. There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”. But whatever mental process results in this choice of words, I don’t trust that it will also result in them taking actions that reduce x-risk.
Many AI safety people currently or formerly worked at AI companies. They stand to make money from accelerating AI capabilities. The same is true of grantmakers
I briefly looked thru some grantmakers and I see financial COIs at Open Philanthropy, Survival and Flourishing Fund, and Manifund; but none at Long-Term Future Fund
A different sort of conflict of interest: many AI safety researchers have an ML background and enjoy doing ML. Unsurprisingly, they often arrive at the belief that doing ML research is the best way to make AI safe. This ML research often involves making AIs better at stuff. Pausing AI development (or imposing significant restrictions) would mean they don’t get to do ML research anymore. If they oppose a pause/slowdown, is that for ethical reasons, or is it because it would interfere with their careers?
At a moderate P(doom), say under 25%, from a selfish perspective it makes sense to accelerate AI if it increases the chance that you get to live forever, even if it increases your risk of dying. I have heard from some people that this is their motivation. I am appalled at the level of selfishness required to seek immortality at the cost of risking all of humanity. And I’m sure most people who hold this position know it’s appalling, so they keep it secret and publicly give rationalizations for why accelerating AI is actually the right thing to do. In a way, I admire people who are open about their selfishness, and I expect they are the minority.
But you should know that you might not be able to trust me either.
I have some moral uncertainty but my best guess is that future people are just as valuable as present-day people. You might think this leads me to put too much priority on reducing x-risk relative to helping currently-alive people. You might think it’s unethical that I’m willing to delay AGI (potentially hurting currently-alive people) to reduce x-risk.
I care a lot about non-human animals, and I believe it’s possible in principle to trade off human welfare against animal welfare. (Although if anything, I think that should make me care less about x-risk, not more.)
ETA: I am pretty pessimistic about AI companies’ plans for aligning ASI. My weakly held belief is that if companies follow their current plans, there’s a 2 in 3 chance of a catastrophic outcome. (My unconditional P(doom) is lower than that.) You might believe this makes me too pessimistic about certain kinds of strategies.
(edit: removed an inaccurate statement)
[1] ETA: I saw several examples of this on Twitter. Went back and looked and I couldn’t find the examples I recall seeing. IIRC they were mainly quote-tweets, not direct replies, and I don’t know how to find quote-tweets (the search function was unhelpful).
I think this is straightforwardly true and basically hard to dispute in any meaningful way. A lot of this is basically downstream of AI research being part of a massive market/profit generating endeavour (the broader tech industry), which straightforwardly optimises for more and more “capabilities” (of various kinds) in the name of revenue. Indeed, one could argue that long before the current wave of LLMs the tech industry was developing powerful agentic systems that actively worked to subvert human preferences in favour of disempowering them/manipulating them, all in the name of extracting revenue from intelligent work… we just called the AI system the Google/Facebook/Youtube/Twitter Algorithm.
The trend was always clear: an idealistic mission to make good use of global telecommunication/information networks finds initial success and is a good service. Eventually pressures to make profits cause the core service to be degraded in favour of revenue generation (usually ads). Eventually the company accrues enough shaping power to actively reshape the information network in its favour, and begins dragging everything down with it. In the face of this AI/LLMs are just another product to be used as a revenue engine in the digital economy.
AI safety, by its nature, resists the idea of creating powerful new information technologies to exploit mercilessly for revenue without care for downstream consequences. However, many actors in the AI safety movement are themselves tied to the digital economy, and depend on it for their power, status, and livelihoods. Thus, it is not that there are no genuine concerns being expressed, but that at every turn these concerns must be resolved in a way that keeps the massive tech machine going. Those who don’t agree with this approach are efficiently selected against. For example:
Race dynamics are bad? Maybe we should slow down. We just need to join the race and be the more morally-minded actor. After all, there’s no stopping the race, we’re already locked in.
Our competitors/other parties are doing dangerous things? Maybe we could coordinate and share our concerns and research with them. We can’t fall behind, we’ve got to fly the AI safety flag at the conference of AI Superpowers. Let’s speed up too.
New capabilities are unknown and jagged? Let’s just leave well enough alone. Let’s invest more in R&D so we can safely understand and harness them.
Here’s a new paradigm that might lead to a lot of risk and a lot of reward. We should practice the virtue of silence and buy the world time. We should make lots of noise so we can get funding. To study it. Not to use it, of course. Just to understand the safety implications.
Maybe progress in AI is slower than we thought. Hooray! Maybe we can chill for a bit. That’s time for us to exploit our superior AI knowledge and accelerate progress to our benefit.
We’ve seen this before.
To be honest, though, I’m not sure what to do about this. So much has been invested by now that it truly feels like history is moving with a will of its own, rather than individuals steering the ship. Every time I look at what’s going on I feel the sense that maybe I’m just the idiot that hasn’t gotten the signal to hammer that “exploit” button. After all, it’s what everyone else is doing.
Our competitors/other parties are doing dangerous things? Maybe we could coordinate and share our concerns and research with them
What probability do you put on the claim that, if Anthropic had really tried, they could have meaningfully coordinated with OpenAI and Google? Mine is pretty low.
I think many of these are predicated on the belief that it would be plausible to get everyone to pause now. In my opinion this is extremely hard and pretty unlikely to happen. I think that, even in worlds where actors continue to race, there are actions we can take to lower the probability of x-risk, and it is a reasonable position to do so.
I separately think that many of the actions you describe were historically dumb/harmful, but they are equally consistent with “25% of safety people act like this” and with “100% of safety people act like this”.
What probability do you put on the claim that, if Anthropic had really tried, they could have meaningfully coordinated with OpenAI and Google? Mine is pretty low.
Not GP but I’d guess maybe 10%. Seems worth it to try. IMO what they should do is hire a team of top negotiators to work full-time on making deals with other AI companies to coordinate and slow down the race.
ETA: What I’m really trying to say is I’m concerned Anthropic (or some other company) would put in a half-assed effort to cooperate and then give up, when what they should do is Try Harder. “Hire a team to work on it full time” is one idea for what Trying Harder might look like.
Fair. My probability is more like 1-2%. I do think that having a team of professional negotiators seems a reasonable suggestion though. I predict the Anthropic position would be that this is really hard to achieve in general, but that if slowing down was ever achieved we would need much stronger evidence of safety issues. In addition to all the commercial pressure, slowing down now could be considered to violate antitrust law. And it seems way harder to get all the other actors like Meta or DeepSeek or xAI on board, meaning I don’t even know if I think it’s good for some of the leading actors to unilaterally slow things down now (I predict mildly net good, but with massive uncertainty and downsides)
I think it’s important to distinguish between factual disagreements and moral disagreements. My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely, what timelines are, etc. I’m much less sure the founders of Mechanize care.
And to whatever degree you trust my judgement/honesty, I work at DeepMind and reducing existential risk is a fairly large part of my motivation (though far from all of it), and I try to regularly think about how my team’s strategy can be better targeted towards this.
And I know a lot of safety people at DeepMind and other AGI labs who I’m very confident also sincerely care about reducing existential risks. This is one of their primary motivations: they often got into the field after being convinced by arguments about AI risk, they will often raise in conversation concerns that their current work or the team’s current strategy is not focused on it enough, some are extremely hard-working or admirably willing to forgo credit so long as they think their work actually matters for x-risk, some dedicate a bunch of time to forming detailed mental models of how AI leads to bad outcomes, how this could be prevented, and how their work fits in, etc. If people just wanted to do fun ML work, there are a lot of other places they could go. Obviously people are complex. People are largely not motivated by a single thing, and the various conflicts of interest you note seem real. I expect I’ve misjudged some of the people I’m thinking of, or that some say they care about x-risk but actually don’t. But I would just be completely shocked if, say, half of them were not highly motivated by reducing x-risk. It’s generally more reasonable to be skeptical of the motivations of senior leadership, who have much messier incentives and constraints on their communication.
Regarding the missing mood thing, I think there’s something to what you’re saying, but also that it’s really psychologically unhealthy to work in a field which is constantly advancing like AI and, every time there is an advance, feel the true emotional magnitude of what it would mean if existential risk has now slightly increased. If anyone did, I think they would burn out pretty fast, so the people left in the field largely don’t. I also think you should reserve those emotions for times when a trend is deviated from rather than when a trend is continued. In my opinion, the reason people were excited about the METR work was that it measured, much more precisely, a thing that was already happening; it was a really important question, and reducing our confusion about it is high value. It wasn’t really capabilities work, in my opinion.
unjustified even on P(doom) = 1e-6, unless you assign ~zero value to people who are not yet born
Is this implicitly assuming total utilitarianism? I certainly care about the future of humanity, but I reject moral views that say it is overwhelmingly the thing that matters and present day concerns round down to zero. I think many people have intuitions aligning with this.
My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely
I don’t think this is true. My sense is he views his current work as largely being good on non-x-risk grounds, and thinks that even if it might slightly increase x-risk, it wouldn’t be worth it for him to stop working on it, since he thinks it’s unfair to force the current generation to accept a slightly higher risk of not achieving longevity escape velocity and more material wealth in exchange for a small decrease in existential risk.
He says it so plainly that it’s about as straightforward a rejection of AI x-risk concerns as I’ve heard:
I selfishly care about me, my friends and family benefitting from AI. For some of my older relatives, it might make a big difference to their health and wellbeing whether AI-fueled explosive growth happens in 10 vs 20 years.
[...]
I wont endanger the life of my family, myself and the current generation for a small decrease of the chances of AI going extremely badly in the long term.
And I don’t think it’s fair of anyone to ask me to do that. Not that it should be my place to unilaterally make such a decision anyway.
It seems very clear that Jaime thinks that AI x-risk is unimportant relative to almost any other issue, given his non-interest in trading off x-risk against those other issues.
It is true that Jaime might think that AI x-risk could hypothetically be motivating to him, but at least my best interpretation of what is going on suggests to me that he de facto does not consider it an important input into his current strategic choices, or the choices of Epoch.
I think you’re strawmanning him somewhat.
It seems very clear that Jaime thinks that AI x-risk is unimportant relative to almost any other issue, given his non-interest in trading off x-risk against those other issues.
Does not seem a fair description of
I wont endanger the life of my family, myself and the current generation for a small decrease of the chances of AI going extremely badly in the long term
People are allowed to have multiple values! If someone would trade a small amount of value A for a large amount of value B, this is entirely consistent with them thinking both are important.
Like, if you offer people the option to commit suicide in exchange for reducing x-risk by x%, what value of x do you think they would require? And would you say they are not x risk motivated if they eg aren’t willing to do it at 1e-6?
In practice this doesn’t really come up, so it’s not that relevant. Similarly for Jaime’s position: how much he believes himself to be in situations where he’s trading off meaningful harm to the future against meaningful harm to the present generation seems very important.
I did a bit of digging, because these quotes seemed narrow to me. Here’s the original tweet of that tweet thread.
Full state dump of my AI risk related beliefs:
- I currently think that we will see ~full automation of society by Median 2045, with already very significant benefits by 2030
- I am not very concerned about violent AI takeover. I am concerned about concentration of power and gradual disempowerment. I put the probability that ai ends up being net bad for humans at 15%.
- I support treating ai as a general purpose tech and distributed development. I oppose stuff like export controls and treating AI like military tech. My sense is that AI goes better in worlds where we gradually adopt it and it’s seen as a beneficial general purpose tech, rather than a key strategic tech only controlled by a small group of people.
- I think alignment is unlikely to happen in a robust way, though companies could have a lot of sway on AI culture in the short term.
- on net I support faster development of AI, so we can benefit earlier from it.
It’s a hard problem, and I respect people trying their hardest to make it go well.
Then right after:
All said, this specific chain doesn’t give us a huge amount of information. It totals something like 10-20 sentences.
> He says it so plainly that it’s about as straightforward a rejection of AI x-risk concerns as I’ve heard:
This seems like a major oversimplification to me. He says “I am concerned about concentration of power and gradual disempowerment. I put the probability that ai ends up being net bad for humans at 15%.” There is a cluster in the rationalist/EA community that believes that “gradual disempowerment” is an x-risk. Perhaps you wouldn’t define “concentration of power and gradual disempowerment” as technically an x-risk, but if so, that seems a bit like a technicality to me. It can clearly be a very major deal.
It sounds to me like Jaime is very concerned about some aspects of AI risk but not others.
In the quote you reference, he clearly says, “Not that it should be my place to unilaterally make such a decision anyway.” I hear him saying, “I disagree with the x-risk community about the issue of slowing down AI, specifically. However, I don’t think this disagreement is a big concern, given that I also feel like it’s not right for me to personally push for AI to be sped up, and thus I won’t do it.”
I am not saying Jaime in-principle could not be motivated by existential risk from AI, but I do think the evidence suggests to me strongly that concerns about existential risk from AI are not among the primary motivations for his work on Epoch (which is what I understood Neel to be saying).
Maybe it is because he sees the risk as irreducible, maybe it is because the only ways of improving things would cause collateral damage for other things he cares about. I also think it should be our dominant prior that someone is not motivated by reducing x-risk unless they directly claim they are.
My sense is that Jaime’s view (and Epoch’s view more generally) is more like: “making people better informed about AI in a way that is useful to them seems heuristically good (given that AI is a big deal), it doesn’t seem that useful or important to have a very specific theory of change beyond this”. From this perspective, saying “concerns about existential risk from AI are not among the primary motivations” is somewhat confused, as the heuristic isn’t necessarily backchained from any more specific justification. Like, there is no specific terminal motivation.
Like, consider someone who donates to GiveDirectly due to “idk, seems heuristically good to empower the worst off people” and someone who generally funds global health and wellbeing due to specifically caring about ongoing human welfare (putting aside AI for now). This heuristic is partially motivated via flow-through from caring about something like welfare, even though that doesn’t directly show up. These people seem like natural allies to me except in surprising circumstances (e.g., it turns out the worst off people use marginal money/power in a way that is net negative for human welfare).
I agree that there is some ontological mismatch here, but I think your position is still in pretty clear conflict with what Neel said, which is what I was objecting to:
My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely, what timelines are, etc.
“Not 100% motivated by it” IMO sounds like an implication that “being motivated by reducing x-risk would make up something like 30%-70% of the motivation”. I don’t think that’s true, and I think various things that Jaime has said make that relatively clear.
I think you’re conflating “does not think that slowing down AI obviously reduces x-risk” with “reducing x-risk is not a meaningful motivation for his work”. Jaime has clearly said that he believes x-risk is real and >=15% (though via different mechanisms than loss of control). I think that the public being well informed about AI generally reduces risk, and I think that Epoch is doing good work on this front, and that increasing the probability that AI goes well is part of why Jaime works on this. I think it’s much less clear if FrontierMath was good, but Jaime wasn’t very involved anyway, so it doesn’t seem super relevant.
I basically think the only thing he’s said that you could consider objectionable is that he’s reluctant to push for a substantial pause on AI, since x-risk is not the only thing he cares about. But he also (sincerely, imo) expresses uncertainty about whether such a pause WOULD be good for x-risk.
1. Do Jaime’s writings say that he cares about x-risk or not? → I think he fairly clearly states that he cares.
2. Does all the evidence, when put together, imply that actually, Jaime doesn’t care about x-risk? → This is a much more speculative question. We have to assess how honest he is in his writing. I’d bet money that Jaime at least believes that he cares and is taking corresponding actions. This of course doesn’t absolve him of full responsibility—there are many people who believe they do things for good reasons, but causally actually do things for selfish reasons. But now we’re getting to a particularly speculative area.
“I also think it should be our dominant prior that someone is not motivated by reducing x-risk unless they directly claim they do.” → Again, to me, I regard him as basically claiming that he does care. I’d bet money that if we ask him to clarify, he’d claim that he cares. (Happy to bet on this, if that would help)
At the same time, I doubt that this is your actual crux. I’d expect that even if he claimed (more precisely) to care, you’d still be skeptical of some aspect of this.
---
Personally, I have both positive and skeptical feelings about Epoch, as I do other evals orgs. I think they’re doing some good work, but I really wish they’d lean a lot more on [clearly useful for x-risk] work. If I had a lot of money to donate, I could picture donating some to Epoch, but only if I could get a lot of assurances on which projects it would go to.
But while I have reservations about the org, I think some of the specific attacks against them (and defenses of them) are not accurate.
People’s “deep down motivations” and “endorsed upon reflection values,” etc, are not the only determiners of what they end up doing in practice re influencing x-risk.
In that case I think your response is a non sequitur, since clearly “really care” in this context means “determiners of what they end up doing in practice re influencing x-risk”.
I personally define “really care” as “the thing they actually care about and meaningfully drives their actions (potentially among other things) is X”. If you want to define it as eg “the actions they take, in practice, effectively select for X, even if that’s not their intent” then I agree my post does not refute the point, and we have more of a semantic disagreement over what the phrase means.
I interpret the post as saying “there are several examples of people in the AI safety community taking actions that made things worse. THEREFORE these people are actively malicious or otherwise insincere about their claims to care about safety and it’s largely an afterthought put to the side as other considerations dominate”. I personally agree with some examples, disagree with others, but think this is explained by a mix of strategic disagreements about how to optimise for safety, and SOME fraction of the alleged community really not caring about safety
People are often incompetent at achieving their intended outcome, so pointing towards failure to achieve an outcome does not mean this was what they intended. ESPECIALLY if there’s no ground truth and you have strategic disagreements with those people, so you think they failed and they think they succeeded
I don’t think “not really caring” necessarily means someone is being deceptive. I hadn’t really thought through the terminology before I wrote my original post, but I would maybe define 3 categories:
claims to care about x-risk, but is being insincere
genuinely cares about x-risk, but also cares about other things (making money etc.), so they take actions that fit their non-x-risk motivations and then come up with rationalizations for why those actions are good for x-risk
genuinely cares about x-risk, and has pure motivations, but sometimes makes mistakes and ends up increasing x-risk
I would consider #1 and #2 to be “not really caring”. #3 really cares. But from the outside it can be hard to tell the difference between the three. (And in fact, from the inside, it’s hard to tell whether you’re a #2 or a #3.)
On a more personal note, I think in the past I was too credulous about ascribing pure motivations to people when I had disagreements with them, when in fact the reason for the disagreement was that I care about x-risk and they’re either insincere or rationalizing. My original post is something I think Michael!2018 would benefit from reading.
Does 3 include “cares about x-risk and other things, does a good job of evaluating the trade-off of each action according to their values, but is sometimes willing to do things that are great according to their other values but slightly negative for x-risk”?
Also, from the outside, can you describe how an observer would distinguish between [any of the items on the list] and the situation you lay out in your comment / what the downsides are to treating them similarly? I think Michael’s point is that it’s not useful/worth it to distinguish.
Whether someone is dishonest, incompetent, or underweighting x-risk (by my lights) mostly doesn’t matter for how I interface with them, or how I think the field ought to regard them, since I don’t think we should browbeat people or treat them punitively. Bottom line is I’ll rely (as an unvalenced substitute for ‘trust’) on them a little less.
I think you’re right to point out the valence of the initial wording, fwiw. I just think taxonomizing apparent defection isn’t necessary if we take as a given that we ought to treat people well and avoid claiming special knowledge of their internals, while maintaining the integrity of our personal and professional circles of trust.
if we take as a given that we ought to treat people well and avoid claiming special knowledge of their internals, while maintaining the integrity of our personal and professional circles of trust.
If we take this as a given, I’m happy for people to categorise others however they’d like! I haven’t noticed people other than you taking that perspective in this thread
My read is that in practice many people in the online LW community are fairly hostile, and many people in the labs think the community doesn’t know what they’re talking about and totally ignores them/doesn’t really care if they’re made to walk the metaphorical plank.
At the risk of seeming quite combative, when you say
And I know a lot of safety people at DeepMind and other AGI labs who I’m very confident also sincerely care about reducing existential risks. This is one of their primary motivations: they often got into the field after being convinced by arguments about AI risk, they will often raise in conversation concerns that their current work or the team’s current strategy is not focused on it enough, some are extremely hard-working or admirably willing to forgo credit so long as they think their work actually matters for x-risk, some dedicate a bunch of time to forming detailed mental models of how AI leads to bad outcomes, how this could be prevented, and how their work fits in, etc.
That’s basically what I meant when I said in my comment:
AI safety, by its nature, resists the idea of creating powerful new information technologies to exploit mercilessly for revenue without care for downstream consequences. However, many actors in the AI safety movement are themselves tied to the digital economy, and depend on it for their power, status, and livelihoods. Thus, it is not that there are no genuine concerns being expressed, but that at every turn these concerns must be resolved in a way that keeps the massive tech machine going. Those who don’t agree with this approach are efficiently selected against. [examples follow]
And, after thinking about it, I don’t see your statement conflicting with mine.
At a moderate P(doom), say under 25%, from a selfish perspective it makes sense to accelerate AI if it increases the chance that you get to live forever, even if it increases your risk of dying. I have heard from some people that this is their motivation.
If this is you: Please just sign up for cryonics. It’s a much better immortality gambit than rushing for ASI.
This seems not to be true assuming a P(doom) of 25% and a purely selfish perspective, or even a moderately altruistic perspective which places most of its weight on, say, the person’s immediate family and friends.
Of course any cryonics-free strategy is probably dominated by that same strategy plus cryonics for a personal bet at immortality, but when it comes to friends and family it’s not easy to convince people to sign up for cryonics! But immortality-maxxing for one’s friends and family almost definitely entails accelerating AI even at pretty high P(doom)
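To make that concrete, here is a minimal sketch of the kind of selfish comparison I have in mind; every number below is a hypothetical placeholder, not anyone’s actual estimate:

```python
# Hypothetical placeholder numbers; only the structure of the comparison matters.
# "Fast" = race to ASI soon; "slow" = a more careful, delayed path.

p_die_before_fast_asi = 0.05   # chance an older relative dies of ordinary causes before a rushed ASI
p_die_before_slow_asi = 0.30   # same chance if ASI arrives a decade or two later
p_aligned_fast = 0.50          # chance the rushed ASI is aligned and delivers radical life extension
p_aligned_slow = 0.65          # chance the slower, safer path does (higher, but arrives later)

# For someone who will not sign up for cryonics (most friends and family),
# their only shot at "immortality" is surviving until an aligned ASI exists.
p_immortal_fast = (1 - p_die_before_fast_asi) * p_aligned_fast   # = 0.475
p_immortal_slow = (1 - p_die_before_slow_asi) * p_aligned_slow   # = 0.455

print(p_immortal_fast, p_immortal_slow)
# With these placeholders the fast path comes out ahead for them, even though it
# carries a much larger chance of killing everyone -- which is exactly the tension
# between this comment and the original post.
```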
(And that’s without saying that this is very likely to not be the true reason for these people’s actions. It’s far more likely to be local-perceived-status-gradient-climbing followed by a post-hoc rationalization (which can also be understood as a form of local-perceived-status-gradient-climbing) and signing up for cryonics doesn’t really get you any status outside of the deepest depths of the rat-sphere, which people like this are obviously not in since they’re gaining status from accelerating AI)
The more sacrifices someone has made, the easier it is to believe that they mean what they say. Kokotajlo gave up millions to say what he wants, so I trust he is earnest. People who have gotten arrested at Stop AI have spent time in jail for their beliefs, so I trust they are earnest. It doesn’t mean these people are most useful for AI safety but on the subject of trust I know no better measurement than sacrifice.
Note that any competent capital holder has a significant conflict of interest with AI: AI is already a significant fraction of the stock market, and a pause would bring down most capital, not just private lab equity.
Your comment about 1e-6 p-doom is not right because we face many other X-risks that developing AGI would reduce.
Otherwise yeah I’m on board with mood of your post.
Personally I really like doing math/philosophy and I have convinced myself that it is necessary to avert doom. At least I’m not accelerating progress much!
Your comment about 1e-6 p-doom is not right because we face many other X-risks that developing AGI would reduce.
Ah you’re right, I wasn’t thinking about that. (Well I don’t think it’s obvious that an aligned AGI would reduce other x-risks, but my guess is it probably would.)
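To spell out the expected-value logic we’re both gesturing at, here’s a minimal back-of-the-envelope sketch; all the numbers are hypothetical placeholders rather than anyone’s actual estimates:

```python
# Hypothetical placeholder numbers, in units of "value of one present generation".

value_future_generations = 1e6   # how much future people are worth if they aren't discounted away
benefit_to_present = 0.1         # benefit to the present generation of getting AGI sooner
p_doom_from_agi = 1e-6           # extra extinction risk from building AGI sooner

# Other existential risks that an aligned AGI might help reduce during a delay:
background_xrisk_avoided = 1e-4  # chance of extinction from other causes during the delay

# Ignoring background risk (my original claim): even a one-in-a-million doom risk
# swamps the benefit, once future generations count for anything.
ev_naive = benefit_to_present - p_doom_from_agi * (1 + value_future_generations)

# Including background risk (the correction above): the avoided background risk is
# also multiplied by the value of the future, and can flip the sign.
ev_corrected = ev_naive + background_xrisk_avoided * (1 + value_future_generations)

print(ev_naive)      # ~ -0.9  (negative: acceleration looks unjustified)
print(ev_corrected)  # ~ +99.1 (positive: can be justified if AGI reduces other x-risks)
```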
I still think it’s weird that many AI safety advocates will criticize labs for putting humanity at risk while simultaneously being paid users of their products and writing reviews of their capabilities. Like, I get it, we think AI is great as long as it’s safe, we’re not anti-tech, etc.… but is “don’t give money to the company that’s doing horrible things” such a bad principle?
“I find Lockheed Martin’s continued production of cluster munitions to be absolutely abhorrent. Anyway, I just unboxed their latest M270 rocket system and I have to say I’m quite impressed...”
The argument people make is that LLMs improve the productivity of people’s safety research so it’s worth paying. That kinda makes sense. But I do think “don’t give money to the people doing bad things” is a strong heuristic.
I’m a pretty big believer in utilitarianism but I also think people should be more wary of consequentialist justifications for doing bad things. Eliezer talks about this in Ends Don’t Justify Means (Among Humans), he’s also written some (IMO stronger) arguments elsewhere but I don’t recall where.
Basically, if I had a nickel for every time someone made a consequentialist argument for why doing a bad thing was net positive, and then it turned out to be net negative, I’d be rich enough to diversify EA funding away from Good Ventures.
I have previously paid for LLM subscriptions (I don’t have any currently) but I think I was not giving enough consideration to the “ends don’t justify means among humans” principle, so I will not buy any subscriptions in the future.
I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”.
I take your point, and it’s an important one, but I find your claim to not know what’s going on in these people’s heads to be too strong. I feel excited about some kinds of new evidence about “the potentially imminent demise of humanity”, like the time horizon graph you mention, because I had already priced in the risks this evidence points to, and the evidence just makes the risks way more legible and much easier to communicate (and getting the broader public and governments to understand this kind of thing seems paramount for safety).
This is especially true for researchers getting excited about publishing their own work: they’ve usually known their own results for months before publishing, so publishing just makes the results more legible while the updates are already completely priced in.
I think there’s also a tendency I have in myself to feel much too happy when new evidence makes things I was worried about legible for the same reason I enjoy saying I-told-you-so when my friends make mistakes I warned them about even though I care about my friends and I would have preferred they didn’t make these mistakes. This is definitely a silly quirk of my brain but I don’t think it’s a big problem; it definitely doesn’t push me to cause the things I’m predicting to come to fruition in cases where that would be bad.
This is a good post, but it applies unrealistic standards and therefore draws overly strong conclusions.
> And at least OpenAI and Anthropic have been caught lying about their motivations:
Just face it: it is very normal for big companies to lie. That does make many of their press and public-facing statements untrustworthy, but it is not predictive of their general value system and therefore their actions. Plus Anthropic, unlike most labs, did in fact support a version of SB 1047. That has to count for something.
> There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”.
In a similar vein, humans do not act or feel rationally in light of their beliefs, and changing your behavior completely in response to a years-off event is just not in the cards for the vast majority of folks. Therefore, do not be surprised that there is a missing mood, just as it is not surprising that people who genuinely believe in the end of humanity due to climate change do not adjust their behavior accordingly. Having said that, I did sense a general increase in anxiety when o3 was announced; perhaps that was the point where it started to feel real for many folks. Either way, I really want to stress that concluding much about people’s beliefs from these reactions is very tenuous, just like concluding that a researcher must not really care about AI safety because, instead of working a bit more, they watch some TV in the evening.
At a moderate P(doom), say under 25%, from a selfish perspective it makes sense to accelerate AI if it increases the chance that you get to live forever, even if it increases your risk of dying.
If you’re not elderly or otherwise at risk of irreversible harms in the near future, then pausing for a decade (say) to reduce the chance of AI ruin by even just a few percentage points still seems good. So the crux is still “can we do better by pausing.” (This assumes pauses on the order of 2-20 years; the argument changes for longer pauses.)
Maybe people think the background level of x-risk is higher than it used to be over the last decades because the world situation seems to be deteriorating. But IMO this also increases the selfishness aspect of pushing AI forward, because if you’re that desperate for a deus ex machina, surely you also have to think that there’s a good chance things will get worse when you push technology forward.
(Lastly, I also want to note that for people who care less about living forever and care more about near-term achievable goals like “enjoy life with loved ones,” the selfish thing would be to delay AI indefinitely, because rolling the dice for a longer future is then less obviously worth it.)
the level of selfishness required to seek immortality at the cost of risking all of humanity
If only you got immortality (or even you and a small handful of your loved ones), okay, yeah, that would be selfish. But if the expectation is that it soon becomes cheap and widely accessible, that’s just straight-up heroic.
I would not describe it as heroic. I think it’s approximately morally equivalent to choosing an 80% chance of making all Americans immortal (but not non-Americans) and a 20% chance of killing everyone in the world.
This is not a perfect analogy because the philosophical arguments for discounting future generations are stronger than the arguments for discounting non-Americans.
(Also my P(doom) is higher than 20%, that’s just an example)
An important difference between the analogy you gave and our real situation is that non-Americans actually exist right now, whereas future human generations do not yet exist and they may never actually come into existence—they are merely potential. Their existence depends on the choices we make today. A closer analogy would be choosing an 80% chance of making all humans immortal and a 20% chance of eliminating the possibility of future space colonization. Framed this way, I don’t think the choice to take such a gamble should be considered selfish or even short-sighted, though I understand that many people would still not want to take that gamble.
Cryonics is expensive, unpopular, and unavailable in most countries of the world. This is also a situation where young and rich people in first-world countries buy themselves a reduced probability of death, at the expense of poor and old people, who are effectively guaranteed to be deprived of that chance at life.
I agree with the top part. I think it’s naive to believe that AI is helping anyone, but what I want to talk about is why this problem might be unsolvable (except by avoiding it entirely).
If you hate something and attempt to combat it, you will get closer to it rather than further away, in the manner which people refer to when they say “You actually love what you say you hate”. When I say “don’t think about pink elephants”, the more you try, the more you will fail, and this is because the brain doesn’t have subtraction and division, but only addition and multiplication.
You cannot learn about how to defend yourself against a problem without learning how to also cause the problem. When you learn self-defense you will also learn attacks. You cannot learn how to argue effectively with people who hold stupid worldviews without first understanding them and thus creating a model of the worldview within yourself as well.
Due to mechanics like these, it may be impossible to research “AI safety” in isolation. It’s probably better to use a neutral term like “AI capabilities”, which includes both the capacity for harm and the capacity for defense against harm, so that we don’t mislead ourselves with words. Misleading ourselves here can cause untold damage, much like viewing “good and evil” as opposites, rather than two sides of the same thing, has.
I also want to warn everyone that there seems to be an asymmetry in warfare which makes it so that attacking is strictly easier than defending. This ratio seems to increase as technology improves.
I find it hard to trust that AI safety people really care about AI safety.
DeepMind, OpenAI, Anthropic, and SSI were all founded in the name of safety. Instead they have greatly increased danger. And at least OpenAI and Anthropic have been caught lying about their motivations:
OpenAI: claiming concern about hardware overhang and then trying to massively scale up hardware; promising compute to superalignment team and then not giving it; telling board that model passed safety testing when it hadn’t; too many more to list.
Anthropic: promising (in a mealy-mouthed technically-not-lying sort of way) not to push the frontier, and then pushing the frontier; trying (and succeeding) to weaken SB-1047; lying about their connection to EA (that’s not related to x-risk but it’s related to trustworthiness).
For whatever reason, I had the general impression that Epoch is about reducing x-risk (and I was not the only one with that impression) but:
Epoch is not about reducing x-risk, and they were explicit about this but I didn’t learn it until this week
its FrontierMath benchmark was funded by OpenAI and OpenAI allegedly has access to the benchmark (see comment on why this is bad)
some of their researchers left to start another build-AGI startup (I’m not sure how badly this reflects on Epoch as an org but at minimum it means donors were funding people who would go on to work on capabilities)
Director Jaime Sevilla believes “violent AI takeover” is not a serious concern, and “I also selfishly care about AI development happening fast enough that my parents, friends and myself could benefit from it, and I am willing to accept a certain but not unbounded amount of risk from speeding up development”, and “on net I support faster development of AI, so we can benefit earlier from it” which is a very hard position to justify (unjustified even on P(doom) = 1e-6, unless you assign ~zero value to people who are not yet born)
I feel bad picking on Epoch/Jaime because they are being unusually forthcoming about their motivations, in a way that exposes them to criticism. This is noble of them; I expect most orgs not to be this noble.
When some other org does something that looks an awful lot like it’s accelerating capabilities, and they make some argument about how it’s good for safety, I can’t help but wonder if they secretly believe the same things as Epoch and are not being forthright about their motivations
My rough guess is for every transparent org like Epoch, there are 3+ orgs that are pretending to care about x-risk but actually don’t
Whenever some new report comes out about AI capabilities, like the METR task duration projection, people talk about how “exciting” it is[1]. There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”. But whatever mental process results in this choice of words, I don’t trust that it will also result in them taking actions that reduce x-risk.
Many AI safety people currently or formerly worked at AI companies. They stand to make money from accelerating AI capabilities. The same is true of grantmakers
I briefly looked thru some grantmakers and I see financial COIs at Open Philanthropy, Survival and Flourishing Fund, and Manifund; but none at Long-Term Future Fund
A different sort of conflict of interest: many AI safety researchers have an ML background and enjoy doing ML. Unsurprisingly, they often arrive at the belief that doing ML research is the best way to make AI safe. This ML research often involves making AIs better at stuff. Pausing AI development (or imposing significant restrictions) would mean they don’t get to do ML research anymore. If they oppose a pause/slowdown, is that for ethical reasons, or is it because it would interfere with their careers?
At a moderate P(doom), say under 25%, from a selfish perspective it makes sense to accelerate AI if it increases the chance that you get to live forever, even if it increases your risk of dying. I have heard from some people that this is their motivation. I am appalled at the level of selfishness required to seek immortality at the cost of risking all of humanity. And I’m sure most people who hold this position know it’s appalling, so they keep it secret and publicly give rationalizations for why accelerating AI is actually the right thing to do. In a way, I admire people who are open about their selfishness, and I expect they are the minority.
But you should know that you might not be able to trust me either.
I have some moral uncertainty but my best guess is that future people are just as valuable as present-day people. You might think this leads me to put too much priority on reducing x-risk relative to helping currently-alive people. You might think it’s unethical that I’m willing to delay AGI (potentially hurting currently-alive people) to reduce x-risk.
I care a lot about non-human animals, and I believe it’s possible in principle to trade off human welfare against animal welfare. (Although if anything, I think that should make me care less about x-risk, not more.)
ETA: I am pretty pessimistic about AI companies’ plans for aligning ASI. My weakly held belief is that if companies follow their current plans, there’s a 2 in 3 chance of a catastrophic outcome. (My unconditional P(doom) is lower than that.) You might believe this makes me too pessimistic about certain kinds of strategies.
(edit: removed an inaccurate statement)
[1] ETA: I saw several examples of this on Twitter. Went back and looked and I couldn’t find the examples I recall seeing. IIRC they were mainly quote-tweets, not direct replies, and I don’t know how to find quote-tweets (the search function was unhelpful).
I think this is straightforwardly true and basically hard to dispute in any meaningful way. A lot of this is basically downstream of AI research being part of a massive market/profit generating endeavour (the broader tech industry), which straightforwardly optimises for more and more “capabilities” (of various kinds) in the name of revenue. Indeed, one could argue that long before the current wave of LLMs the tech industry was developing powerful agentic systems that actively worked to subvert human preferences in favour of disempowering them/manipulating them, all in the name of extracting revenue from intelligent work… we just called the AI system the Google/Facebook/Youtube/Twitter Algorithm.
The trend was always clear: an idealistic mission to make good use of global telecommunication/information networks finds initial success and is a good service. Eventually pressures to make profits cause the core service to be degraded in favour of revenue generation (usually ads). Eventually the company accrues enough shaping power to actively reshape the information network in its favour, and begins dragging everything down with it. In the face of this AI/LLMs are just another product to be used as a revenue engine in the digital economy.
AI safety, by its nature, resists the idea of creating powerful new information technologies to exploit mercilessly for revenue without care for downstream consequences. However, many actors in the AI safety movement are themselves tied to the digital economy, and depend on it for their power, status, and livelihoods. Thus, it is not that there are no genuine concerns being expressed, but that at every turn these concerns must be resolved in a way that keeps the massive tech machine going. Those who don’t agree with this approach are efficiently selected against. For example:
Race dynamics are bad?
Maybe we should slow down.We just need to join the race and be the more morally-minded actor. After all, there’s no stopping the race, we’re already locked in.Our competitors/other parties are doing dangerous things?
Maybe we could coordinate and share our concerns and research with them.We can’t fall behind, we’ve got to fly the AI safety flag at the conference of AI Superpowers. Let’s speed up too.New capabilities are unknown and jagged?
Let’s just leave well enough alone.Let’s invest more in R&D so we can safely understand and harness them.Here’s a new paradigm that might lead to a lot of risk and a lot of reward.
We should practice the virtue of silence and buy the world timeWe should make lots of noise so we can get funding. To study it. Not to use it, of course. Just to understand the safety implications.Maybe progress in AI is slower than we thought.
Hooray! Maybe we can chill for a bitThat’s time for us to exploit our superior AI knowledge and accelerate progress to our benefit.We’ve seen this before.
To be honest, though, I’m not sure what to do about this. So much has been invested by now that it truly feels like history is moving with a will of its own, rather than individuals steering the ship. Every time I look at what’s going on I feel the sense that maybe I’m just the idiot that hasn’t gotten the signal to hammer that “exploit” button. After all, it’s what everyone else is doing.
What probability do you put that, if Anthropic had really tried, they could have meaningfully coordinated with Openai and Google? Mine is pretty low
I think many of these are predicated on the belief that it would be plausible to get everyone to pause now. In my opinion this is extremely hard and pretty unlikely to happen. I think that, even in worlds where actors continue to race, there are actions we can take to lower the probability of x-risk, and it is a reasonable position to do so.
I separately think that many of the actions you describe historically were dumb/harmful, but are equally consistent with “25% of safety people act like this” and 100%
Not GP but I’d guess maybe 10%. Seems worth it to try. IMO what they should do is hire a team of top negotiators to work full-time on making deals with other AI companies to coordinate and slow down the race.
ETA: What I’m really trying to say is I’m concerned Anthropic (or some other company) would put in a half-assed effort to cooperate and then give up, when what they should do is Try Harder. “Hire a team to work on it full time” is one idea for what Trying Harder might look like.
Fair. My probability is more like 1-2%. I do think that having a team of professional negotiators seems a reasonable suggestion though. I predict the Anthropic position would be that this is really hard to achieve in general, but that if slowing down was ever achieved we would need much stronger evidence of safety issues. In addition to all the commercial pressure, slowing down now could be considered to violate antitrust law. And it seems way harder to get all the other actors like Meta or DeepSeek or xAI on board, meaning I don’t even know if I think it’s good for some of the leading actors to unilaterally slow things down now (I predict mildly net good, but with massive uncertainty and downsides)
I think it’s important to distinguish between factual disagreements and moral disagreements. My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely, what timelines are, etc. I’m much less sure the founders of Mechanize care.
And to whatever degree you trust my judgement/honesty, I work at DeepMind and reducing existential risk is a fairly large part of my motivation (though far from all of it), and I try to regularly think about how my team’s strategy can be better targeted towards this.
And I know a lot of safety people at deepmind and other AGI labs who I’m very confident also sincerely care about reducing existential risks. This is one of their primary motivations, they often got into the field due to being convinced by arguments about ai risk, they will often raise in conversation concerns that their current work or the team’s current strategy is not focused on it enough, some are extremely hard-working or admirably willing to forgo credits so long as they think that their work is actually mattering for X-Risk, some dedicate a bunch of time to forming detailed mental models of how AI leads to bad outcomes and how this could be prevented and how their work fit in, etc. If people just wanted to do fun ml work, there’s a lot of other places Obviously people are complex. People are largely not motivated by a single thing and the various conflicts of interest you note seem real. I expect some of the people I’m thinking of. I’ve misjudged or say they care about X-Risk but actually don’t. But I would just be completely shocked if say half of them were not highly motivated by reducing x-. It’s generally more reasonable to be skeptical of the motivations of senior leadership, who have much messier incentives and constraints on their communication.
Regarding the missing mood thing, I think there’s something to what you’re saying, but also that it’s really psychologically unhealthy to work in a field which is constantly advancing like AI and every time there is an advance feel the true emotional magnitudes of what it would mean if existential risk has now slightly increased. If anyone did, I think they would burn out pretty fast, so the people left in the field largely don’t. I also think you should reserve those emotions for times when a trend is deviated from rather than when a trend is continued. In my opinion, the reason people were excited about the metr work was that it was measuring a thing that was already happening much more precisely, it was a really important question and reducing our confusion about that is high value. It wasn’t really capabilities work in my opinion
Is this implicitly assuming total utilitarianism? I certainly care about the future of humanity, but I reject moral views that say it is overwhelmingly the thing that matters and present day concerns round down to zero. I think many people have intuitions aligning with this.
I don’t think this is true. My sense is he views his current work as largely being good on non x-risk grounds, and thinks that even if it might slightly increase x-risk, he wouldn’t think it would be worth it for him to stop working on it, since he thinks it’s unfair to force the current generation to accept a slightly higher risk of not achieving longevity escape velocity and more material wealth in exchange for a small increase in existential risk.
He says it so plainly that it seems as straightforwardly of a rejection of AI x-risk concerns that I’ve heard:
It seems very clear that Jaime thinks that AI x-risk is unimportant relative to almost any other issue, given his non-interest in trading off x-risk against those other issues.
It is true that Jaime might think that AI x-risk could hypothetically be motivating to him, but at least on my best interpretation of what is going on, he de facto does not consider it an important input into his current strategic choices, or the choices of Epoch.
I think you’re strawmanning him somewhat
Does not seem a fair description of
People are allowed to have multiple values! If someone would trade a small amount of value A for a large amount of value B, this is entirely consistent with them thinking both are important.
Like, if you offer people the option to commit suicide in exchange for reducing x-risk by x%, what value of x do you think they would require? And would you say they are not x risk motivated if they eg aren’t willing to do it at 1e-6?
In practice this doesn’t really come up, so it’s not that relevant. Similarly for Jaime’s position: how much he believes himself to be in situations where he’s trading off meaningful existential risk against meaningful harm to the present generation seems very important.
I did a bit of digging, because these quotes seemed narrow to me. Here’s the original tweet of that tweet thread.
Then right after:
All said, this specific chain doesn’t give us a huge amount of information. It totals something like 10-20 sentences.
> He says it so plainly that it seems as straightforward a rejection of AI x-risk concerns as I’ve heard:
This seems like a major oversimplification to me. He says “I am concerned about concentration of power and gradual disempowerment. I put the probability that ai ends up being net bad for humans at 15%.” There is a cluster in the rationalist/EA community that believes that “gradual disempowerment” is an x-risk. Perhaps you wouldn’t define “concentration of power and gradual disempowerment” as technically an x-risk, but if so, that seems a bit like a technicality to me. It can clearly be a very major deal.
It sounds a lot to me like Jaime is very concerned about some aspects of AI risk but not others.
In the quote you reference, he clearly says, “Not that it should be my place to unilaterally make such a decision anyway.” I hear him saying, “I disagree with the x-risk community about the issue of slowing down AI, specifically. However, I don’t think this disagreement is a big concern, given that I also feel like it’s not right for me to personally push for AI to be sped up, and thus I won’t do it.”
I am not saying Jaime in-principle could not be motivated by existential risk from AI, but I do think the evidence suggests to me strongly that concerns about existential risk from AI are not among the primary motivations for his work on Epoch (which is what I understood Neel to be saying).
Maybe it is because he sees the risk as irreducible, maybe it is because the only ways of improving things would cause collateral damage for other things he cares about. I also think it should be our dominant prior that someone is not motivated by reducing x-risk unless they directly claim they do.
My sense is that Jaime’s view (and Epoch’s view more generally) is more like: “making people better informed about AI in a way that is useful to them seems heuristically good (given that AI is a big deal); it doesn’t seem that useful or important to have a very specific theory of change beyond this”. From this perspective, saying “concerns about existential risk from AI are not among the primary motivations” is somewhat confused, as the heuristic isn’t necessarily backchained from any more specific justification. Like, there is no specific terminal motivation.
Like, consider someone who donates to GiveDirectly because “idk, seems heuristically good to empower the worst-off people” and someone who funds global health and wellbeing generally because they specifically care about ongoing human welfare (putting aside AI for now). The first person’s heuristic is partially motivated via flow-through from caring about something like welfare, even though that doesn’t directly show up. These people seem like natural allies to me except in surprising circumstances (e.g., it turns out the worst-off people use marginal money/power in a way that is net negative for human welfare).
I agree that there is some ontological mismatch here, but I think your position is still in pretty clear conflict with what Neel said, which is what I was objecting to:
“Not 100% motivated by it” IMO sounds like an implication that “being motivated by reducing x-risk would make up something like 30%-70% of the motivation”. I don’t think that’s true, and I think various things that Jaime has said make that relatively clear.
I think you’re conflating “does not think that slowing down AI obviously reduces x-risk” with “reducing x-risk is not a meaningful motivation for his work”. Jaime has clearly said that he believes x-risk is real and >=15% likely (though via different mechanisms than loss of control). I think that the public being well informed about AI generally reduces risk, that Epoch is doing good work on this front, and that increasing the probability that AI goes well is part of why Jaime works on this. I think it’s much less clear whether FrontierMath was good, but Jaime wasn’t very involved anyway, so it doesn’t seem super relevant.
I basically think the only thing he’s said that you could consider objectionable is that he’s reluctant to push for a substantial pause on AI, since x-risk is not the only thing he cares about. But he also (sincerely, imo) expresses uncertainty about whether such a pause WOULD be good for x-risk.
There are a few questions here.
1. Do Jaime’s writings say that he cares about x-risk or not?
→ I think he fairly clearly states that he cares.
2. Does all the evidence, when put together, imply that actually, Jaime doesn’t care about x-risk?
→ This is a much more speculative question. We have to assess how honest he is in his writing. I’d bet money that Jaime at least believes that he cares and is taking corresponding actions. This of course doesn’t absolve him of full responsibility—there are many people who believe they do things for good reasons, but causally actually do things for selfish reasons. But now we’re getting to a particularly speculative area.
“I also think it should be our dominant prior that someone is not motivated by reducing x-risk unless they directly claim they do.” → Again, to me, I regard him as basically claiming that he does care. I’d bet money that if we ask him to clarify, he’d claim that he cares. (Happy to bet on this, if that would help)
At the same time, I doubt that this is your actual crux. I’d expect that even if he claimed (more precisely) to care, you’d still be skeptical of some aspect of this.
---
Personally, I have both positive and skeptical feelings about Epoch, as I do other evals orgs. I think they’re doing some good work, but I really wish they’d lean a lot more on [clearly useful for x-risk] work. If I had a lot of money to donate, I could picture donating some to Epoch, but only if I could get a lot of assurances on which projects it would go to.
But while I have reservations about the org, I think some of the specific attacks against them (and defenses of them) are not accurate.
People’s “deep down motivations” and “endorsed upon reflection values,” etc, are not the only determiners of what they end up doing in practice re influencing x-risk.
I agree with that. I was responding specifically to this:
In that case I think your response is a non sequitur, since clearly “really care” in this context means “determiners of what they end up doing in practice re influencing x-risk”.
I personally define “really care” as “the thing they actually care about and meaningfully drives their actions (potentially among other things) is X”. If you want to define it as eg “the actions they take, in practice, effectively select for X, even if that’s not their intent” then I agree my post does not refute the point, and we have more of a semantic disagreement over what the phrase means.
I interpret the post as saying “there are several examples of people in the AI safety community taking actions that made things worse. THEREFORE these people are actively malicious or otherwise insincere about their claims to care about safety and it’s largely an afterthought put to the side as other considerations dominate”. I personally agree with some examples, disagree with others, but think this is explained by a mix of strategic disagreements about how to optimise for safety, and SOME fraction of the alleged community really not caring about safety
People are often incompetent at achieving their intended outcome, so pointing towards failure to achieve an outcome does not mean this was what they intended. ESPECIALLY if there’s no ground truth and you have strategic disagreements with those people, so you think they failed and they think they succeeded
I don’t think “not really caring” necessarily means someone is being deceptive. I hadn’t really thought through the terminology before I wrote my original post, but I would maybe define 3 categories:
1. claims to care about x-risk, but is being insincere
2. genuinely cares about x-risk, but also cares about other things (making money etc.), so they take actions that fit their non-x-risk motivations and then come up with rationalizations for why those actions are good for x-risk
3. genuinely cares about x-risk, and has pure motivations, but sometimes makes mistakes and ends up increasing x-risk
I would consider #1 and #2 to be “not really caring”. #3 really cares. But from the outside it can be hard to tell the difference between the three. (And in fact, from the inside, it’s hard to tell whether you’re a #2 or a #3.)
On a more personal note, I think in the past I was too credulous about ascribing pure motivations to people when I had disagreements with them, when in fact the reason for the disagreement was that I care about x-risk and they’re either insincere or rationalizing. My original post is something I think Michael!2018 would benefit from reading.
Does 3 include “cares about x-risk and other things, does a good job of evaluating the trade-off of each action according to their values, but is sometimes willing to do things that are great according to their other values but slightly negative for x-risk”?
This looks closer to 2 to me?
Also, from the outside, can you describe how an observer would distinguish between [any of the items on the list] and the situation you lay out in your comment / what the downsides are to treating them similarly? I think Michael’s point is that it’s not useful/worth it to distinguish.
Whether someone is dishonest, incompetent, or underweighting x-risk (by my lights) mostly doesn’t matter for how I interface with them, or how I think the field ought to regard them, since I don’t think we should browbeat people or treat them punitively. Bottom line is I’ll rely (as an unvalenced substitute for ‘trust’) on them a little less.
I think you’re right to point out the valence of the initial wording, fwiw. I just think taxonomizing apparent defection isn’t necessary if we take as a given that we ought to treat people well and avoid claiming special knowledge of their internals, while maintaining the integrity of our personal and professional circles of trust.
If we take this as a given, I’m happy for people to categorise others however they’d like! I haven’t noticed people other than you taking that perspective in this thread
Oh man — I sure hope making ‘defectors’ and lab safety staff walk the metaphorical plank isn’t on the table. Then we’re really in trouble.
My read is that in practice many people in the online LW community are fairly hostile, and many people in the labs think the community doesn’t know what they’re talking about and totally ignores them/doesn’t really care if they’re made to walk the metaphorical plank.
At the risk of seeming quite combative, when you say
That’s basically what I mean when I said in my comment
And, after thinking about it, I don’t see your statement conflicting with mine.
If this is you: Please just sign up for cryonics. It’s a much better immortality gambit than rushing for ASI.
This seems not to be true assuming a P(doom) of 25% and a purely selfish perspective, or even a moderately altruistic perspective which places most of its weight on, say, the person’s immediate family and friends.
Of course any cryonics-free strategy is probably dominated by that same strategy plus cryonics for a personal bet at immortality, but when it comes to friends and family it’s not easy to convince people to sign up for cryonics! But immortality-maxxing for one’s friends and family almost definitely entails accelerating AI even at pretty high P(doom)
(And that’s without saying that this is very likely to not be the true reason for these people’s actions. It’s far more likely to be local-perceived-status-gradient-climbing followed by a post-hoc rationalization (which can also be understood as a form of local-perceived-status-gradient-climbing) and signing up for cryonics doesn’t really get you any status outside of the deepest depths of the rat-sphere, which people like this are obviously not in since they’re gaining status from accelerating AI)
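To make the expected-value comparison above concrete, here is a minimal sketch. Every number is a made-up placeholder for illustration, not an estimate from anyone in this thread; the point is only the shape of the calculation.

```python
# Toy sketch of the "accelerate AI vs. cryonics" immortality gambit, purely selfish view.
# All probabilities below are illustrative placeholders, not real estimates.

p_doom = 0.25                 # assumed probability that AI kills everyone
p_agi_in_lifetime = 0.8       # assumed chance transformative AI arrives while you're alive
p_agi_gives_longevity = 0.5   # assumed chance surviving AGI actually delivers radical life extension
p_cryonics_works = 0.05       # assumed chance preservation + eventual revival works out

p_immortal_via_acceleration = p_agi_in_lifetime * (1 - p_doom) * p_agi_gives_longevity
p_immortal_via_cryonics = p_cryonics_works

print(f"accelerate: {p_immortal_via_acceleration:.2f}")  # 0.30 under these placeholders
print(f"cryonics:   {p_immortal_via_cryonics:.2f}")      # 0.05 under these placeholders
```

Under placeholders like these the selfish case for acceleration beats cryonics, which is the comparison being made above; the conclusion obviously swings entirely on numbers nobody actually knows.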
The more sacrifices someone has made, the easier it is to believe that they mean what they say.
Kokotajlo gave up millions to say what he wants, so I trust he is earnest. People who have gotten arrested at Stop AI have spent time in jail for their beliefs, so I trust they are earnest.
It doesn’t mean these people are most useful for AI safety but on the subject of trust I know no better measurement than sacrifice.
Note that any competent capital holder has a significant conflict of interest with AI: AI is already a significant fraction of the stock market, and a pause would bring down most capital, not just private lab equity.
Your comment about 1e-6 P(doom) is not right, because we face many other x-risks that developing AGI would reduce.
Otherwise yeah, I’m on board with the mood of your post.
Personally I really like doing math/philosophy and I have convinced myself that it is necessary to avert doom. At least I’m not accelerating progress much!
Ah you’re right, I wasn’t thinking about that. (Well I don’t think it’s obvious that an aligned AGI would reduce other x-risks, but my guess is it probably would.)
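For concreteness, here is a toy version of that argument; every number is a made-up placeholder rather than anyone’s actual estimate.

```python
# Toy comparison behind "1e-6 P(doom) doesn't settle it": delaying AGI also means
# more years of exposure to background existential risks that AGI might reduce.
# All numbers are illustrative placeholders.

p_doom_from_agi = 1e-6            # assumed x-risk added by building AGI sooner
background_xrisk_per_year = 1e-4  # assumed yearly x-risk from nukes, pandemics, etc.
delay_years = 10                  # hypothetical length of a pause

risk_if_we_build = p_doom_from_agi
risk_if_we_wait = 1 - (1 - background_xrisk_per_year) ** delay_years  # roughly 1e-3

print(risk_if_we_build, risk_if_we_wait)
# Under these placeholders the extra decade of background risk dwarfs the 1e-6,
# which is the shape of the correction being conceded here.
```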
I still think it’s weird that many AI safety advocates will criticize labs for putting humanity at risk while simultaneously being paid users of their products and writing reviews of their capabilities. Like, I get it, we think AI is great as long as it’s safe, we’re not anti-tech, etc.… but is “don’t give money to the company that’s doing horrible things” such a bad principle?
“I find Lockheed Martin’s continued production of cluster munitions to be absolutely abhorrent. Anyway, I just unboxed their latest M270 rocket system and I have to say I’m quite impressed...”
The argument people make is that LLMs improve the productivity of people’s safety research so it’s worth paying. That kinda makes sense. But I do think “don’t give money to the people doing bad things” is a strong heuristic.
I’m a pretty big believer in utilitarianism but I also think people should be more wary of consequentialist justifications for doing bad things. Eliezer talks about this in Ends Don’t Justify Means (Among Humans), he’s also written some (IMO stronger) arguments elsewhere but I don’t recall where.
Basically, if I had a nickel for every time someone made a consequentialist argument for why doing a bad thing was net positive, and then it turned out to be net negative, I’d be rich enough to diversify EA funding away from Good Ventures.
I have previously paid for LLM subscriptions (I don’t have any currently) but I think I was not giving enough consideration to the “ends don’t justify means among humans” principle, so I will not buy any subscriptions in the future.
I take your point, and it’s an important one, but I find your claim to not know what’s going on in these people’s heads too strong. I feel excited about some kinds of new evidence about “the potentially imminent demise of humanity”, like the time horizon graph you mention, because I had already priced in the risks this evidence points to; the evidence just made them far more legible, which makes it much easier to communicate my concerns (and getting the broader public and governments to understand this kind of thing seems paramount for safety).
This is especially true for researchers getting excited about publishing their own work: they’ve usually known their results for months before publication, so publishing just makes those results more legible while the updates are already completely priced in.
I think there’s also a tendency I have in myself to feel much too happy when new evidence makes things I was worried about legible for the same reason I enjoy saying I-told-you-so when my friends make mistakes I warned them about even though I care about my friends and I would have preferred they didn’t make these mistakes. This is definitely a silly quirk of my brain but I don’t think it’s a big problem; it definitely doesn’t push me to cause the things I’m predicting to come to fruition in cases where that would be bad.
This is a good post, but it applies unrealistic standards and therefore draws conclusions that are too strong.
>And at least OpenAI and Anthropic have been caught lying about their motivations:
Just face it: it is very normal for big companies to lie. That makes many of their press and public-facing statements untrustworthy, but it is not predictive of their general value system, and therefore of their actions. Plus Anthropic, unlike most labs, did at least support a version of SB 1047. That has to count for something.
>There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”.
In a similar vein, humans do not act or feel rationally in light of their beliefs, and changing your behavior completely in response to a years off event is just not in the cards for the vast majority of folks. Therefore do not be surprised that there is a missing mood, just like it is not surprising that people who genuinely believe in the end of humanity due to climate change do not adjust their behavior accordingly. Having said that, I did sense a general increase and preponderance of anxiety when o3 was announced, perhaps that was a point where it started to feel real for many folks.
Either way, I really want to stress that concluding much about the beliefs of folks based on these reactions is very tenuous, just like concluding that a researcher must not really care about AI safety because instead of working a bit more they watch some TV in the evening.
If you’re not elderly or otherwise at risk of irreversible harms in the near future, then pausing for a decade (say) to reduce the chance of AI ruin by even just a few percentage points still seems good. So the crux is still “can we do better by pausing.” (This assumes pauses on the order of 2-20years; the argument changes for longer pauses.)
Maybe people think the background level of x-risk is higher than it used to be over the last decades because the world situation seems to be deteriorating. But IMO this also increases the selfishness aspect of pushing AI forward, because if you’re that desperate for a deus ex machina, surely you also have to think that there’s a good chance things will get worse when you push technology forward.
(Lastly, I also want to note that for people who care less about living forever and care more about near-term achievable goals like “enjoy life with loved ones,” the selfish thing would be to delay AI indefinitely, because rolling the dice for a longer future is then less obviously worth it.)
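To gesture at the arithmetic behind “pausing for a decade still seems good if you’re not elderly”: a rough sketch with placeholder numbers that are not estimates from the comment above.

```python
# Rough selfish comparison for a young, healthy person facing a 10-year pause.
# Both numbers are illustrative placeholders.

pause_years = 10
p_die_naturally_during_pause = 0.02   # assumed baseline mortality for a young adult over a decade
xrisk_reduction_from_pause = 0.03     # assumed "few percentage points" of AI ruin avoided

# If surviving AI roughly determines whether you get the long future at all,
# then even selfishly the pause looks worthwhile whenever the ruin avoided
# exceeds the ordinary mortality you risk by waiting out the pause.
print(xrisk_reduction_from_pause > p_die_naturally_during_pause)  # True under these placeholders
```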
If only you got immortality (or even you and a small handful of your loved ones), okay, yeah, that would be selfish. But if the expectation is that it soon becomes cheap and widely accessible, that’s just straight-up heroic.
I would not describe it as heroic. I think it’s approximately morally equivalent to choosing an 80% chance of making all Americans immortal (but not non-Americans) and a 20% chance of killing everyone in the world.
This is not a perfect analogy because the philosophical arguments for discounting future generations are stronger than the arguments for discounting non-Americans.
(Also my P(doom) is higher than 20%, that’s just an example)
An important difference between the analogy you gave and our real situation is that non-Americans actually exist right now, whereas future human generations do not yet exist and they may never actually come into existence—they are merely potential. Their existence depends on the choices we make today. A closer analogy would be choosing an 80% chance of making all humans immortal and a 20% chance of eliminating the possibility of future space colonization. Framed this way, I don’t think the choice to take such a gamble should be considered selfish or even short-sighted, though I understand that many people would still not want to take that gamble.
Cryonics is expensive, unpopular, and unavailable in most countries of the world. This is also a situation where young and rich people in first-world countries buy themselves a reduced probability of their own death, at the expense of a guaranteed deprivation of the life chances of poor and old people.
I agree with the top part. I think it’s naive to believe that AI is helping anyone, but what I want to talk about is why this problem might be unsolvable (except by avoiding it entirely).
If you hate something and attempt to combat it, you will get closer to it rather than further away, in the manner which people refer to when they say “You actually love what you say you hate”. When I say “don’t think about pink elephants”, the more you try, the more you will fail, and this is because the brain doesn’t have subtraction and division, but only addition and multiplication.
You cannot learn about how to defend yourself against a problem without learning how to also cause the problem. When you learn self-defense you will also learn attacks. You cannot learn how to argue effectively with people who hold stupid worldviews without first understanding them and thus creating a model of the worldview within yourself as well.
Due to mechanics like these, it may be impossible to research “AI safety” in isolation. It’s probably better to use a neutral word like “AI capabilities” which include both the capacity for harm and defense against harm so that we don’t mislead ourselves with words. It can cause untold damage, much like viewing “good and evil” as opposites, rather than two sides of the same thing, has.
I also want to warn everyone that there seems to be an asymmetry in warfare which makes it so that attacking is strictly easier than defending. This ratio seems to increase as technology improves.
When you say ~zero value, do you mean hyperbolically discounted or something more extreme?