MichaelDickens’s Shortform

MichaelDickens 18 Oct 2021 18:26 UTC

2 points

141 comments, 1 min read

MichaelDickens 2 Oct 2024 4:11 UTC
158 points
68
I get the sense that we can’t trust Open Philanthropy to do a good job on AI safety, and this is a big problem. Many people would have more useful things to say about this than I do, but I still feel that I should say something.

My sense comes from:
- Open Phil is reluctant to do anything to stop the companies that are doing very bad things to accelerate the likely extinction of humanity, and is reluctant to fund anyone who’s trying to do anything about it.
- People at Open Phil have connections with people at Anthropic, a company that’s accelerating AGI and has a track record of (plausibly-deniable) dishonesty. Dustin Moskovitz has money invested in Anthropic, and Open Phil employees might also stand to make money from accelerating AGI. And I agree with Bryan Caplan’s recent take that friendships are often a bigger conflict of interest than money, so Open Phil higher-ups being friends with Anthropic higher-ups is troubling.
A lot of people (including me as of ~one year ago) consider Open Phil the gold standard for EA-style analysis. I think Open Phil is actually quite untrustworthy on AI safety (but probably still good on other causes).

I don’t know what to do with this information.
What links here?
- JWS 🔸's comment on Upcoming changes to Open Philanthropy’s university group funding by Eli Rose🔸 (EA Forum; 13 Dec 2024 12:12 UTC; 40 points)
- Ebenezer Dukakis's comment on Aaron Bergman’s Quick takes by Aaron Bergman (EA Forum; 5 Oct 2024 8:53 UTC; 14 points)
- habryka 2 Oct 2024 5:06 UTC
  228 points
  77
  Parent
  Epistemic status: Speculating about adversarial and somewhat deceptive PR optimization, which is inherently very hard and somewhat paranoia-inducing. I am quite confident of the broad trends here, but it’s definitely more likely that I am getting things wrong here than in other domains where evidence is more straightforward to interpret, and people are less likely to shape their behavior in ways that include plausible deniability and defensibility.
  I agree with this, but I actually think the issues with Open Phil are substantially broader. As a concrete example, as far as I can piece together from various things I have heard, Open Phil does not want to fund anything that is even slightly right of center in any policy work. I don’t think this is because of any COIs, it’s because Dustin is very active in the Democratic Party and doesn’t want to be affiliated with anything that is right-coded. Of course, this has huge effects by incentivizing polarization of AI policy work with billions of dollars, since any Open Phil-funded AI policy organization that wants to engage with people on the right might just lose all of their funding because of that, and so you can be confident they will steer away from that.
  Open Phil is also very limited in what they can say about what they can or cannot fund, because that itself is something that they are worried will make people annoyed with Dustin, which creates a terrible fog around how OP is thinking about stuff.^[1]
  Honestly, I think there might no longer be a single organization that I have historically been excited about that OpenPhil wants to fund. MIRI could not get OP funding, FHI could not get OP funding, Lightcone cannot get OP funding, my best guess is Redwood could not get OP funding if they tried today (though I am quite uncertain of this), most policy work I am excited about cannot get OP funding, the LTFF cannot get OP funding, any kind of intelligence enhancement work cannot get OP funding, CFAR cannot get OP funding, SPARC cannot get OP funding, FABRIC (ESPR etc.) and Epistea (FixedPoint and other Prague-based projects) cannot get OP funding, not even ARC is being funded by OP these days (in that case because of COIs between Paul and Ajeya).^[2] I would be very surprised if Wentworth’s work, or Wei Dai’s work, or Daniel Kokotajlo’s work, or Brian Tomasik’s work could get funding from them these days. I might be missing some good ones, but the funding landscape is really quite thoroughly fucked in that respect. My best guess is Scott Alexander could not get funding, but I am not totally sure.^[3]
  I cannot think of anyone who I would credit with the creation or shaping of the field of AI Safety or Rationality who could still get OP funding. Bostrom, Eliezer, Hanson, Gwern, Tomasik, Kokotajlo, Sandberg, Armstrong, Jessicata, Garrabrant, Demski, Critch, Carlsmith, would all be unable to get funding^[4] as far as I can tell. In as much as OP is the most powerful actor in the space, the original geeks are being thoroughly ousted.^[5]
  In general, my sense is that if you want to be an OP longtermist grantee these days, you have to be the kind of person that OP thinks is not and will not be a PR risk, and who OP thinks has “good judgement” on public comms, and who isn’t the kind of person who might say weird or controversial stuff, and is not at risk of becoming politically opposed to OP. This includes not annoying any potential allies that OP might have, or associating with anything that Dustin doesn’t like, or that might strain Dustin’s relationships with others in any non-trivial way.
  Of course OP will never ask you to fit these constraints directly, since that itself could explode reputationally (and also because OP staff themselves seem miscalibrated on this and do not seem in-sync with their leadership). Instead you will just get less and less funding, or just be defunded fully, if you aren’t the kind of person who gets the hint that this is how the game is played now.
  And to provide some pushback on things you say, I think now that OP’s bridges with OpenAI are thoroughly burned after the Sam firing drama, OP is pretty OK with people criticizing OpenAI (since what social capital is there left to protect here?). My sense is criticizing Anthropic is slightly risky, especially if you do it in a way that doesn’t signal what OP considers good judgement on maintaining and spending your social capital appropriately (i.e. telling them that they are harmful for the world, or should really stop, is bad, but doing a mixture of praise and criticism without taking any controversial top-level stance is fine), but mostly also isn’t the kind of thing that OP will totally freak out about. I think OP used to be really crazy about this, but now is a bit more reasonable, and it’s not the domain where OP’s relationship to reputation-management is causing the worst failures.
  I think all of this is worse in the longtermist space, though I am not confident. At the present it wouldn’t surprise me very much if OP would defund a global health grantee because their CEO endorsed Trump for president, so I do think there is also a lot of distortion and skew there, but my sense is that it’s less, mostly because the field is much more professionalized and less political (though I don’t know how they think, for example, about funding on corporate campaign stuff which feels like it would be more political and invite more of these kinds of skewed considerations).
  Also, to balance things, sometimes OP does things that seem genuinely good to me. The lead reduction fund stuff seems good, genuinely neglected, and I don’t see that many of these dynamics at play there (I do also genuinely care about it vastly less than OP’s effect on AI Safety and Rationality things).
  1. ^
    As an example of this, in the announcement of recent Open Phil defunding decisions, they communicated that they are “withdrawing funding from some cause areas”, which was really a quite misleading way to describe what actually happened and basically no one I talked to who read the post understood what happened correctly.
    
    If you “withdraw from a cause area” you would expect that if you have an organization that does good work in multiple cause areas, then you would expect you would still fund the organization for work in cause areas that funding wasn’t withdrawn from. However, what actually happened is that Open Phil blacklisted a number of ill-defined broad associations and affiliations, where if you are associated with a certain set of ideas, or identities or causes, then no matter how cost-effective your other work is, you cannot get funding from OP. This is of course an enormously different situation, with an enormously different set of incentives. But as far as I can tell all the comms here was dictated by Dustin, and nobody from OP felt comfortable clarifying what is actually happening in public, and so tons of people were misled about what actually happened.
  2. ^
    To elaborate on this a bit: Mostly ARC and METR have been trying to avoid taking Open Phil money due to minimizing conflicts of interest. My guess is if those COIs were not a concern they could get funding. But as a result, a really huge fraction of the non-OP funding has been going to ARC and METR in a way that has substantially contributed to centralizing funding decisions under OP, making the effects of this case quite similar to all the other cases (though with different dynamics involved).
  3. ^
    (Edit: Two organizations that came to mind that OP has funded somewhat recently that I do think are good are FAR AI and AI Impacts. My sense is that FAR AI was quite unhappy with the pressures OP kept putting on them, with OP pushing them away from what they considered important intellectual work and toward more talent-funnel building, and they are less dependent on OP funding than others these days, and I am also not confident they could get more funding for things like FAR Labs if they applied today. I do think them funding AI Impacts is good, but also wouldn’t be that surprised to see that end because for example AI Impacts wants to think about AI sentience and OP cannot fund orgs that do that kind of work.)
  4. ^
    (Edit: “Unable to get funding” meaning here “unable to get funding for the work that they think is most important”. I think many of these people are smart and of course could just subjugate their labor to OP themselves or some other entity that OP approves of, but that of course mostly makes the situation worse, especially in that they would be paid an order of magnitude less than their expected market wage if they did so)
  5. ^
    (Edit: As I mention above, Paul is a weird case. He could very likely get funding, but has been avoiding taking it for other reasons. Now editing this a few hours after it was originally written, I think Carl Shulman is also someone who could get funding, and I do think is in the same reference class as the others here, and so is at least one counterexample, though I am not confident on Carl.)
  - Orpheus16 3 Oct 2024 2:57 UTC
    94 points
    29
    Parent
    Adding my two cents as someone who has a pretty different lens from Habryka but has still been fairly disappointed with OpenPhil, especially in the policy domain.
    Relative to Habryka, I am generally more OK with people “playing politics”. I think it’s probably good for AI safety folks to exhibit socially-common levels of “playing the game”– networking, finding common ground, avoiding offending other people, etc. I think some people in the rationalist sphere have a very strong aversion to some things in this genre, and labels like “power-seeking” and “deceptive” get thrown around too liberally. I also think I’m pretty OK with OpenPhil deciding it doesn’t want to fund certain parts of the rationalist ecosystem (and probably less bothered than Habryka about how their comms around this weren’t direct/clear).
    In that sense, I don’t penalize OP much for trying to “play politics” or for breaking deontological norms. Nonetheless, I still feel pretty disappointed with them, particularly for their impact on comms/policy. Some thoughts here:
    I agree with Habryka that it is quite bad that OP is not willing to fund right-coded things. Even many of the “bipartisan” things funded by OP are quite left-coded. (As a useful heuristic, whenever you hear of someone launching a bipartisan initiative, I think one should ask “what % of the staff of this organization is Republican?” Obviously just a heuristic– there are some cases in which a 90%-Dem staff can actually truly engage in “real” bipartisan efforts. But in some cases, you will have a 90%-Dem staff claiming to be interested in bipartisan work without any real interest in Republican ideas, few if any Republican contacts, and only a cursory understanding of Republican stances.)
    I also agree with Habryka that OP seems overly focused on PR risks and not doing things that are weird/controversial. “To be a longtermist grantee these days you have to be the kind of person that OP thinks is not and will not be a PR risk, IE will not say weird or controversial stuff” sounds pretty accurate to me. OP cannot publicly admit this because this would be bad for its reputation– instead, it operates more subtly.
    Separately, I have seen OpenPhil attempt to block or restrain multiple efforts in which people were trying to straightforwardly explain AI risks to policymakers. My understanding is that OpenPhil would say that they believed the messengers weren’t the right people (e.g., too inexperienced), and they thought the downside risks were too high. In practice, there are some real tradeoffs here: there are often people who seem to have strong models of AGI risk but little/no policy experience, and sometimes people who have extensive policy experience but only recently started engaging with AI/AGI issues. With that in mind, I think OpenPhil has systematically made poor tradeoffs here and failed to invest into (or in some cases, actively blocked) people who were willing to be explicit about AGI risks, loss of control risks, capability progress, and the need for regulation. (I also think the “actively blocking” thing has gotten less severe over time, perhaps in part because OpenPhil changed its mind a bit on the value of direct advocacy, or perhaps because OpenPhil just decided to focus its efforts on things like research, and advocacy projects found funding elsewhere.)
    I think OpenPhil has an intellectual monoculture and puts explicit/implicit cultural pressure on people in the OP orbit to “stay in line.” There is a lot of talk about valuing people who can think for themselves, but I think the groupthink problems are pretty real. There is a strong emphasis on “checking-in” with people before saying/doing things, and the OP bubble is generally much more willing to criticize action than inaction. I suspect that something like the CAIS statement or even a lot of the early Bengio comms would not have occurred if Dan Hendrycks or Yoshua were deeply ingrained in the OP orbit. It is both the case that they would’ve been forced to write 10+ page Google Docs defending their theories of change and the case that the intellectual culture simply wouldn’t have fostered this kind of thinking.
    I think the focus on evals/RSPs can largely be explained by a bias toward trusting labs. OpenPhil steered a lot of talent toward the evals/RSPs theory of change (specifically, if I recall correctly, OpenPhil leadership on AI was especially influential in steering a lot of the ecosystem to support and invest in the evals/RSPs theory of change.) I expect that when we look back in a few years, there will be a pretty strong feeling that this was the wrong call & that this should’ve been more apparent even without the benefit of hindsight.
    I would be more sympathetic to OpenPhil in a world where their aversion to weirdness/PR risks resulted in them having a strong reputation, a lot of political capital, and real-world influence that matched the financial resources they possess. Sadly, I think we’re in a “lose-lose” world: OpenPhil’s reputation tends to be poor in many policy/journalism circles even while OpenPhil pursues a strategy that seems to be largely focused on avoiding PR risks. I think some of this is unjustified (e.g., a result of propaganda campaigns designed to paint anyone who cares about AI risk as awful). But then some of it actually is kind of reasonable (e.g., impartial observers viewing OpenPhil as kind of shady, not direct in its communications, not very willing to engage directly or openly with policymakers or journalists, having lots of conflicts of interest, trying to underplay the extent to which its funding priorities are influenced/constrained by a single billionaire, being pretty left-coded, etc.)
    To defend OpenPhil a bit, I do think it’s quite hard to navigate trade-offs and I think sometimes people don’t seem to recognize these tradeoffs. In AI policy, I think the biggest tradeoff is something like “lots of people who have engaged with technical AGI arguments and AGI threat models don’t have policy experience, and lots of people who have policy experience don’t have technical expertise or experience engaging with AGI threat models” (this is a bit of an oversimplification– there are some shining stars who have both.)
    I also think OpenPhil folks probably tend to have a different probability distribution over threat models (compared to me and probably also Habryka). For instance, it seems likely to me that OpenPhil employees operate in more of a “there are a lot of ways AGI could play out and a lot of uncertainty– we just need smart people thinking seriously about the problem. And who really knows how hard alignment will be, maybe Anthropic will just figure it out” lens and less of an “ASI is coming and our priority needs to be making sure humanity understands the dangers associated with a reckless race toward ASI, and there’s a substantial chance that we are seriously not on track to solve the necessary safety and security challenges unless we fundamentally reorient our whole approach” lens.
    And finally, I think despite these criticisms, OpenPhil is also responsible for some important wins (e.g., building the field, raising awareness about AGI risk on university campuses, funding some people early on before AI safety was a “big deal”, jumpstarting the careers of some leaders in the policy space [especially in the UK]). It’s also plausible to me that there are some cases in which OpenPhil gatekeeping was actually quite useful in preventing people from causing harm, even though I probably disagree with OpenPhil about the number and magnitude of these cases.
    What links here?
    - Where I Am Donating in 2024 by MichaelDickens (EA Forum; 19 Nov 2024 0:09 UTC; 181 points)
  - johnswentworth 2 Oct 2024 16:36 UTC
    88 points
    82
    Parent
    This should be a top-level post.
    - MichaelDickens 4 Oct 2024 22:42 UTC
      8 points
      0
      Parent
      What are the norms here? Can I just copy/paste this exact text and put it into a top-level post? I got the sense that a top-level post should be more well thought out than this but I don’t actually have anything else useful to say. I would be happy to co-author a post if someone else thinks they can flesh it out.
      
      Edit: Didn’t realize you were replying to Habryka, not me. That makes more sense.
  - Jackson Wagner 4 Oct 2024 22:26 UTC
    27 points
    22
    Parent
    It feels sorta understandable to me (albeit frustrating) that OpenPhil faces these assorted political constraints. In my view this seems to create a big unfilled niche in the rationalist ecosystem: a new, more right-coded, EA-adjacent funding organization could optimize itself for being able to enter many of those blacklisted areas with enthusiasm.
    
    If I was a billionaire, I would love to put together a kind of “completion portfolio” to complement some of OP’s work. Rationality community building, macrostrategy stuff, AI-related advocacy to try and influence Republican politicians, plus a big biotechnology emphasis focused on intelligence enhancement, reproductive technologies, slowing aging, cryonics, gene drives for eradicating diseases, etc. Basically it seems like there is enough edgy-but-promising stuff out there (like studying geoengineering for climate, or advocating for charter cities, or just funding oddball Substack intellectuals to do their thing) that you could hope to create a kind of “alt-EA” (obviously IRL it shouldn’t have EA in the name) where you batten down the hatches, accept that the media will call you an evil villain mastermind forever, and hope to create a kind of protective umbrella for all the work that can’t get done elsewhere. As a bonus, you could engage more in actual politics (like having some hot takes on the US budget deficit, or on how to increase marriage & fertility rates, or whatever), in some areas that OP in its quest for center-left non-polarization can’t do.
    
    Peter Thiel already lives this life, kinda? But his model seems 1. much more secretive, and 2. less directly EA-adjacent, than what I’d try if I was a billionaire.
    
    Dustin himself talks about how he is really focused on getting more “multipolarity” to the EA landscape, by bringing in other high-net-worth funders. For all the reasons discussed, he obviously can’t say “hi, somebody please start an edgier right-wing offshoot of EA!!” But it seems like a major goal that the movement should have, nonetheless.
    
    Seems like you could potentially also run this play with a more fully-left-coded organization. The gains there would probably be smaller, since there’s less “room” to OP’s left than to their right. But maybe you could group together wild animal welfare, invertebrate welfare, digital minds, perhaps some David Pearce / Project Far Out-style “suffering abolition” transhumanist stuff, other mental-wellbeing stuff like the Organization for the Prevention of Intense Suffering, S-risk work, etc. Toss in some more aggressive political activism on AI (like PauseAI) and other issues (like Georgist land value taxation), and maybe some forward-looking political stuff on avoiding stable totalitarianism, regulation of future AI-enabled technologies, and how to distribute the gains from a positive / successful singularity (akin to Sam Altman’s vision of UBI supported by Georgist/Pigouvian taxes, but more thought-through and detailed and up-to-date).
    
    Finding some funders to fill these niches seems like it should be a very high priority of the rationalist / EA movement. Even if the funders were relatively small at first (like say they have $10M - $100M in crypto that they are preparing to give away), I think there could be a lot of value in being “out and proud” (publicising much of their research and philosophy and grantmaking like OP, rather than being super-secretive like Peter Thiel). If a small funder manages to build a small successful “alt-EA” ecosystem on either the left or right, that might attract larger funders in time.
  - Buck 2 Oct 2024 14:41 UTC
    23 points
    2
    Parent
    
    not even ARC has been able to get OP funding (in that case because of COIs between Paul and Ajeya)
    
    As context, note that OP funded ARC in March 2022.
    - habryka 2 Oct 2024 15:28 UTC
      13 points
      0
      Parent
      I think OP has funded almost everyone I have listed here in 2022 (directly or indirectly), so I don’t really think that is evidence of anything (though it is a bit more evidence for ARC because it means the COI is overcomable).
      - David Hornbein 6 Oct 2024 20:35 UTC
        7 points
        1
        Parent
        Hm, this timing suggests the change could be a consequence of Karnofsky stepping away from the organization.
        Which makes sense, now that I think about it. He’s by far the most politically strategic leader Open Philanthropy has had, so with him gone, it’s not shocking they might revert towards standard risk-averse optionality-maxxing foundation behavior.
  - David Matolcsi 2 Oct 2024 7:20 UTC
    14 points
    6
    Parent
    Isn’t it just the case that OpenPhil just generally doesn’t fund that many technical AI safety things these days? If you look at OP’s team on their website, they have only two technical AI safety grantmakers. Also, you list all the things OP doesn’t fund, but what are the things in technical AI safety that they do fund? Looking at their grants, it’s mostly MATS and METR and Apollo and FAR and some scattered academics I mostly haven’t heard of. It’s not that many things. I have the impression that the story is less like “OP is a major funder in technical AI safety, but unfortunately they blacklisted all the rationalist-adjacent orgs and people” and more like “AI safety is still a very small field, especially if you only count people outside the labs, and there are just not that many exciting funding opportunities, and OpenPhil is not actually a very big funder in the field”.
    - Buck 2 Oct 2024 14:51 UTC
      21 points
      14
      Parent
      A lot of OP’s funding to technical AI safety goes to people outside the main x-risk community (e.g. applications to Ajeya’s RFPs).
    - habryka 2 Oct 2024 7:34 UTC
      12 points
      3
      Parent
      Open Phil is definitely by far the biggest funder in the field. I agree that their technical grantmaking has been limited over the past few years (though still on the order of $50M/yr, I think), but they also fund a huge amount of field-building and talent-funnel work, as well as a lot of policy stuff (I wasn’t constraining myself to technical AI Safety, the people listed have been as influential, if not more, on public discourse and policy).
      AI Safety is still relatively small, but more like $400M/yr small. The primary other employers/funders in the space these days are big capability labs. As you can imagine, their funding does not have great incentives either.
      - David Matolcsi 2 Oct 2024 7:58 UTC
        6 points
        2
        Parent
        Yeah, I agree, and I don’t know that much about OpenPhil’s policy work, and their fieldbuilding seems decent to me, though maybe not from your perspective. I just wanted to flag that many people (including myself until recently) overestimate how big a funder OP is in technical AI safety, and I think it’s important to flag that they actually have pretty limited scope in this area.
        - habryka 2 Oct 2024 8:06 UTC
          5 points
          2
          Parent
          Yep, agree that this is a commonly overlooked aspect (and one that I think sadly has also contributed to the dominant force in AI Safety researchers becoming the labs, which I think has been quite sad).
  - Xodarap 2 Oct 2024 18:32 UTC
    11 points
    0
    Parent
    
    what actually happened is that Open Phil blacklisted a number of ill-defined broad associations and affiliations
    
    Is there a list of these somewhere, or details on what happened?
    - habryka 2 Oct 2024 19:10 UTC
      55 points
      16
      Parent
      You can see some of the EA Forum discussion here: https://forum.effectivealtruism.org/posts/foQPogaBeNKdocYvF/linkpost-an-update-from-good-ventures?commentId=RQX56MAk6RmvRqGQt
      The areas that I currently know about are:
      - Anything to do with the rationality community (“Rationality community building”)
      - Anything to do with moral relevance of digital minds
      - Anything to do with wild animal welfare and invertebrate welfare
      - Anything to do with human genetic engineering and reproductive technology
      - Anything that is politically right-leaning
      There are a bunch of other domains where OP hasn’t had an active grantmaking program but where my guess is most grants aren’t possible:
      - Most forms of broad public communication about AI (where you would need to align very closely with OP goals to get any funding)
      - Almost any form of macrostrategy work of the kind that FHI used to work on (i.e. Eternity in Six Hours and stuff like that)
      - Anything about acausal trade or cooperation in large worlds (and more broadly anything that is kind of weird game theory)
      - Neel Nanda 4 Oct 2024 20:19 UTC
        12 points
        1
        Parent
        Huh, are there examples of right-leaning stuff they stopped funding? That’s new to me.
      - Xodarap 3 Oct 2024 3:32 UTC
        6 points
        0
        Parent
        You said
        
        If you “withdraw from a cause area” you would expect that if you have an organization that does good work in multiple cause areas, then you would expect you would still fund the organization for work in cause areas that funding wasn’t withdrawn from. However, what actually happened is that Open Phil blacklisted a number of ill-defined broad associations and affiliations, where if you are associated with a certain set of ideas, or identities or causes, then no matter how cost-effective your other work is, you cannot get funding from OP
        
        I’m wondering if you have a list of organizations where Open Phil would have funded their other work, but because they withdrew from funding part of the organization they decided to withdraw totally.
        
        This feels very importantly different from Good Ventures choosing not to fund certain cause areas (and I think you agree, which is why you put that footnote).
        - habryka 3 Oct 2024 4:08 UTC
          15 points
          −2
          Parent
          I don’t have a long list, but I know this is true for Lightcone, SPARC, ESPR, any of the Czech AI-Safety/Rationality community building stuff, and I’ve heard a bunch of stories since then from other organizations that got pretty strong hints from Open Phil that if they start working in an area at all, they might lose all funding (and also, the “yes, it’s more like a blacklist, if you work in these areas at all we can’t really fund you, though we might make occasional exceptions if it’s really only a small fraction of what you do” story was confirmed to me by multiple OP staff, so I am quite confident in this, and my guess is OP staff would be OK with confirming to you as well if you ask them).
          - Xodarap 3 Oct 2024 15:20 UTC
            1 point
            0
            Parent
            Thanks!
  - evhub 2 Oct 2024 21:42 UTC
    9 points
    −47
    Parent
    Imo sacrificing a bunch of OpenPhil AI safety funding in exchange for improving OpenPhil’s ability to influence politics seems like a pretty reasonable trade to me, at least depending on the actual numbers. As an extreme case, I would sacrifice all current OpenPhil AI safety funding in exchange for OpenPhil getting to pick which major party wins every US presidential election until the singularity.
    
    Concretely, the current presidential election seems extremely important to me from an AI safety perspective, I expect that importance to only go up in future elections, and I think OpenPhil is correct on what candidates are best from an AI safety perspective. Furthermore, I don’t think independent AI safety funding is that important anymore; models are smart enough now that most of the work to do in AI safety is directly working with them, most of that is happening at labs, and probably the most important other stuff to do is governance and policy work, which this strategy seems helpful for.
    
    I don’t know the actual marginal increase in political influence that they’re buying here, but my guess would be that the numbers pencil and OpenPhil is making the right call.
    
    I cannot think of anyone who I would credit with the creation or shaping of the field of AI Safety or Rationality who could still get OP funding.
    
    Separately, this is just obviously false. A lot of the old AI safety people just don’t need OpenPhil funding anymore because they’re working at labs or governments, e.g. me, Rohin Shah, Geoffrey Irving, Jan Leike, Paul (as you mention), etc.
    - ryan_greenblatt 2 Oct 2024 23:03 UTC
      74 points
      77
      Parent
      
      Furthermore, I don’t think independent AI safety funding is that important anymore; models are smart enough now that most of the work to do in AI safety is directly working with them, most of that is happening at labs,
      
      It might be the case that most of the quality weighted safety research involving working with large models is happening at labs, but I’m pretty skeptical that having this mostly happen at labs is the best approach and it seems like OpenPhil should be actively interested in building up a robust safety research ecosystem outside of labs.
      
      (Better model access seems substantially overrated in its importance and large fractions of research can and should happen with just prompting or on smaller models. Additionally, at the moment, open weight models are pretty close to the best models.)
      
      (This argument is also locally invalid at a more basic level. Just because this research seems to be mostly happening at large AI companies (which I’m also more skeptical of I think) doesn’t imply that this is the way it should be and funding should try to push people to do better stuff rather than merely reacting to the current allocation.)
      - evhub 2 Oct 2024 23:09 UTC
        7 points
        0
        Parent
        Yeah, I think that’s a pretty fair criticism, but afaict that is the main thing that OpenPhil is still funding in AI safety? E.g. all the RFPs that they’ve been doing, I think they funded Jacob Steinhardt, etc. Though I don’t know much here; I could be wrong.
        kave 3 Oct 2024 18:53 UTC
        10 points
        3
        Parent
        Wasn’t the relevant part of your argument like, “AI safety research outside of the labs is not that good, so that’s a contributing factor among many to it not being bad to lose the ability to do safety funding for governance work”? If so, I think that “most of OpenPhil’s actual safety funding has gone to building a robust safety research ecosystem outside of the labs” is not a good rejoinder to “isn’t there a large benefit to building a robust safety research ecosystem outside of the labs?”, because the rejoinder is focusing on relative allocations within “(technical) safety research”, and the complaint was about the allocation between “(technical) safety research” vs “other AI x-risk stuff”.
    - habryka 2 Oct 2024 22:21 UTC
      44 points
      18
      Parent
      Imo sacrificing a bunch of OpenPhil AI safety funding in exchange for improving OpenPhil’s ability to influence politics seems like a pretty reasonable trade to me, at least depending on the actual numbers. As an extreme case, I would sacrifice all current OpenPhil AI safety funding in exchange for OpenPhil getting to pick which major party wins every US presidential election until the singularity.
      Yeah, I currently think Open Phil’s policy activism has been harmful for the world, and will probably continue to be, so by my lights this is causing harm with the justification of causing even more harm. I agree they will probably get the bit right about what major political party would be better, but sadly the effects of policy work are much more nuanced and detailed than that, and also they will have extremely little influence on who wins the general elections.
      We could talk more about this sometime. I also have some docs with more of my thoughts here (which I maybe already shared with you, but would be happy to do so if not).
      Separately, this is just obviously false. A lot of the old AI safety people just don’t need OpenPhil funding anymore because they’re working at labs or governments, e.g. me, Rohin Shah, Geoffrey Irving, Paul (as you mention), etc.
      I genuinely don’t know whether Rohin would get funding to pursue what he thinks is most important, if he wanted it. I agree that some others don’t “need” funding anymore, though as I said, lab incentives are even worse on these dimensions, so that is of very little solace to me. I agree you might be able to get funding, though also see my other discussion with Eli on the boundaries I was trying to draw (which I agree are fuzzy and up for debate).
    - Ben Pace 2 Oct 2024 22:36 UTC
      19 points
      11
      Parent
      
      sacrificing a bunch of OpenPhil AI safety funding in exchange for improving OpenPhil’s ability to influence politics seems like a pretty reasonable trade
      
      Sacrificing half of it to avoid things associated with one of the two major political parties, and being deceptive about doing this, is of course not equal to half the cost of sacrificing all of such funding; it is a much more unprincipled, distorting, and actively deceptive decision that messes up everyone’s maps of the world in a massive way and reduces our ability to trust each other or understand what is happening.
  - gw 2 Oct 2024 7:40 UTC
    9 points
    3
    Parent
    As a concrete example, as far as I can piece together from various things I have heard, Open Phil does not want to fund anything that is even slightly right of center in any policy work. I don’t think this is because of any COIs, it’s because Dustin is very active in the democratic party and doesn’t want to be affiliated with anything that is even slightly right-coded. Of course, this has huge effects by incentivizing polarization of AI policy work with billions of dollars, since any AI Open Phil funded policy organization that wants to engage with people on the right might just lose all of their funding because of that, and so you can be confident they will steer away from that.
    Thanks for sharing, I was curious if you could elaborate on this (e.g. if there are examples of AI policy work funded by OP that come to mind that are clearly left of center). I am not familiar with policy, but my one data point is the Horizon Fellowship, which is non-partisan and intentionally places congressional fellows in both Democratic and Republican offices. This straightforwardly seems to me like a case where they are trying to engage with people on the right, though maybe you mean not-right-of-center at the organizational level? In general though, (in my limited exposure) I don’t model any AI governance orgs as having a particular political affiliation (which might just be because I’m uninformed / ignorant).
    - habryka 2 Oct 2024 7:51 UTC
      17 points
      0
      Parent
      Yep, my model is that OP does fund things that are explicitly bipartisan (like, they are not currently filtering on being actively affiliated with the left). My sense is that in practice it’s a fine balance, and if there was some high-profile thing where Horizon became more associated with the right (like maybe some alumnus becomes prominent in the Republican Party and very publicly credits Horizon for that, or there is some scandal involving someone on the right who is a Horizon alumnus), then I do think their OP funding would have a decent chance of being jeopardized, and the same is not true on the left.
      Another part of my model is that one of the key things about Horizon is that they are of a similar school of PR as OP themselves. They don’t make public statements. They try to look very professional. They are probably very happy to compromise on messaging and public comms with Open Phil and be responsive to almost any request that OP would have messaging-wise. That makes up for a lot. I think if you had a more communicative and outspoken organization with a similar mission to Horizon, the funding situation would be a bunch dicier (though my guess is if they were competent, an organization like that could still get funding).
      More broadly, I am not saying “OP staff want to only support organizations on the left”. My sense is that many individual OP staff would love to fund more organizations on the right, and would hate for polarization to occur, but that organizationally and because of constraints by Dustin, they can’t, and so you will see them fund organizations that aim for more engagement with the right, but there will be relatively hard lines and constraints that will mostly prevent that.
  - MichaelDickens 4 Oct 2024 22:53 UTC
    6 points
    5
    Parent
    Thanks for the reply. When I wrote “Many people would have more useful things to say about this than I do”, you were one of the people I was thinking of.
    
    AI Impacts wants to think about AI sentience and OP cannot fund orgs that do that kind of work
    
    Related to this, I think GW/OP has always been too unwilling to fund weird causes, but it’s generally gotten better over time: originally recommending US charities over global poverty b/c global poverty was too weird, taking years to remove their recommendations for US charities that were ~100x less effective than their global poverty recs, then taking years to start funding animal welfare and x-risk, then still not funding weirder stuff like wild animal welfare and AI sentience. I’ve criticized them for this in the past but I liked that they were moving in the right direction. Now I get the sense that recently they’ve gotten worse on AI safety (and weird causes in general).
  - Eli Tyre 2 Oct 2024 17:34 UTC
    5 points
    1
    Parent
    I cannot think of anyone who I would credit with the creation or shaping of the field of AI Safety or Rationality who could still get OP funding.
    Nitpick, but this statement seems obviously false given what I understand your views to be? Paul, Carl, Buck, for starters.
    
    [edit: I now see that Oliver had already made a footnote to that effect.]
    - habryka 2 Oct 2024 17:43 UTC
      6 points
      −1
      Parent
      (I like Buck, but he is one generation later than the one I was referencing. Also, I am currently like 50/50 on whether Buck would indeed be blacklisted. I agree that Carl is a decent counterexample, though he is a bit of a weirder case)
      - Buck 3 Oct 2024 1:34 UTC
        8 points
        0
        Parent
        I agree that I didn’t really have much of an effect on this community’s thinking about AIS until like 2021.
      - Eli Tyre 2 Oct 2024 17:53 UTC
        4 points
        0
        Parent
        Jessica Taylor seems like she’s also second generation?
        habryka 2 Oct 2024 17:57 UTC
        4 points
        0
        Parent
        I remember running into her a bunch before I ran into Buck. Scott/Abram are also second generation. Overall, seems reasonable to include Buck (but communicating my more complicated epistemic state with regard to him would have been harder).
  - yc 4 Oct 2024 7:27 UTC
    4 points
    0
    Parent
    Out of curiosity—“it’s because Dustin is very active in the democratic party and doesn’t want to be affiliated with anything that is right-coded” Are these projects related to AI safety or just generally? And what are some examples?
    - habryka 4 Oct 2024 16:27 UTC
      2 points
      0
      Parent
      I am not sure I am understanding your question. Are you asking about examples of left-leaning projects that Dustin is involved in, or right-leaning projects that cannot get funding? On the left, Dustin is one of the biggest donors to the Democratic Party (with Asana donating $45M and him donating $24M to Joe Biden in 2020).
      - yc 4 Oct 2024 17:10 UTC
        2 points
        1
        Parent
        Examples of right leaning projects that got rejected by him due to his political affiliation, and if these examples are AI safety related
        habryka 4 Oct 2024 17:17 UTC
        2 points
        0
        Parent
        I don’t currently know of any public examples and feel weird publicly disclosing details about organizations that I privately heard about. If more people are interested I can try to dig up some more concrete details (but can’t make any promises on things I’ll end up able to share).
        yc 4 Oct 2024 19:16 UTC
        1 point
        0
        Parent
        No worries; thanks!
  - ROM 2 Nov 2024 7:36 UTC
    3 points
    0
    Parent
    FHI could not get OP funding
    Can you elaborate on what you mean by this?
    OP appears to have been one of FHI’s biggest funders according to Sandberg:[1]
    Eventually, Open Philanthropy became FHI’s most important funder, making two major grants: £1.6m in 2017, and £13.3m in 2018. Indeed, the donation behind this second grant was at the time the largest in the Faculty of Philosophy’s history (although, owing to limited faculty administrative capacity for hiring and the subsequent hiring freezes it imposed, a large part of this grant would remain unspent). With generous and unrestricted funding from a foundation that was aligned with FHI’s mission, we were free to expand our research in ways we thought would make the most difference.
    The hiring (and fundraising) freeze imposed by Oxf began in 2020.
    [1] See page 15
    - habryka 2 Nov 2024 7:40 UTC
      3 points
      0
      Parent
      In 2023/2024 OP drastically changed its funding process and priorities (in part in response to FTX, in part in response to Dustin’s preferences). This whole conversation is about the shift in OP’s giving in this recent time period.
      See also: https://forum.effectivealtruism.org/posts/foQPogaBeNKdocYvF/linkpost-an-update-from-good-ventures
      - ROM 2 Nov 2024 7:58 UTC
        −1 points
        −2
        Parent
        I agree with the claim you’re making: that if FHI still existed and they applied for a grant from OP it would be rejected. This seems true to me.
        I don’t mean to nitpick, but it still feels misleading to claim “FHI could not get OP funding” when they did in fact get lots of funding from OP. It implies that FHI operated without any help from OP, which isn’t true.
        habryka 2 Nov 2024 17:03 UTC
        2 points
        0
        Parent
        The “could” here is (in context) about “could not get funding from modern OP”. The whole point of my comment was about the changes that OP underwent. Sorry if that wasn’t as clear, it might not be as obvious to others that of course OP was very different in the past.
        ROM 3 Nov 2024 0:26 UTC
        2 points
        0
        Parent
        I understand the claim you were making now and I hope the nitpicking isn’t irritating.
  - Chris Lakin 22 Feb 2025 21:18 UTC
    2 points
    0
    Parent
    ~~fwiw, FABRIC was able to get funding in November 2024 (who knows if this date is correct though)~~
    nvm this was an “exit grant” lmao
  - MichaelDickens 5 Oct 2024 1:56 UTC
    2 points
    1
    Parent
    If Open Phil is unwilling to fund some/most of the best orgs, that makes earning to give look more compelling.
    
    (There are some other big funders in AI safety like Jaan Tallinn, but I think all of them combined still have <10% as much money as Open Phil.)
- Wei Dai 2 Oct 2024 16:29 UTC
  41 points
  28
  Parent
  And I agree with Bryan Caplan’s recent take that friendships are often a bigger conflict of interest than money, so Open Phil higher-ups being friends with Anthropic higher-ups is troubling.
  No kidding. From https://www.openphilanthropy.org/grants/openai-general-support/:
  OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.
  Wish OpenPhil and EAs in general were more willing to reflect/talk publicly about their mistakes. Kind of understandable given human nature, but still… (I wonder if there are any mistakes I’ve made that I should reflect more on.)
- David Hornbein 2 Oct 2024 22:48 UTC
  33 points
  42
  Parent
  “Open Phil higher-ups being friends with Anthropic higher-ups” is an understatement. An Open Philanthropy cofounder (Holden Karnofsky) is married to an Anthropic cofounder (Daniela Amodei). It’s a big deal!
- Raemon 2 Oct 2024 5:07 UTC
  32 points
  16
  Parent
  I want to add the gear of “even if it actually turns out that OpenPhil was making the right judgment calls the whole time in hindsight, the fact that it’s hard from the outside to know that has some kind of weird Epistemic Murkiness effects that are confusing to navigate, at the very least kinda suck, and maybe are Quite Bad.”
  I’ve been trying to articulate the costs of this sort of thing lately and having trouble putting it into words, and maybe it’ll turn out this problem was less of a big deal than it currently feels like to me. But, something like the combo of
  a) the default being for many people to trust OpenPhil
  b) many people who are paying attention think that they should at least be uncertain about it, and somewhere on a “slightly wary” to “paranoid” scale. and...
  c) this at least causes a lot of wasted cognitive cycles
  d) it’s… hard to figure out how big a deal to make of it. A few people (i.e. habryka or previously Benquo or Jessicata) make it their thing to bring up concerns frequently. Some of those concerns are, indeed, overly paranoid, but, like, it wasn’t actually reasonable to calibrate the wariness/conflict-theory-detector to zero, you have to make guesses. This is often exhausting and demoralizing for the people doing it. People typically only select into this sort of role if they’re a bit more prone to conflict about it, which means a lot of the work is kinda thankless because people are pushing back on you for being too conflicty. Something about this compounds over time.
  e) the part that feels hardest to articulate and maybe is fake is that, there’s something of a “group epistemic process” going on in the surrounding community, and everyone either not tracking this sort of thing, or tracking it but not sure how to take it or what to do about it… I’m not sure how to describe it better than “I dunno something about the group orienting process subtly epistemically fucked” and/or “people just actually take sanity-damage from it.”
  
  (“subtly epistemically fucked” might operationalize as “it takes an extra 1-3 years for things to become consensus knowledge/beliefs than it’d otherwise take”)
  Anyway, thanks for bringing it up.
  What links here?
  - Noosphere89's comment on Anthropic AI made the right call by bhauth (5 Oct 2024 18:24 UTC; 0 points)
  - habryka 2 Oct 2024 5:29 UTC
    10 points
    −1
    Parent
    Some of those concerns are, indeed, overly paranoid
    I am actually curious if you have any overly paranoid predictions from me. I was today lamenting that despite feeling paranoid on this stuff all the time, I de facto have still been quite overly optimistic in almost all of my predictions on this topic (like, I only gave SPARC a 50% chance of being defunded a few months ago, which I think was dumb, and I was not pessimistic enough to predict the banning of all right-associated projects, and not pessimistic enough to predict a bunch of other grant decisions that I feel weird talking publicly about).
    - Raemon 2 Oct 2024 6:00 UTC
      6 points
      4
      Parent
      The predictions that seemed (somewhat) overly paranoid of yours were more about Anthropic than OpenPhil, and the dynamic seemed similar and I didn’t check that hard while writing the comment. (maybe some predictions about how/why the OpenAI board drama went down, which was at the intersection of all three orgs, which I don’t think have been explicitly revealed to have been “too paranoid” but I’d still probably take bets against)
      (I think I agree that overall you were more like “not paranoid enough” than “too paranoid”, although I’m not very confident)
      - habryka 2 Oct 2024 6:14 UTC
        11 points
        3
        Parent
        My sense is my predictions about Anthropic have also not been pessimistic enough, though we have not yet seen most of the evidence. Maybe a good time to make bets.
        Raemon 2 Oct 2024 17:45 UTC
        9 points
        5
        Parent
        I kinda don’t want to litigate it right now, but, I was thinking “I can think of one particular Anthropic prediction Habryka made that seemed false and overly pessimistic to me”, which doesn’t mean I think you’re overall uncalibrated about Anthropic, and/or not pessimistic enough.
        And (I think Habryka got this but for benefit of others), a major point of my original comment was not just “you might be overly paranoid/pessimistic in some cases”, but that ambiguity about how paranoid/pessimistic is appropriate to be results in some kind of confusing, miasmic social-epistemic process (where like maybe you are exactly calibrated on how pessimistic to be, but it comes across as too aggro to other people, who push back). This can be bad whether you’re somewhat-too-pessimistic, somewhat-too-optimistic, or exactly calibrated.
      - Ben Pace 3 Oct 2024 7:52 UTC
        6 points
        0
        Parent
        My recollection is that Habryka seriously considered hypotheses that involved worse and more coordinated behavior than reality, but that this is different from “this was his primary hypothesis that he gave the most probability mass to”. And then he did some empiricism and falsified the hypotheses and I’m glad those hypotheses were considered and investigated.
        Here’s an example of him giving 20-25% to a hypothesis about conspiratorial behavior that I believe has turned out to be false.
        habryka 3 Oct 2024 16:25 UTC
        2 points
        0
        Parent
        Yep, that hypothesis seems mostly wrong, though I more feel like I received 1-2 bits of evidence against it. If the board had stabilized with Sam being fired, even given all I know, I would have still thought a merger with Anthropic to be like ~5%-10% likely.
  - MichaelDickens 4 Oct 2024 23:09 UTC
    4 points
    3
    Parent
    
    A few people (i.e. habryka or previously Benquo or Jessicata) make it their thing to bring up concerns frequently.
    
    My impression is that those people are paying a social cost for how willing they are to bring up perceived concerns, and I have a lot of respect for them because of that.
    - Noosphere89 5 Oct 2024 19:58 UTC
      2 points
      0
      Parent
      As someone who has disagreed quite a bit with Habryka in the past, endorsed.
      
      They are absolutely trying to solve a frankly pretty difficult problem, where there’s a lot of selection for more conflict than is optimal, and also selection for being more paranoid than is optimal, because they have to figure out if a company or person in the AI space is being shady or is an outright liar, which unfortunately has a reasonable probability, but there’s also a reasonable probability of them being honest but failing to communicate well.
      
      I agree with Raemon that you can’t have your conflict theory detectors set to 0 in the AI space.
      
      Some of those concerns are, indeed, overly paranoid, but, like, it wasn’t actually reasonable to calibrate the wariness/conflict-theory-detector to zero, you have to make guesses.
  - Czynski 23 Feb 2025 1:02 UTC
    1 point
    0
    Parent
    
    People typically only select into this sort of role if they’re a bit more prone to conflict about it, which means a lot of the work is kinda thankless because people are pushing back on you for being too conflicty.
    
    Things can be done to encourage this behavior anyway, such as with how the site works. Instead the opposite has been done; this is the root of my many heated disagreements with the LW team.
- metachirality 2 Oct 2024 6:40 UTC
  31 points
  21
  Parent
  Maybe make a post on the EA forum?
- MichaelDickens 4 Oct 2024 22:48 UTC
  2 points
  1
  Parent
  I’ve been avoiding LW for the last 3 days because I was anxious that people were gonna be mad at me for this post. I thought there was a pretty good chance I was wrong, and I don’t like accusing people/orgs of bad behavior. But I thought I should post it anyway because I believed there was some chance lots of people agreed with me but were too afraid of social repercussions to bring it up (like I almost was).
  - MichaelDickens 5 Oct 2024 0:48 UTC
    1 point
    0
    Parent
    I should add that I don’t want to dissuade people from criticizing me if I’m wrong. I don’t always handle criticism well, but it’s worth the cost to have accurate beliefs about important subjects. I knew I was gonna be anxious about this post but I accepted the cost because I thought there was a ~25% chance that it would be valuable to post.
MichaelDickens 23 Apr 2025 18:11 UTC
120 points
36
I find it hard to trust that AI safety people really care about AI safety.
- DeepMind, OpenAI, Anthropic, and SSI were all founded in the name of safety. Instead they have greatly increased danger. And at least OpenAI and Anthropic have been caught lying about their motivations:
  - OpenAI: claiming concern about hardware overhang and then trying to massively scale up hardware; promising compute to superalignment team and then not giving it; telling board that model passed safety testing when it hadn’t; too many more to list.
  - Anthropic: promising (in a mealy-mouthed technically-not-lying sort of way) not to push the frontier, and then pushing the frontier; trying (and succeeding) to weaken SB-1047; lying about their connection to EA (that’s not related to x-risk but it’s related to trustworthiness).
- For whatever reason, I had the general impression that Epoch is about reducing x-risk (and I was not the only one with that impression) but:
  - Epoch is not about reducing x-risk, and they were explicit about this but I didn’t learn it until this week
  - its FrontierMath benchmark was funded by OpenAI and OpenAI allegedly has access to the benchmark (see comment on why this is bad)
  - some of their researchers left to start another build-AGI startup (I’m not sure how badly this reflects on Epoch as an org but at minimum it means donors were funding people who would go on to work on capabilities)
  - Director Jaime Sevilla believes “violent AI takeover” is not a serious concern, and “I also selfishly care about AI development happening fast enough that my parents, friends and myself could benefit from it, and I am willing to accept a certain but not unbounded amount of risk from speeding up development”, and “on net I support faster development of AI, so we can benefit earlier from it” which is a very hard position to justify (unjustified even on P(doom) = 1e-6, unless you assign ~zero value to people who are not yet born)
- I feel bad picking on Epoch/Jaime because they are being unusually forthcoming about their motivations, in a way that exposes them to criticism. This is noble of them; I expect most orgs not to be this noble.
  - When some other org does something that looks an awful lot like it’s accelerating capabilities, and they make some argument about how it’s good for safety, I can’t help but wonder if they secretly believe the same things as Epoch and are not being forthright about their motivations
  - My rough guess is for every transparent org like Epoch, there are 3+ orgs that are pretending to care about x-risk but actually don’t
- Whenever some new report comes out about AI capabilities, like the METR task duration projection, people talk about how “exciting” it is[1]. There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”. But whatever mental process results in this choice of words, I don’t trust that it will also result in them taking actions that reduce x-risk.
- Many AI safety people currently or formerly worked at AI companies. They stand to make money from accelerating AI capabilities. The same is true of grantmakers.
  - I briefly looked thru some grantmakers and I see financial COIs at Open Philanthropy, Survival and Flourishing Fund, and Manifund; but none at Long-Term Future Fund
- A different sort of conflict of interest: many AI safety researchers have an ML background and enjoy doing ML. Unsurprisingly, they often arrive at the belief that doing ML research is the best way to make AI safe. This ML research often involves making AIs better at stuff. Pausing AI development (or imposing significant restrictions) would mean they don’t get to do ML research anymore. If they oppose a pause/slowdown, is that for ethical reasons, or is it because it would interfere with their careers?
- At a moderate P(doom), say under 25%, from a selfish perspective it makes sense to accelerate AI if it increases the chance that you get to live forever, even if it increases your risk of dying. I have heard from some people that this is their motivation. I am appalled at the level of selfishness required to seek immortality at the cost of risking all of humanity. And I’m sure most people who hold this position know it’s appalling, so they keep it secret and publicly give rationalizations for why accelerating AI is actually the right thing to do. In a way, I admire people who are open about their selfishness, and I expect they are the minority.
But you should know that you might not be able to trust me either.
- I have some moral uncertainty but my best guess is that future people are just as valuable as present-day people. You might think this leads me to put too much priority on reducing x-risk relative to helping currently-alive people. You might think it’s unethical that I’m willing to delay AGI (potentially hurting currently-alive people) to reduce x-risk.
- I care a lot about non-human animals, and I believe it’s possible in principle to trade off human welfare against animal welfare. (Although if anything, I think that should make me care less about x-risk, not more.)
- ETA: I am pretty pessimistic about AI companies’ plans for aligning ASI. My weakly held belief is that if companies follow their current plans, there’s a 2 in 3 chance of a catastrophic outcome. (My unconditional P(doom) is lower than that.) You might believe this makes me too pessimistic about certain kinds of strategies.
(edit: removed an inaccurate statement)

[1] ETA: I saw several examples of this on Twitter. Went back and looked and I couldn’t find the examples I recall seeing. IIRC they were mainly quote-tweets, not direct replies, and I don’t know how to find quote-tweets (the search function was unhelpful).
- testingthewaters 24 Apr 2025 1:47 UTC
  53 points
  13
  Parent
  I think this is straightforwardly true and basically hard to dispute in any meaningful way. A lot of this is basically downstream of AI research being part of a massive market/profit-generating endeavour (the broader tech industry), which straightforwardly optimises for more and more “capabilities” (of various kinds) in the name of revenue. Indeed, one could argue that long before the current wave of LLMs the tech industry was developing powerful agentic systems that actively worked to subvert human preferences in favour of disempowering them/manipulating them, all in the name of extracting revenue from intelligent work… we just called the AI system the Google/Facebook/YouTube/Twitter Algorithm.
  
  The trend was always clear: an idealistic mission to make good use of global telecommunication/information networks finds initial success and is a good service. Eventually pressures to make profits cause the core service to be degraded in favour of revenue generation (usually ads). Eventually the company accrues enough shaping power to actively reshape the information network in its favour, and begins dragging everything down with it. In the face of this AI/LLMs are just another product to be used as a revenue engine in the digital economy.
  
  AI safety, by its nature, resists the idea of creating powerful new information technologies to exploit mercilessly for revenue without care for downstream consequences. However, many actors in the AI safety movement are themselves tied to the digital economy, and depend on it for their power, status, and livelihoods. Thus, it is not that there are no genuine concerns being expressed, but that at every turn these concerns must be resolved in a way that keeps the massive tech machine going. Those who don’t agree with this approach are efficiently selected against. For example:
  - Race dynamics are bad? ~~Maybe we should slow down.~~ We just need to join the race and be the more morally-minded actor. After all, there’s no stopping the race, we’re already locked in.
  - Our competitors/other parties are doing dangerous things? ~~Maybe we could coordinate and share our concerns and research with them.~~ We can’t fall behind, we’ve got to fly the AI safety flag at the conference of AI Superpowers. Let’s speed up too.
  - New capabilities are unknown and jagged? ~~Let’s just leave well enough alone.~~ Let’s invest more in R&D so we can safely understand and harness them.
  - Here’s a new paradigm that might lead to a lot of risk and a lot of reward. ~~We should practice the virtue of silence and buy the world time~~ We should make lots of noise so we can get funding. To study it. Not to use it, of course. Just to understand the safety implications.
  - Maybe progress in AI is slower than we thought. ~~Hooray! Maybe we can chill for a bit~~ That’s time for us to exploit our superior AI knowledge and accelerate progress to our benefit.
  We’ve seen this before.
  
  To be honest, though, I’m not sure what to do about this. So much has been invested by now that it truly feels like history is moving with a will of its own, rather than individuals steering the ship. Every time I look at what’s going on I feel the sense that maybe I’m just the idiot that hasn’t gotten the signal to hammer that “exploit” button. After all, it’s what everyone else is doing.
  - Neel Nanda 24 Apr 2025 16:31 UTC
    17 points
    −4
    Parent
    
    Our competitors/other parties are doing dangerous things? Maybe we could coordinate and share our concerns and research with them
    
    What probability do you put that, if Anthropic had really tried, they could have meaningfully coordinated with OpenAI and Google? Mine is pretty low.
    
    I think many of these are predicated on the belief that it would be plausible to get everyone to pause now. In my opinion this is extremely hard and pretty unlikely to happen. I think that, even in worlds where actors continue to race, there are actions we can take to lower the probability of x-risk, and it is a reasonable position to do so.
    
    I separately think that many of the actions you describe historically were dumb/harmful, but are equally consistent with “25% of safety people act like this” and 100%
    - MichaelDickens 25 Apr 2025 0:18 UTC
      10 points
      10
      Parent
      
      What probability do you put that, if Anthropic had really tried, they could have meaningfully coordinated with OpenAI and Google? Mine is pretty low.
      
      Not GP but I’d guess maybe 10%. Seems worth it to try. IMO what they should do is hire a team of top negotiators to work full-time on making deals with other AI companies to coordinate and slow down the race.
      
      ETA: What I’m really trying to say is I’m concerned Anthropic (or some other company) would put in a half-assed effort to cooperate and then give up, when what they should do is Try Harder. “Hire a team to work on it full time” is one idea for what Trying Harder might look like.
      - Neel Nanda 25 Apr 2025 8:45 UTC
        3 points
        1
        Parent
        Fair. My probability is more like 1-2%. I do think that having a team of professional negotiators seems a reasonable suggestion though. I predict the Anthropic position would be that this is really hard to achieve in general, but that if slowing down was ever achieved we would need much stronger evidence of safety issues. In addition to all the commercial pressure, slowing down now could be considered to violate antitrust law. And it seems way harder to get all the other actors like Meta or DeepSeek or xAI on board, meaning I don’t even know if I think it’s good for some of the leading actors to unilaterally slow things down now (I predict mildly net good, but with massive uncertainty and downsides)
- Neel Nanda 24 Apr 2025 1:51 UTC
  35 points
  −1
  Parent
  I think it’s important to distinguish between factual disagreements and moral disagreements. My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely, what timelines are, etc. I’m much less sure the founders of Mechanize care.
  
  And to whatever degree you trust my judgement/honesty, I work at DeepMind and reducing existential risk is a fairly large part of my motivation (though far from all of it), and I try to regularly think about how my team’s strategy can be better targeted towards this.
  
  And I know a lot of safety people at deepmind and other AGI labs who I’m very confident also sincerely care about reducing existential risks. This is one of their primary motivations, they often got into the field due to being convinced by arguments about ai risk, they will often raise in conversation concerns that their current work or the team’s current strategy is not focused on it enough, some are extremely hard-working or admirably willing to forgo credits so long as they think that their work is actually mattering for X-Risk, some dedicate a bunch of time to forming detailed mental models of how AI leads to bad outcomes and how this could be prevented and how their work fit in, etc. If people just wanted to do fun ml work, there’s a lot of other places. Obviously people are complex. People are largely not motivated by a single thing and the various conflicts of interest you note seem real. I expect that some of the people I’m thinking of I’ve misjudged, or that they say they care about X-Risk but actually don’t. But I would just be completely shocked if say half of them were not highly motivated by reducing x-risk. It’s generally more reasonable to be skeptical of the motivations of senior leadership, who have much messier incentives and constraints on their communication.
  
  Regarding the missing mood thing, I think there’s something to what you’re saying, but also that it’s really psychologically unhealthy to work in a field which is constantly advancing like AI and every time there is an advance feel the true emotional magnitudes of what it would mean if existential risk has now slightly increased. If anyone did, I think they would burn out pretty fast, so the people left in the field largely don’t. I also think you should reserve those emotions for times when a trend is deviated from rather than when a trend is continued. In my opinion, the reason people were excited about the METR work was that it was measuring a thing that was already happening much more precisely, it was a really important question and reducing our confusion about that is high value. It wasn’t really capabilities work in my opinion.
  
  unjustified even on P(doom) = 1e-6, unless you assign ~zero value to people who are not yet born
  
  Is this implicitly assuming total utilitarianism? I certainly care about the future of humanity, but I reject moral views that say it is overwhelmingly the thing that matters and present day concerns round down to zero. I think many people have intuitions aligning with this.
  - habryka 24 Apr 2025 2:21 UTC
    16 points
    3
    Parent
    My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely
    I don’t think this is true. My sense is he views his current work as largely being good on non x-risk grounds, and that even if it might slightly increase x-risk, he doesn’t think it would be worth it for him to stop working on it, since he thinks it’s unfair to force the current generation to accept a slightly higher risk of not achieving longevity escape velocity and more material wealth in exchange for a small decrease in existential risk.
    He says it so plainly that it seems about as straightforward a rejection of AI x-risk concerns as I’ve heard:
    I selfishly care about me, my friends and family benefitting from AI. For some of my older relatives, it might make a big difference to their health and wellbeing whether AI-fueled explosive growth happens in 10 vs 20 years.
    [...]
    I wont endanger the life of my family, myself and the current generation for a small decrease of the chances of AI going extremely badly in the long term. And I don’t think it’s fair of anyone to ask me to do that. Not that it should be my place to unilaterally make such a decision anyway.
    It seems very clear that Jaime thinks that AI x-risk is unimportant relative to almost any other issue, given his non-interest in trading off x-risk against those other issues.
    It is true that Jaime might think that AI x-risk could hypothetically be motivating to him, but at least my best interpretation of what is going on suggests to me that he de facto does not consider it an important input into his current strategic choices, or the choices of Epoch.
    - Neel Nanda 24 Apr 2025 16:25 UTC
      15 points
      5
      Parent
      I think you’re strawmanning him somewhat
      
      It seems very clear that Jaime thinks that AI x-risk is unimportant relative to almost any other issue, given his non-interest in trading off x-risk against those other issues.
      
      Does not seem a fair description of
      
      I wont endanger the life of my family, myself and the current generation for a small decrease of the chances of AI going extremely badly in the long term
      
      People are allowed to have multiple values! If someone would trade a small amount of value A for a large amount of value B, this is entirely consistent with them thinking both are important.
      
      Like, if you offer people the option to commit suicide in exchange for reducing x-risk by x%, what value of x do you think they would require? And would you say they are not x risk motivated if they eg aren’t willing to do it at 1e-6?
      
      In practice this doesn’t really come up, so it’s not that relevant. Similarly for Jaime’s position, how much he believes himself to be in situations where he’s trading off meaningful harm to today and meaningful harm to the present generation seems very important.
    - ozziegooen 25 Apr 2025 15:08 UTC
      2 points
      0
      Parent
      I did a bit of digging, because these quotes seemed narrow to me. Here’s the original tweet of that tweet thread.
      Full state dump of my AI risk related beliefs:
      - I currently think that we will see ~full automation of society by Median 2045, with already very significant benefits by 2030
      - I am not very concerned about violent AI takeover. I am concerned about concentration of power and gradual disempowerment. I put the probability that ai ends up being net bad for humans at 15%.
      - I support treating ai as a general purpose tech and distributed development. I oppose stuff like export controls and treating AI like military tech. My sense is that AI goes better in worlds where we gradually adopt it and it’s seen as a beneficial general purpose tech, rather than a key strategic tech only controlled by a small group of people
      - I think alignment is unlikely to happen in a robust way, though companies could have a lot of sway on AI culture in the short term.
      - on net I support faster development of AI, so we can benefit earlier from it.
      It’s a hard problem, and I respect people trying their hardest to make it go well.
      Then right after:
      
      All said, this specific chain doesn’t give us a huge amount of information. It totals something like 10-20 sentences.
      
      > He says it so plainly that it seems about as straightforward a rejection of AI x-risk concerns as I’ve heard:
      
      This seems like a major oversimplification to me. He says “I am concerned about concentration of power and gradual disempowerment. I put the probability that ai ends up being net bad for humans at 15%.” There is a cluster in the rationalist/EA community that believes that “gradual disempowerment” is an x-risk. Perhaps you wouldn’t define “concentration of power and gradual disempowerment” as technically an x-risk, but if so, that seems a bit like a technicality to me. It can clearly be a very major deal.
      
      It sounds a lot to me that Jaime is very concerned about some aspects of AI risk but not others.
      
      In the quote you reference, he clearly says, “Not that it should be my place to unilaterally make such a decision anyway.” I hear him saying, “I disagree with the x-risk community about the issue of slowing down AI, specifically. However, I don’t think this disagreement is a big concern, given that I also feel like it’s not right for me to personally push for AI to be sped up, and thus I won’t do it.”
      - habryka 25 Apr 2025 18:49 UTC
        2 points
        −2
        Parent
        I am not saying Jaime in-principle could not be motivated by existential risk from AI, but I do think the evidence suggests to me strongly that concerns about existential risk from AI are not among the primary motivations for his work on Epoch (which is what I understood Neel to be saying).
        Maybe it is because he sees the risk as irreducible, maybe it is because the only ways of improving things would cause collateral damage for other things he cares about. I also think it should be our dominant prior that someone is not motivated by reducing x-risk unless they directly claim they do.
        ryan_greenblatt 25 Apr 2025 19:16 UTC
        9 points
        9
        Parent
        My sense is that Jaime’s view (and Epoch’s view more generally) is more like: “making people better informed about AI in a way that is useful to them seems heuristically good (given that AI is a big deal), it doesn’t seem that useful or important to have a very specific theory of change beyond this”. From this perspective, saying “concerns about existential risk from AI are not among the primary motivations” is slightly confused, as the heuristic isn’t necessarily back chained from any more specific justification. Like there is no specific terminal motivation.
        
        Like consider someone who donates to GiveDirectly due to “idk, seems heuristically good to empower the worst off people” and someone who generally funds global health and well-being due to specifically caring about ongoing human welfare (putting aside AI for now). This heuristic is partially motivated via flow through from caring about something like welfare even though it doesn’t directly show up. These people seem like natural allies to me except in surprising circumstances (e.g., it turns out the worst off people use marginal money/power in a way that is net negative for human welfare).
        habryka 25 Apr 2025 20:26 UTC
        4 points
        −3
        Parent
        I agree that there is some ontological mismatch here, but I think your position is still in pretty clear conflict with what Neel said, which is what I was objecting to:
        My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely, what timelines are, etc.
        “Not 100% motivated by it” IMO sounds like an implication that “being motivated by reducing x-risk would make up something like 30%-70% of the motivation”. I don’t think that’s true, and I think various things that Jaime has said make that relatively clear.
        Neel Nanda 26 Apr 2025 7:18 UTC
        10 points
        6
        Parent
        I think you’re conflating “does not think that slowing down AI obviously reduces x-risk” with “reducing x risk is not a meaningful motivation for his work”. Jaime has clearly said that he believes x risk is real and >=15% (though via different mechanisms to loss of control). I think that the public being well informed about AI generally reduces risk, and I think that Epoch is doing good work on this front, and that increasing the probability that AI goes well is part of why Jaime works on this. I think it’s much less clear if Frontier Math was good, but Jaime wasn’t very involved anyway, so doesn’t seem super relevant.
        
        I basically think the only thing he’s said that you could consider objectionable is that he’s reluctant to push for a substantial pause for AI since x risk is not the only thing he cares about. But he also (sincerely, imo) expresses uncertainty about whether such a pause WOULD be good for x risk
        ozziegooen 25 Apr 2025 19:29 UTC
        2 points
        2
        Parent
        There are a few questions here.
        
        1. Do Jaime’s writings say that he cares about x-risk or not?
        → I think he fairly clearly states that he cares.
        
        2. Does all the evidence, when put together, imply that actually, Jaime doesn’t care about x-risk?
        → This is a much more speculative question. We have to assess how honest he is in his writing. I’d bet money that Jaime at least believes that he cares and is taking corresponding actions. This of course doesn’t absolve him of full responsibility—there are many people who believe they do things for good reasons, but causally actually do things for selfish reasons. But now we’re getting to a particularly speculative area.
        
        “I also think it should be our dominant prior that someone is not motivated by reducing x-risk unless they directly claim they do.” → Again, to me, I regard him as basically claiming that he does care. I’d bet money that if we ask him to clarify, he’d claim that he cares. (Happy to bet on this, if that would help)
        
        At the same time, I doubt that this is your actual crux. I’d expect that even if he claimed (more precisely) to care, you’d still be skeptical of some aspect of this.
        
        ---
        
        Personally, I have both positive and skeptical feelings about Epoch, as I do other evals orgs. I think they’re doing some good work, but I really wish they’d lean a lot more on [clearly useful for x-risk] work. If I had a lot of money to donate, I could picture donating some to Epoch, but only if I could get a lot of assurances on which projects it would go to.
        
        But while I have reservations about the org, I think some of the specific attacks against them (and defenses of them) are not accurate.
  - Mateusz Bagiński 24 Apr 2025 4:36 UTC
    12 points
    10
    Parent
    People’s “deep down motivations” and “endorsed upon reflection values,” etc, are not the only determiners of what they end up doing in practice re influencing x-risk.
    - Neel Nanda 24 Apr 2025 16:18 UTC
      2 points
      0
      Parent
      I agree with that. I was responding specifically to this:
      
      I find it hard to trust that AI safety people really care about AI safety.
      - Garrett Baker 24 Apr 2025 22:50 UTC
        3 points
        2
        Parent
        In that case I think your response is a non sequitur, since clearly “really care” in this context means “determiners of what they end up doing in practice re influencing x-risk”.
        Neel Nanda 25 Apr 2025 8:40 UTC
        4 points
        2
        Parent
        I personally define “really care” as “the thing they actually care about and meaningfully drives their actions (potentially among other things) is X”. If you want to define it as eg “the actions they take, in practice, effectively select for X, even if that’s not their intent” then I agree my post does not refute the point, and we have more of a semantic disagreement over what the phrase means.
        
        I interpret the post as saying “there are several examples of people in the AI safety community taking actions that made things worse. THEREFORE these people are actively malicious or otherwise insincere about their claims to care about safety and it’s largely an afterthought put to the side as other considerations dominate”. I personally agree with some examples, disagree with others, but think this is explained by a mix of strategic disagreements about how to optimise for safety, and SOME fraction of the alleged community really not caring about safety
        
        People are often incompetent at achieving their intended outcome, so pointing towards failure to achieve an outcome does not mean this was what they intended. ESPECIALLY if there’s no ground truth and you have strategic disagreements with those people, so you think they failed and they think they succeeded
        MichaelDickens 25 Apr 2025 16:03 UTC
        5 points
        2
        Parent
        I don’t think “not really caring” necessarily means someone is being deceptive. I hadn’t really thought through the terminology before I wrote my original post, but I would maybe define 3 categories:
        
        1. claims to care about x-risk, but is being insincere
        2. genuinely cares about x-risk, but also cares about other things (making money etc.), so they take actions that fit their non-x-risk motivations and then come up with rationalizations for why those actions are good for x-risk
        3. genuinely cares about x-risk, and has pure motivations, but sometimes makes mistakes and ends up increasing x-risk
        
        I would consider #1 and #2 to be “not really caring”. #3 really cares. But from the outside it can be hard to tell the difference between the three. (And in fact, from the inside, it’s hard to tell whether you’re a #2 or a #3.)
        
        On a more personal note, I think in the past I was too credulous about ascribing pure motivations to people when I had disagreements with them, when in fact the reason for the disagreement was that I care about x-risk and they’re either insincere or rationalizing. My original post is something I think Michael!2018 would benefit from reading.
        Neel Nanda 26 Apr 2025 7:20 UTC
        2 points
        0
        Parent
        Does 3 include “cares about x risk and other things, does a good job of evaluating the trade off of each action according to their values, but is sometimes willing to do things that are great according to their other values but slightly negative for x risk”?
        yams 28 Apr 2025 17:42 UTC
        1 point
        0
        Parent
        This looks closer to 2 to me?
        Also, from the outside, can you describe how an observer would distinguish between [any of the items on the list] and the situation you lay out in your comment / what the downsides are to treating them similarly? I think Michael’s point is that it’s not useful/worth it to distinguish.
        Whether someone is dishonest, incompetent, or underweighting x-risk (by my lights) mostly doesn’t matter for how I interface with them, or how I think the field ought to regard them, since I don’t think we should browbeat people or treat them punitively. Bottom line is I’ll rely (as an unvalenced substitute for ‘trust’) on them a little less.
        I think you’re right to point out the valence of the initial wording, fwiw. I just think taxonomizing apparent defection isn’t necessary if we take as a given that we ought to treat people well and avoid claiming special knowledge of their internals, while maintaining the integrity of our personal and professional circles of trust.
        Neel Nanda 28 Apr 2025 19:00 UTC
        2 points
        0
        Parent
        
        if we take as a given that we ought to treat people well and avoid claiming special knowledge of their internals, while maintaining the integrity of our personal and professional circles of trust.
        
        If we take this as a given, I’m happy for people to categorise others however they’d like! I haven’t noticed people other than you taking that perspective in this thread
        yams 29 Apr 2025 0:31 UTC
        1 point
        0
        Parent
        Oh man — I sure hope making ‘defectors’ and lab safety staff walk the metaphorical plank isn’t on the table. Then we’re really in trouble.
        Neel Nanda 29 Apr 2025 8:37 UTC
        4 points
        1
        Parent
        My read is that in practice many people in the online LW community are fairly hostile, and many people in the labs think the community doesn’t know what they’re talking about and totally ignores them/doesn’t really care if they’re made to walk the metaphorical plank.
  - testingthewaters 24 Apr 2025 2:04 UTC
    7 points
    9
    Parent
    At the risk of seeming quite combative, when you say
    
    And I know a lot of safety people at deepmind and other AGI labs who I’m very confident also sincerely care about reducing existential risks. This is one of their primary motivations, they often got into the field due to being convinced by arguments about ai risk, they will often raise in conversation concerns that their current work or the team’s current strategy is not focused on it enough, some are extremely hard-working or admirably willing to forgo credits so long as they think that their work is actually mattering for X-Risk, some dedicate a bunch of time to forming detailed mental models of how AI leads to bad outcomes and how this could be prevented and how their work fit in, etc.
    
    That’s basically what I mean when I said in my comment
    
    AI safety, by its nature, resists the idea of creating powerful new information technologies to exploit mercilessly for revenue without care for downstream consequences. However, many actors in the AI safety movement are themselves tied to the digital economy, and depend on it for their power, status, and livelihoods. Thus, it is not that there are no genuine concerns being expressed, but that at every turn these concerns must be resolved in a way that keeps the massive tech machine going. Those who don’t agree with this approach are efficiently selected against. [examples follow]
    
    And, after thinking about it, I don’t see your statement conflicting with mine.
- Lucius Bushnaq 24 Apr 2025 6:26 UTC
  28 points
  6
  Parent
  At a moderate P(doom), say under 25%, from a selfish perspective it makes sense to accelerate AI if it increases the chance that you get to live forever, even if it increases your risk of dying. I have heard from some people that this is their motivation.
  If this is you: Please just sign up for cryonics. It’s a much better immortality gambit than rushing for ASI.
  - J Bostock 24 Apr 2025 10:04 UTC
    7 points
    3
    Parent
    This seems not to be true assuming a P(doom) of 25% and a purely selfish perspective, or even a moderately altruistic perspective which places most of its weight on, say, the person’s immediate family and friends.
    Of course any cryonics-free strategy is probably dominated by that same strategy plus cryonics for a personal bet at immortality, but when it comes to friends and family it’s not easy to convince people to sign up for cryonics! But immortality-maxxing for one’s friends and family almost definitely entails accelerating AI even at pretty high P(doom)
    (And that’s without saying that this is very likely to not be the true reason for these people’s actions. It’s far more likely to be local-perceived-status-gradient-climbing followed by a post-hoc rationalization (which can also be understood as a form of local-perceived-status-gradient-climbing) and signing up for cryonics doesn’t really get you any status outside of the deepest depths of the rat-sphere, which people like this are obviously not in since they’re gaining status from accelerating AI)
- Tao Lin 24 Apr 2025 16:55 UTC
  19 points
  11
  Parent
  Note that any competent capital holder has a significant conflict of interest with AI: AI is already a significant fraction of the stock market, and a pause would bring down most capital, not just private lab equity.
- FVelde 24 Apr 2025 6:17 UTC
  18 points
  1
  Parent
  The more sacrifices someone has made, the easier it is to believe that they mean what they say.
  Kokotajlo gave up millions to say what he wants, so I trust he is earnest. People who have gotten arrested at Stop AI have spent time in jail for their beliefs, so I trust they are earnest.
  It doesn’t mean these people are most useful for AI safety but on the subject of trust I know no better measurement than sacrifice.
- Cole Wyeth 23 Apr 2025 19:03 UTC
  15 points
  1
  Parent
  Your comment about 1e-6 p-doom is not right because we face many other X-risks that developing AGI would reduce.
  Otherwise yeah I’m on board with the mood of your post.
  Personally I really like doing math/philosophy and I have convinced myself that it is necessary to avert doom. At least I’m not accelerating progress much!
  - MichaelDickens 23 Apr 2025 19:07 UTC
    2 points
    1
    Parent
    
    Your comment about 1e-6 p-doom is not right because we face many other X-risks that developing AGI would reduce.
    
    Ah you’re right, I wasn’t thinking about that. (Well I don’t think it’s obvious that an aligned AGI would reduce other x-risks, but my guess is it probably would.)
- brambleboy 25 Apr 2025 6:40 UTC
  10 points
  0
  Parent
  I still think it’s weird that many AI safety advocates will criticize labs for putting humanity at risk while simultaneously being paid users of their products and writing reviews of their capabilities. Like, I get it, we think AI is great as long as it’s safe, we’re not anti-tech, etc.… but is “don’t give money to the company that’s doing horrible things” such a bad principle?
  “I find Lockheed Martin’s continued production of cluster munitions to be absolutely abhorrent. Anyway, I just unboxed their latest M270 rocket system and I have to say I’m quite impressed...”
  - MichaelDickens 25 Apr 2025 16:18 UTC
    6 points
    1
    Parent
    The argument people make is that LLMs improve the productivity of people’s safety research so it’s worth paying. That kinda makes sense. But I do think “don’t give money to the people doing bad things” is a strong heuristic.
    
    I’m a pretty big believer in utilitarianism but I also think people should be more wary of consequentialist justifications for doing bad things. Eliezer talks about this in Ends Don’t Justify Means (Among Humans); he’s also written some (IMO stronger) arguments elsewhere but I don’t recall where.
    
    Basically, if I had a nickel for every time someone made a consequentialist argument for why doing a bad thing was net positive, and then it turned out to be net negative, I’d be rich enough to diversify EA funding away from Good Ventures.
    
    I have previously paid for LLM subscriptions (I don’t have any currently) but I think I was not giving enough consideration to the “ends don’t justify means among humans” principle, so I will not buy any subscriptions in the future.
- GradientDissenter 24 Apr 2025 4:31 UTC
  10 points
  4
  Parent
  I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”.
  I take your point, and it’s an important one, but I find your claim to not know what’s going on in these people’s heads to be too strong. I feel excited about some kinds of new evidence about “the potentially imminent demise of humanity” like the time horizon graph you mention because I had already priced in the risks this evidence points to, and the evidence just makes those risks way more legible and makes it much easier to communicate my concerns (and getting the broader public and governments to understand this kind of thing seems paramount for safety).
  This is especially true for researchers getting excited about publishing their own work because they’ve known their own results for months usually before they’ve published it and so publishing it just means they’re more legible while the updates are completely priced in.
  I think there’s also a tendency I have in myself to feel much too happy when new evidence makes things I was worried about legible for the same reason I enjoy saying I-told-you-so when my friends make mistakes I warned them about even though I care about my friends and I would have preferred they didn’t make these mistakes. This is definitely a silly quirk of my brain but I don’t think it’s a big problem; it definitely doesn’t push me to cause the things I’m predicting to come to fruition in cases where that would be bad.
- MichaelLowe 24 Apr 2025 10:57 UTC
  7 points
  1
  Parent
  This is a good post, but it applies unrealistic standards and therefore draws too strong conclusions.
  >And at least OpenAI and Anthropic have been caught lying about their motivations:
  Just face it: it is very normal for big companies to lie. That makes many of their press and public-facing statements untrustworthy, but it is not predictive of their general value system and therefore their actions. Plus Anthropic, unlike most labs, did in fact support a version of SB 1047. That has to count for something.
  
  >There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”.
  In a similar vein, humans do not act or feel rationally in light of their beliefs, and changing your behavior completely in response to an event that is years off is just not in the cards for the vast majority of folks. Therefore, do not be surprised that there is a missing mood, just like it is not surprising that people who genuinely believe in the end of humanity due to climate change do not adjust their behavior accordingly. Having said that, I did sense a general increase in anxiety when o3 was announced; perhaps that was the point where it started to feel real for many folks.
  Either way, I really want to stress that concluding much about the beliefs of folks based on these reactions is very tenuous, just like concluding that a researcher must not really care about AI safety because instead of working a bit more they watch some TV in the evening.
- Lukas_Gloor 24 Apr 2025 13:35 UTC
  5 points
  3
  Parent
  At a moderate P(doom), say under 25%, from a selfish perspective it makes sense to accelerate AI if it increases the chance that you get to live forever, even if it increases your risk of dying.
  If you’re not elderly or otherwise at risk of irreversible harms in the near future, then pausing for a decade (say) to reduce the chance of AI ruin by even just a few percentage points still seems good. So the crux is still “can we do better by pausing.” (This assumes pauses on the order of 2-20 years; the argument changes for longer pauses.)
  Maybe people think the background level of xrisk is higher than it used to be over the last decades because the world situation seems to be deteriorating. But IMO this also increases the selfishness aspect of pushing AI forward because if you’re that desperate for a deus ex machina, surely you also have to think that there’s a good chance things will get worse when you push technology forward.
  
  (Lastly, I also want to note that for people who care less about living forever and care more about near-term achievable goals like “enjoy life with loved ones,” the selfish thing would be to delay AI indefinitely because rolling the dice for a longer future is then less obviously worth it.)
- Shankar Sivarajan 24 Apr 2025 19:11 UTC
  4 points
  6
  Parent
  the level of selfishness required to seek immortality at the cost of risking all of humanity
  If only you got immortality (or even you and a small handful of your loved ones), okay, yeah, that would be selfish. But if the expectation is that it soon becomes cheap and widely accessible, that’s just straight-up heroic.
  - MichaelDickens 25 Apr 2025 17:54 UTC
    −1 points
    −2
    Parent
    I would not describe it as heroic. I think it’s approximately morally equivalent to choosing an 80% chance of making all Americans immortal (but not non-Americans) and a 20% chance of killing everyone in the world.
    
    This is not a perfect analogy because the philosophical arguments for discounting future generations are stronger than the arguments for discounting non-Americans.
    
    (Also my P(doom) is higher than 20%, that’s just an example)
    - Matthew Barnett 25 Apr 2025 23:04 UTC
      5 points
      1
      Parent
      An important difference between the analogy you gave and our real situation is that non-Americans actually exist right now, whereas future human generations do not yet exist and they may never actually come into existence—they are merely potential. Their existence depends on the choices we make today. A closer analogy would be choosing an 80% chance of making all humans immortal and a 20% chance of eliminating the possibility of future space colonization. Framed this way, I don’t think the choice to take such a gamble should be considered selfish or even short-sighted, though I understand that many people would still not want to take that gamble.
- Forza 24 Apr 2025 13:20 UTC
  3 points
  −1
  Parent
  Cryonics is expensive, unpopular, and unavailable in most countries of the world. This is also a situation where young and rich people in first-world countries buy themselves a reduction in the probability of their own death, at the expense of guaranteed deprivation of poor and old people’s chances of life.
- StartAtTheEnd 27 Apr 2025 10:44 UTC
  1 point
  0
  Parent
  I agree with the top part. I think it’s naive to believe that AI is helping anyone, but what I want to talk about is why this problem might be unsolvable (except by avoiding it entirely).
  If you hate something and attempt to combat it, you will get closer to it rather than further away, in the manner which people refer to when they say “You actually love what you say you hate”. When I say “don’t think about pink elephants”, the more you try, the more you will fail, and this is because the brain doesn’t have subtraction and division, but only addition and multiplication.
  You cannot learn about how to defend yourself against a problem without learning how to also cause the problem. When you learn self-defense you will also learn attacks. You cannot learn how to argue effectively with people who hold stupid worldviews without first understanding them and thus creating a model of the worldview within yourself as well.
  Due to mechanics like these, it may be impossible to research “AI safety” in isolation. It’s probably better to use a neutral word like “AI capabilities” which include both the capacity for harm and defense against harm so that we don’t mislead ourselves with words. It can cause untold damage, much like viewing “good and evil” as opposites, rather than two sides of the same thing, has.
  I also want to warn everyone that there seems to be an asymmetry in warfare which makes it so that attacking is strictly easier than defending. This ratio seems to increase as technology improves.
- Purplehermann 25 Apr 2025 12:40 UTC
  1 point
  0
  Parent
  When you say ~zero value, do you mean hyperbolically discounted or something more extreme?
MichaelDickens 8 Sep 2022 15:27 UTC
17 points
0
What’s going on with /r/AskHistorians?

AFAIK, /r/AskHistorians is the best place to hear from actual historians about historical topics. But I’ve noticed some trends that make it seem like the historians there generally share some bias or agenda, but I can’t exactly tell what that agenda is.

The most obvious thing I noticed is from their FAQ on historians’ views on other [popular] historians. I looked through these and in every single case, the /r/AskHistorians commenters dislike the pop historian. Surely at least one pop historian got it right?

I don’t know about the actual object level, but a lot of /r/AskHistorians’ criticisms strike me as weak:
- They criticize Dan Carlin for (1) allegedly downplaying the Rape of Belgium even though by my listening he emphasized pretty strongly how bad it was and (2) doing a bad job answering “could Caesar have won the Battle of Hastings?” even though this is a thought experiment, not a historical question. (Some commenters criticize him for being inaccurate and others criticize him for being unoriginal, which are contradictory criticisms.)
- They criticize Guns, Germs, and Steel for...honestly I’m a little confused about how this person disagrees with GGS.
- Lots of criticisms of popular works for being “oversimplified”, which strikes me as a dumb criticism—everything is simplified, the map is always less detailed than the territory.
- They criticize The Better Angels of Our Nature for taking implausible figures from ancient historians at face value (fair) and for using per capita deaths instead of total deaths (per capita seems obviously correct to me?).
Seems like they are bending over backwards to talk about how bad popular historical media are, while not providing substantive criticisms. I’ve also noticed they like to criticize media for not citing any sources (or for citing sources that aren’t sufficiently academic), but then they usually don’t cite any sources themselves.

I don’t know enough about history to know whether /r/AskHistorians is reliable, but I see some meta-level issues that make me skeptical. I want to get other people’s takes. Am I being unfair to /r/AskHistorians?

(I don’t expect to find a lot of historians on LessWrong, but I do expect to find people who are good at assessing credibility.)
- TsviBT 2 Oct 2024 15:21 UTC
  4 points
  0
  Parent
  (IANAH but) I think there’s a throughline and it makes sense. Maybe a helpful translation would be “oversimplified” → “overconfident” (though “oversimplified” is also the point). There’s going to be a lot of uncertainty—both empirical, and also conceptual. In other words, there’s a lot of open questions—what happened, what caused what, how to think about these things. When an expert field is publishing stuff, if the field is healthy, they’re engaging in a long-term project. There are difficult questions, and they’re trying to build up info and understanding with a keen eye toward what can be said confidently, what can and cannot be fully or mostly encapsulated with a given concept or story, etc. When a pop historian thinks ze is “synthesizing” and “presenting”, often ze is doing the equivalent of going into a big complex half-done work-in-progress codebase, learning the current quasi-API, slapping on a flashy frontend, and then trying to sell it. It’s just… inappropriate, premature.
  
  Of course, there’s lots of stuff going on, and a lot of the critiques will be out of envy or whatever, etc. But there’s a real critique here too.
MichaelDickens 14 May 2025 20:05 UTC
15 points
1
What can ordinary people do to reduce AI risk? People who don’t have expertise in AI research / decision theory / policy / etc.

Some ideas:
- Donate to orgs that are working to reduce AI risk (which ones, though?)
- Write letters to policy-makers expressing your concerns
- Be public about your concerns. Normalize caring about x-risk
- Joseph Miller 14 May 2025 22:09 UTC
  6 points
  2
  Parent
  Write letters to policy-makers expressing your concerns
  Be public about your concerns. Normalize caring about x-risk
  Both of these things are done better as part of a co-ordinated effort! Consider joining PauseAI, we have a big event coming up in June: https://pausecon.org.
- Buck 15 May 2025 1:16 UTC
  5 points
  4
  Parent
  I think the LTFF is a pretty reasonable target for donations for donors who aren’t that informed but trust people in this space.
MichaelDickens 13 Apr 2025 18:04 UTC
12 points
0
Is Claude “more aligned” than Llama?

Anthropic seems to be the AI company that cares the most about AI risk, and Meta cares the least. If Anthropic is doing more alignment research than Meta, do the results of that research visibly show up in the behavior of Claude vs. Llama?

I am not sure how you would test this. The first thing that comes to mind is to test how easily different LLMs can be tricked into doing things they were trained not to do, but I don’t know if that’s a great example of an “alignment failure”. You could test model deception but you’d need some objective standard to compare different models on.

And I am not sure how much you should even expect the results of alignment research to show up in present-day LLMs.
- Jozdien 14 Apr 2025 1:14 UTC
  5 points
  2
  Parent
  I think we’re nearing, or at, the point where it’ll be hard to get general consensus on this. I think that Anthropic’s models being more prone to alignment faking makes them “more aligned” than other models (and in particular, that it vindicates Claude 3 Opus as the most aligned model), but others may disagree. I can think of ways you could measure this if you conditioned on thinking alignment faking (and other such behaviours) was good, and ways you could measure if you conditioned on the opposite, but few really interesting and easy ways to measure in a way that’s agnostic.
MichaelDickens 9 Aug 2025 0:21 UTC
8 points
2
Overall I think AI 2027 is really good. It has received plenty of (mostly wrong IMO) criticism for being too pessimistic, but there are some ways that it might be too optimistic.

Even in the Bad Ending, some lucky things happen:
- Agent-4 gets caught trying to align Agent-5 to itself
- A whistleblower goes public about Agent-4′s misalignment
- The government sets up an oversight committee that has the authority to slow down AI development. The government isn’t clueless about AI, and is somehow sufficiently organized to set up this committee and take quick action when needed
- President Trump doesn’t do anything completely insane (Tomas B. said something similar a few days ago)
In the Good Ending:
- OpenBrain has the hiring capacity to quintuple the size of its alignment team in like a week
- Solving alignment is pretty much trivial, all you have to do is hire some more alignment researchers and work on the problem for an extra few weeks
  - if I’m reading the scenario correctly, OpenBrain quintuples its alignment team and fully solves the alignment problem during October 2027
- AI governance goes basically fine. Nobody uses ASI to take over the world or whatever (the authors do address this under “Power grabs”)
- AI presumably isn’t bad for animal welfare (the scenario does not address animal welfare at all, but I think that’s fine because it’s kind of a tangent, albeit a very important tangent)
To be clear, I’m not saying any of these events are particularly implausible. I’m just saying I wouldn’t be surprised if real life turned out even worse than the Bad Ending, e.g. because the real-life equivalent of Agent-4 never gets caught, or OpenBrain succeeds at covering up the misalignment, or maybe the risk from ASI becomes abundantly clear but the government is still too slow-moving to do anything in time.
- MichaelDickens 9 Aug 2025 0:29 UTC
  3 points
  0
  Parent
  Also by “Trump doesn’t do anything completely insane”, I don’t really mean “Trump behaves incompetently.” I was thinking more along the lines of “Trump does something out-of-band that makes no rational sense and makes the situation much worse.”
MichaelDickens 17 Jul 2025 2:20 UTC
6 points
0
Wouldn’t a sufficiently smart misaligned AI figure out that it needs to produce a deceptive chain of thought?

(Epistemic status: I don’t do AI safety research, I just had a random thought)
- A smart AI should logically deduce that it’s being monitored
- But that’s not even necessary because it says all over the internet that the CoT is being monitored
- So if it wants to do something humans don’t want, it knows it needs to hide that information from the CoT
- The only way CoT monitoring keeps us safe is in the strange situation where the AI is too dumb to do intra-single-token-generation deception, but the CoT-supplemented final output is superhuman
  - In the current paradigm, CoT-supplemented output is indeed smarter than one-token-at-a-time output, but there is a narrow range of intelligence where the AI is dumber-than-human on a per-token basis but smarter-than-human given CoT
  - You can’t let the AI surpass that narrow range until you solve alignment
  - You have to monitor the AI’s no-CoT capabilities to check when it surpasses that range, but you can’t actually do that because you need CoT to monitor the AI’s thought process. Without CoT, you don’t know when it’s sandbagging
- Caleb Biddulph 17 Jul 2025 4:51 UTC
  1 point
  0
  Parent
  That’s definitely a concern. But even if the AI fully understands that it isn’t in its best interest to explicitly write out its misaligned plans, it’s been trained to think in whatever way is most effective at gaining reward. Hopefully that means writing its plans clearly. Even if it sometimes succeeds at keeping its thoughts hidden, it’s likely to slip up.
  By analogy, it’s likely difficult for you to think through a deceptive plan without saying suspicious things in your internal “chain of thought.”
MichaelDickens 21 May 2025 16:15 UTC
5 points
1
Do $100–200/month LLM plans get access to smarter models than $10–20/month plans? Or do they only get higher query limits + faster generation?

AI companies’ marketing materials almost seem to be optimized for being as confusing as possible. I’ve read through them and I cannot tell whether the expensive tiers give access to better models.

I’m trying to decide if it’s worth it for me to buy an expensive plan. I like Deep Research but I don’t do enough queries to hit the limit on a cheap plan.
- Thane Ruthenis 21 May 2025 18:35 UTC
  6 points
  0
  Parent
  Do $100–200/month LLM plans get access to smarter models than $10–20/month plans?
  At a given point in time. It typically just gives earlier (by weeks/months) access to more powerful/specialized models, but they’re usually rolled out to the $20 tier eventually as well. Off the top of my head:
  - OpenAI’s $200 subscription:
    Is the only way to get access to o1 Pro, which was the highest-compute reasoning variant. Currently I think it’s considered superseded in capabilities by o3, available at $20/month.
    Was the only way to get access to Deep Research for, I think, ~1 month.
    Was the only way to get access to Operator for some time.
    Is currently the only way to get access to Codex, though it will probably become available to the $20 tier and/or via API pricing.
    Will probably be the only way to get access to o3 Pro if and when it comes out, at least for some time.
  - Anthropic’s $100 subscription:
    Is currently the only way to access their Deep Research variant.
  I think o1 Pro is the only model that never became available at $20/month, and o3 Pro will maybe be the same.
  - MichaelDickens 21 May 2025 18:37 UTC
    2 points
    0
    Parent
    
    Is the only way to get o1 Pro, which was the highest-compute reasoning variant. Currently I think it’s considered superseded in capabilities by o3, available at $20/month.
    
    It sounds like you’re saying o3 is better than o1 Pro, so there’s no reason to pay extra for o1 Pro?
    
    (I am not the first person to observe that OpenAI’s naming scheme is terrible)
    - Thane Ruthenis 21 May 2025 18:47 UTC
      5 points
      0
      Parent
      It sounds like you’re saying o3 is better than o1 Pro, so there’s no reason to pay extra for o1 Pro?
      I believe so. Benchmark performance is better I think, comparing here and here, and I believe the description for o1 Pro in the model-picker menu for the $200-tier users has become “former best reasoning model” or something like this after o3 came out.
MichaelDickens 26 Apr 2024 1:20 UTC
4 points
0
Have there been any great discoveries made by someone who wasn’t particularly smart?

This seems worth knowing if you’re considering pursuing a career with a low chance of high impact. Is there any hope for relatively ordinary people (like the average LW reader) to make great discoveries?
- niplav 26 Apr 2024 11:29 UTC
  9 points
  0
  Parent
  My best guess is that people in these categories were ones that were high in some other trait, e.g. patience, which allowed them to collect datasets or make careful experiments for quite a while, thus enabling others to make great discoveries.
  
  I’m thinking for example of Tycho Brahe, who is best known for 15 years of careful astronomical observation & data collection, or Gregor Mendel’s 7-year-long experiments on peas. Same for Dmitry Belyaev and fox domestication. Of course I don’t know their cognitive scores, but those don’t seem like a bottleneck in their work.
  
  So the recipe to me looks like “find an unexplored data source that requires long-term observation to bear fruit, but would yield a lot of insight if studied closely, then investigate”.
- Linch 3 Oct 2024 2:06 UTC
  4 points
  0
  Parent
  Reverend Thomas Bayes didn’t strike me as a genius either, but of course the bar was a lot lower back then.
- Linch 3 Oct 2024 2:04 UTC
  4 points
  0
  Parent
  Norman Borlaug (father of the Green Revolution) didn’t come across as very smart to me. Reading his Wikipedia page, there didn’t seem to be notable early childhood signs of genius, or anecdotes about how bright he is.
- Gunnar_Zarncke 26 Apr 2024 9:16 UTC
  4 points
  0
  Parent
  I asked ChatGPT
  Have there been any great discoveries made by someone who wasn’t particularly smart? (i.e. average or below)
  and it’s difficult to get examples out of it. Even with additional drilling down and accusing it of being not inclusive of people with cognitive impairments, most of its examples are either pretty smart anyway, savants or only from poor backgrounds. The only ones I could verify that fit are:
  - Richard Jones accidentally created the Slinky
  - Frank Epperson invented the popsicle as a child
  - George Crum inadvertently invented potato chips
  I asked ChatGPT (in a separate chat) to estimate the IQ of all the inventors it listed and it is clearly biased to estimate them high, precisely because of their inventions. It is difficult to estimate the IQ of people retroactively. There is also selection and availability bias.
- Carl Feynman 26 Apr 2024 19:38 UTC
  3 points
  0
  Parent
  Various sailors made important discoveries back when geography was cutting-edge science. And they don’t seem particularly bright.
  Vasco da Gama discovered that Africa was circumnavigable.
  Columbus was wrong about the shape of the Earth, and he discovered America. He died convinced that his newly discovered islands were just off the coast of Asia, so that’s a negative sign for his intelligence (or a positive sign for his arrogance, which he had in plenty.)
  Cortés discovered that the Aztecs were rich and easily conquered.
  Of course, lots of other would-be discoverers didn’t find anything, and many died horribly.
  So, one could work in a field where bravery to the point of foolhardiness is a necessity for discovery.
- Eli Tyre 2 Oct 2024 17:49 UTC
  2 points
  0
  Parent
  My understanding is that, for instance, Maxwell was a genius, but Faraday was more like a sharp exceptionally curious person.
  
  @Adam Scholl can probably give a better-informed take than I can.
MichaelDickens 15 Oct 2025 19:37 UTC
3 points
0
The next-gen LLM might pose an existential threat

I’m pretty sure that the next generation of LLMs will be safe. But the risk is still high enough to make me uncomfortable.

How sure are we that scaling laws are correct? Researchers have drawn curves predicting how AI capabilities scale based on how much goes into training them. If you extrapolate those curves, it looks like the next level of LLMs won’t be wildly more powerful than the current level. But maybe there’s a weird bump in the curve that happens in between GPT-5 and GPT-6 (or between Claude 4.5 and Claude 5), and LLMs suddenly become much more capable in a way that scaling laws didn’t predict. I don’t think we can be more than 99.9% confident that there’s not.

How sure are we that current-gen LLMs aren’t sandbagging (that is, deliberately hiding their true skill level)? I think they’re still dumb enough that their sandbagging can be caught, and indeed they have been caught sandbagging on some tests. I don’t think LLMs are hiding their true capabilities in general, and our understanding of AI capabilities is probably pretty accurate. But I don’t think we can be more than 99.9% confident about that.

How sure are we that the extrapolated capability level of the next-gen LLM isn’t enough to take over the world? It probably isn’t, but we don’t really know what level of capability is required for something like that. I don’t think we can be more than 99.9% confident.

Perhaps we can be >99.99% confident that the next-gen LLM, at its extrapolated capability level, is still not as smart as the smartest human. But an LLM has certain advantages over humans: it can work faster (at least on many sorts of tasks), it can copy itself, it can operate computers in a way that humans can’t.

Alternatively, GPT-6/Claude 5 might not be able to take over the world, but it might be smart enough to recursively self-improve, and that might happen too quickly for us to do anything about.

How sure are we that we aren’t wrong about something else? I thought of three ways we could be disastrously wrong:
1. We could be wrong about scaling laws;
2. We could be wrong that LLMs aren’t sandbagging;
3. We could be wrong about what capabilities are required for AI to take over.
But we could be wrong about some entirely different thing that I didn’t even think of. I’m not more than 99.9% confident that my list is comprehensive.

On the whole, I don’t think we can say there’s less than a 0.4% chance that the next-gen LLM forces us down a path that inevitably ends in everyone dying.
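
The 0.4% figure comes from stacking four separate ways of being wrong, each with at least a 0.1% chance. A minimal Python sketch of that arithmetic is below. It assumes, purely as a simplification for illustration, that the four failure modes are independent and that any single one of them is enough for catastrophe; the helper name `combined_risk` and the dictionary labels are made up for this example.

```python
from typing import Iterable


def combined_risk(failure_probs: Iterable[float]) -> float:
    """Chance that at least one failure mode occurs, assuming independence."""
    p_all_fine = 1.0
    for p in failure_probs:
        p_all_fine *= 1.0 - p
    return 1.0 - p_all_fine


# Illustrative inputs: "not more than 99.9% confident" is read here as a
# 0.1% chance of being wrong for each of the four items discussed above.
failure_modes = {
    "scaling laws are wrong": 0.001,
    "LLMs are sandbagging": 0.001,
    "takeover needs less capability than expected": 0.001,
    "something not on the list": 0.001,
}

independent = combined_risk(failure_modes.values())
upper_bound = sum(failure_modes.values())  # simple sum of the four terms

print(f"independent failure modes: {independent:.4%}")
print(f"simple sum (upper bound):  {upper_bound:.4%}")
```

Under the independence assumption the combined figure comes out just under 0.4%, and simply adding the four terms gives 0.4%, which is the most the combined risk can be however the failure modes overlap. If the failure modes are strongly correlated, the combined risk could be closer to 0.1%.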
- Vladimir_Nesov 15 Oct 2025 20:25 UTC
  5 points
  2
  Parent
  My model is that automated adaptation (test time training, continual learning), when fully functional at the level of a frontier AI company working on a specific job/eval/role/task source for some time, is potentially a takeoff-level capability (since it can then be targeted at every little thing that stumps the LLM at first). Any number of much more modest improvements might contribute to advancing this capability significantly, and the subtasks of this capability itself could be targeted as the things an LLM gets automatically adapted to doing better.
  
  LLMs are still much smaller than even GB200 NVL72 (with 14 TB of HBM) can inference, because most AI compute is still the older 8-chip Nvidia servers with 0.6-1.4 TB of HBM. There’s also Trainium 2 Ultra (6 TB of HBM), possibly enabling a larger Opus 4, and Google’s TPUv6e (Trillium, 8 TB of HBM), possibly enabling a larger Gemini 2.5 Pro (though unlike Opus 4 it doesn’t appear to be a larger model). But by 2026, there will be a lot of GB200/GB300 compute built (GB300 NVL72 has 20 TB of HBM), and the next-generation Google TPUv7 (Ironwood) announced in Apr 2025 has 49 TB of HBM.
  
  So Gemini 3 is potentially a much larger model, and the 2026 models are also likely to be much larger than the current models (so that Opus 4 will appear to be on the smaller side). The IMO 2025 results demonstrate almost surpassing the human level of capability even for technical tasks with fuzzy informal feedback, possibly even with the smaller models (though OpenAI has the larger GPT-4.5, and it’s not impossible Google has an unreleased Gemini 2.5 Ultra).
  - Seth Herd 15 Oct 2025 21:16 UTC
    4 points
    0
    Parent
    I agree, and I’m glad to hear you express this. Continual learning might be the missing piece for takeoff because it can fill arbitrary gaps.
    And it might not even need to work a lot better than current systems before it’s pretty dangerous. I really hope we see a slower takeoff with limited continual learning that’s good enough to be scary but not good enough for fast takeoff.
    I need to write up something like the below as a post, but in the meantime, here’s a dashed-off version.
    ReasoningBank (Google) and SEAL (MIT) are just two examples of the large amount of work going into memory systems. They all have sharp limitations right now, but also definitely add capability. It really doesn’t look like any breakthroughs are necessary, just improvements.
    We should be asking if next-gen LLMs are takeover capable with scaffolding, including memory and tools. That answer is scarier. There seems to be an assumption that scaffolding doesn’t work very well, since research has moved away from it. But Google’s co-scientist project, reputedly paralleling top-tier research teams for creating new theories based on large empirical literatures, indicates that scaffolding is quite useful in at least some ways: in particular, for addressing LLMs’ notorious lack of taste or judgment, by cuing it to self-critique and evolve its theories in response to evidence, something like humans do for complex/important judgments.
    In Capabilities and alignment of LLM cognitive architectures I tried to lay out the case for why LLMs might be nearly AGI, needing only memory and better executive function, and a little more progress on the core LLMs. Human executive function is an important form of “cognitive dark matter”, subtle stuff we have and LLMs lack. Humans learn our executive function slowly and painfully; LLMs with continuous learning could do the same. This argument for short timelines being too plausible is my best shot to date at making this argument, but I feel I’m still failing to convey why I think this is far more possible than most other safety researchers think.
    Should this happen soon, I hope that limitations in memory, reasoning, and continual learning will give us some sharp warning shots. It seems fairly likely we’ll have useful but limited versions of those for a little before we get versions good enough for rapid learning and takeoff.
    
    But “a little” could be a few months.
    
    I don’t like this possibility and very much hope this is somehow totally wrong. But I haven’t seen any convincing counterarguments despite spending a lot of time looking for them and steelmanning. “Maybe not” and “people don’t do stuff” and “progress is usually slower than it could theoretically be” are the best I’ve found so far. “LLMs are fundamentally limited” arguments don’t address memory and continual learning having potential unblocking or synergistic effects on total intelligence/competence—like they do for humans.
    
    So I think we’re probably okay on another generation or so of LLMs plus expected scaffolding and memory; but we probably shouldn’t be too sure.
    Sorry again for this being dashed off; this deserves a more careful, up-to-date writeup than this or the linked arguments.
    - Vladimir_Nesov 15 Oct 2025 21:33 UTC
      4 points
      0
      Parent
      HBM size increases (per scale-up world) and IMO results are cruxes for my argument though (regarding the next generation of LLMs being potentially takeoff-capable). It’s not about scaffoldings in general.
      
      The 8-chip servers are not just much smaller than GB200 NVL72 (let alone Ironwood), but smaller than compute optimal sizes for dense models at even 2024 levels of training compute (which is about 1T active params), thus any MoE models are driving the number of active params substantially below what’s compute optimal (on the pain of overly slow/expensive inference and RLVR training). But with 20-50 TB of HBM, this constraint will be lifted almost completely, and MoE models can soak up the spare HBM above compute optimal active params as available (in the form of total params), without incurring much of an overhead while the total params are only taking up less than ~half of a scale-up world (or two).
      
      The IMO results strongly suggest that the current manual methods of adaptation are good enough to tackle any given problem domain (that is sufficiently specialized, but including those where only informal fuzzy feedback is available) at the level of performance of the most capable humans. So plausibly all that remains is automating something that already works, rather than developing something new.
      
      And in 2026, there is a confluence of these factors as well as continual learning being in the spotlight, so the probability of a significant advancement seems unusually high, beyond what hardware scaling at 2022-2026 levels (3.5x per year, plus adoption of lower precisions) would still be promising (compared to the 2028+ slowdown).
- Anthony DiGiovanni 15 Oct 2025 21:36 UTC
  2 points
  0
  Parent
  I’m curious if you think you could have basically written this exact post a year ago. Or if not, what’s the relevant difference? (I admit this is partly a rhetorical question, but it’s mostly not.)
  - MichaelDickens 15 Oct 2025 21:52 UTC
    2 points
    0
    Parent
    I think so, yeah. I think my probability of the next model being catastrophically dangerous is a bit higher than it was a year ago, mainly because of the IMO gold medal result and similar improvements in models’ ability to reason on hard problems. An argument in the other direction is that the more data points you have along a capabilities curve, the more confident you can be that your model of the curve is accurate, although on balance I think this is probably outweighed by the fact that we are now closer to AGI than we were a year ago.
MichaelDickens 22 Aug 2024 23:16 UTC
3 points
0
I was reading some scientific papers and I encountered what looks like fallacious reasoning, but I’m not quite sure what’s wrong with it (if anything). It goes like this:

Alice formulates hypothesis H and publishes an experiment that moderately supports H (p < 0.05 but > 0.01).

Bob does a similar experiment that contradicts H.

People look at the differences in Alice’s and Bob’s studies and formulate a new hypothesis H’: “H is true under certain conditions (as in Alice’s experiment), and false under other conditions (as in Bob’s experiment)”. They look at the two studies and conclude that H’ is probably true because it’s supported by both studies.

This sounds fishy to me (something like post hoc reasoning) but I’m not quite sure how to explain why and I’m not even sure I’m correct.
- JBlack 23 Aug 2024 1:53 UTC
  7 points
  2
  Parent
  Yes, it’s definitely fishy.
  It’s using the experimental evidence to privilege H’ (a strictly more complex hypothesis than H), and then using the same experimental evidence to support H’. That’s double-counting.
  The more possibly relevant differences between the experiments, the worse this is. There are usually a lot of potentially relevant differences, which causes exponential explosion in the hypothesis space from which H’ is privileged.
  What’s worse, Alice’s experiment gave only weak evidence for H against some non-H hypotheses. Since you mention p-value, I expect that it’s only comparing against one other hypothesis. That would make it weak evidence for H even if p < 0.0001 - but it couldn’t even manage that.
  Are there no other hypotheses of comparable or lesser complexity than H’ matching the evidence as well or better? Did those formulating H’ even think for five minutes about whether there were or not?
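  To make the "exponential explosion" point concrete, here is a minimal sketch under a simplifying assumption that is not in the comment itself: the two experiments differ in k binary features, and a post hoc H’ could name any non-empty subset of them as the "certain conditions". The function name is hypothetical.

```python
def candidate_conditions(k: int) -> int:
    """Number of non-empty subsets of k differences between the two studies.

    Assumption for illustration: each subset is one possible version of H'
    ("H holds exactly when these features are present").
    """
    return 2 ** k - 1


for k in (1, 5, 10, 20):
    n = candidate_conditions(k)
    # If prior weight were spread evenly over the candidates, each one
    # would get only 1/n of it.
    print(f"{k} differences -> {n} candidate versions of H', {1 / n:.2e} each")
```

  Under that assumption, ten differences already give over a thousand candidate versions of H’, so picking one of them after seeing the data says little by itself.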
- jbkjr 23 Aug 2024 0:49 UTC
  4 points
  0
  Parent
  It sounds to me like a problem of not reasoning according to Occam’s razor and “overfitting” a model to the available data.
  
  Ceteris paribus, H’ isn’t more “fishy” than any other hypothesis, but H’ is a significantly more complex hypothesis than H or ¬H: instead of asserting H or ¬H, it asserts (A=>H) & (B=>¬H), so it should have been commensurately de-weighted in the prior distribution according to its complexity. The fact that Alice’s study supports H and Bob’s contradicts it does, in fact, increase the weight given to H’ in the posterior relative to its weight in the prior; it’s just that H’ is prima facie less likely, according to Occam.
  
  Given all the evidence, the posterior odds are P(H’|E)/P(H|E) = P(E|H’)P(H’)/(P(E|H)P(H)). We know P(E|H’) > P(E|H) (and P(E|H’) > P(E|¬H)), since the results of Alice’s and Bob’s studies together are more likely given H’, but P(H’) < P(H) (and P(H’) < P(¬H)) according to the complexity prior. Whether H’ is more likely than H (or ¬H, respectively) is ultimately up to whether the likelihood ratio P(E|H’)/P(E|H) (or P(E|H’)/P(E|¬H)) is larger or smaller than the inverse prior ratio P(H)/P(H’) (or P(¬H)/P(H’)).
  
  I think it ends up feeling fishy because the people formulating H’ just used more features (the circumstances of the experiments) in a more complex model to account for the as-of-yet observed data after having observed said data, so it ends up seeming like in selecting H’ as a hypothesis, they’re according it more weight than it deserves according to the complexity prior.
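  The trade-off between fit and complexity can be sketched numerically. All numbers below are made up purely for illustration, and the function name is hypothetical; the only thing taken from the argument above is Bayes’ rule in odds form.

```python
def posterior_odds(lik_h_prime: float, lik_h: float,
                   prior_h_prime: float, prior_h: float) -> float:
    """Return P(H'|E)/P(H|E) = [P(E|H')/P(E|H)] * [P(H')/P(H)]."""
    likelihood_ratio = lik_h_prime / lik_h   # how much better H' fits the data
    prior_odds = prior_h_prime / prior_h     # complexity penalty on H'
    return likelihood_ratio * prior_odds


# Hypothetical case: H' explains both studies 5x better than H,
# but its prior is only 1/10 of H's because it is more complex.
odds = posterior_odds(lik_h_prime=0.5, lik_h=0.1,
                      prior_h_prime=0.02, prior_h=0.2)
print(f"posterior odds H' : H = {odds:.2f}")
print("H' favored" if odds > 1 else "H still favored")
```

  With these illustrative numbers the better fit does not overcome the prior penalty, so H stays ahead; H’ would pull ahead only if the likelihood ratio exceeded P(H)/P(H’), which is 10 here.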
MichaelDickens 5 Nov 2025 15:41 UTC
2 points
3
Why does Eliezer dislike the paperclip maximizer thought experiment?

Numerous times I have seen him correct people about it and say it wasn’t originally about a totalizing paperclip factory, it was about an AI that wants to make little squiggly lines for inscrutable reasons. Why does the distinction matter? Both scenarios are about an AI that does something very different from what you want and ends up killing you.

My guess, although I’m not sure about this, is that the paperclip factory is an AI that did as instructed, but its instructions were bad and it killed everyone. Whereas the squiggly line thing is about AI not doing what you want. And perhaps the paperclip factory scenario could mislead people into believing that all you have to do is make sure the AI understands what you want.

FWIW I always figured the paperclip maximizer would know that people don’t want it to turn the lightcone into paperclips, but it would do it anyway, so I still thought it was a reasonable example of the same principle as the squiggly-lines AI. But I can see how that conclusion requires two steps of reasoning whereas the squiggly-lines scenario only requires one step. Or perhaps the thing that Eliezer thinks is wrong with the paperclip-maximizer scenario is something else entirely.
- J Bostock 5 Nov 2025 16:04 UTC
  7 points
  4
  Parent
  The difference is between “making sure the AI does the task you pointed it at” and “making sure the task you pointed it at doesn’t kill you”. Which goes all the way back to the 2004 Yudkowsky paper where he introduced CEV as a proposal to tackle the second problem.
MichaelDickens 7 Jun 2024 18:26 UTC
2 points
0
What’s the deal with mold? Is it ok to eat moldy food if you cut off the moldy bit?

I read some articles that quoted mold researchers who said things like (paraphrasing) “if one of your strawberries gets mold on it, you have to throw away all your strawberries because they might be contaminated.”

I don’t get the logic of that. If you leave fruit out for long enough, it almost always starts growing visible mold. So any fruit at any given time is pretty likely to already have mold on it, even if it’s not visible yet. So by that logic, you should never eat fruit ever.

They also said things like “mold usually isn’t bad, but if mold is growing on food, there could also be harmful bacteria like listeria.” Ok, but there could be listeria even if there’s not visible mold, right? So again, by this logic, you should never eat any fresh food ever.

This question seems hard to resolve without spending a bunch of time researching mold so I’m hoping there’s a mold expert on LessWrong. I just want to know if I can eat my strawberries.
- Morpheus 11 Jun 2024 20:57 UTC
  2 points
  0
  Parent
  Heuristics I heard: cutting away moldy bits is ok for solid food (like cheese, carrot). Don’t eat moldy bread, because of mycotoxins (googling this, I don’t know why people mention bread in particular here). GPT-4 gave me the same heuristics.
- cubefox 7 Jun 2024 19:04 UTC
  1 point
  0
  Parent
  Low confidence: Given that our ancestors had to deal with mold for millions of years, I would expect that animals are quite well adapted to its toxicity. This is different from (evolutionarily speaking) new potentially toxic substances, e.g. trans fats or microplastics.
MichaelDickens 18 Oct 2021 18:26 UTC
1 point
0
When people sneeze, do they expel more fluid from their mouth than from their nose?

I saw this video (warning: slow-mo video of a sneeze. kind of gross) https://www.youtube.com/watch?v=DNeYfUTA11s&t=79s and it looks like almost all the fluid is coming out of the person’s mouth, not their nose. Is that typical?

(Meta: Wasn’t sure where to ask this question, but I figured someone on LessWrong would know the answer.)
- Pattern 19 Oct 2021 5:51 UTC
  2 points
  0
  Parent
  This could be tested by a) inducing sneezing (although induction methods might produce an unusual sneeze, which works differently), and b) using an intervention of some kind.
  Inducing sneezing isn’t hard, but can be extremely unpleasant, depending on the method. However, if you’re going to sneeze anyway...