Just my personal opinion:
My sense is that Anthropic is somewhat more safety-focused than the other frontier AI companies, in that most of the companies only care maybe 10% as much about safety as they should, and Anthropic cares 15% as much as it should.
What numbers would you give to these labs?
My median guess is that if an average company is −100 per dollar then Anthropic is −75. I believe Anthropic is making things worse on net by pushing more competition, but an Anthropic-controlled ASI is a bit less likely to kill everyone than an ASI controlled by anyone else.
But I also put significant (though <50%) probability on Anthropic being the worst company in terms of actual consequences, because its larger-but-still-insufficient focus on safety may create a false sense of security that ends up preventing good regulations from being implemented.
You may also be interested in SaferAI’s risk management ratings.
I used to think Anthropic was [...] quite in sync with the AI x-risk community.
I think Anthropic leadership respects the x-risk community in their words but not in their actions. Anthropic says safety is important, and invests a decent amount into safety research; but also opposes coordination, supports arms races, and has no objection to taking unilateral actions that are unpopular in the x-risk community (and among the general public for that matter).
I have the courage of my convictions; you ignore the opinions of others; he takes reckless unilateral action.
The question under discussion was: Is Anthropic “quite in sync with the AI x-risk community”? If it’s taking unilateral actions that are unpopular with the AI x-risk community, then it’s not in sync.
There were multiple questions under discussion.
His reply could validly be said to apply to the subset of your post that implies or directly says Anthropic is doing a bad thing, i.e., he is highlighting that disagreement is real and allowed.
You are correct that there is a separate vein here about the factual question of whether they are “in sync” with the AI x-risk community. That is a separate question that 1a3orn was not touching with their reply. You are mixing the two frames. If this was intentional, you were being disingenuous; if it was unintentional, you were being myopic.
Do you see any hope of convincing them that they’re not a net positive influence and that they should shut down all their capabilities projects? Or is that simply not realistic human behaviour?
From the point of view of rational self-interest, I’m sure they care more about surviving the singularity and living a zillion years than about being temporarily a little richer for 3 years[1] until the world ends (I’m sure these people can live comfortably while waiting).
[1] I think Anthropic predicts AGI in 3 years, but I’m unsure about ASI (superintelligence).
Not only would most people be hopelessly lost on these questions (“Should I give up millions-of-dollars-and-personal-glory and then still probably die, just because it is morally right to do so?”), they have also picked up something that they cannot put down. These companies have thousands of people making millions of dollars, and they will re-form in another shape if the current structure is broken apart. If we want to put down what has been picked up more stably, we must use other forces that do not wholly arise from within the companies.
I agree that it’s psychologically very difficult, and that “is my work a net positive” is also hard to answer.
But I don’t think it’s necessarily about millions of dollars and personal glory. I think the biggest difficulty is the extreme social conflict and awkwardness of telling researchers who are personally very close to you to shut down the project they have poured so much work into, and to go do something else instead, something that probably won’t make money and will probably end with the company going bankrupt.
As for the millions of dollars: the top executives have enough money that they won’t feel the difference.
As for “still probably die”: well, from a rational self-interest point of view they should spend the last years they have left on vacation rather than stressing out at a lab.
As for personal glory, it’s complicated. I think they genuinely believe there is a very decent chance of survival, in which case “doing the hard unpleasant thing” will result in far more glory in the post-singularity world. I agree it may be a factor in the short term.
I think questions like “Is my work a net positive?”, “Is my ex-girlfriend more correct about our breakup than I am?”, and “Is the political party I like running the economy better?” are some of the most important questions in life. But all humans are delusional about these most important questions, and no matter how smart you are, wondering about them will simply give your delusions more time to find reassurances that you aren’t delusional.
The only way out is to look at how other smart, rational people are delusional, and how futile their attempts at self-questioning are, and to infer that, holy shit, this could be happening to me too without my realizing it.
Not sure I get your overall position. But I don’t believe all humans are delusional about the most important questions in their lives. See here for an analysis of pressures on people that can cause them to be insane on a topic. I think you can create inverse pressures in yourself, and you can also have no pressures and simply use curiosity and truth-seeking heuristics. It’s not magic to not be delusional. It just requires doing the same sorts of cognition you use to fix a kitchen sink.
Admittedly, I got a bit lost writing that comment. What I should’ve written was: “not being delusional is either easy or hard.”
If it’s easy, you should be able to convince them to stop being delusional, since doing so is in their rational self-interest.
If it’s hard, you should be able to show them how hard and insidious it is, and how one cannot expect oneself to succeed, so one should be far more uncertain about, and concerned by, one’s own potential for delusion.
I think there is some hope, but I don’t really know how to bring it about. I think that if their behavior were considered sufficiently shameful by their ingroup, they would stop. But their ingroup specifically selects for people who think they are doing the right thing.
I have some small hope that they can be convinced by good arguments, although if that were true, surely they would’ve already been convinced by now? Perhaps they are simply not aware of the arguments for why what they’re doing is bad?