This is core to the alignment problem. I’m confused how you will solve the alignment problem without figuring out anything about what you care about as a (biological) human.
I’m saying: the end goal is that we have an ASI that we can make do what we want. Maybe it looks like us painstakingly solving neuroscience and psychology and building a machine that can extract someone’s CEV (like that mirror in HPMOR), then hooking that up to the drives of our ASI (either built on new tech or after multiple revolutions in DL theory and interpretability) before turning it on. Maybe it looks like any instance of GPT7-pro automatically aligning itself with the first human that talks to it, for magical reasons we don’t understand. Maybe it looks like us building a corrigible weak ASI, pausing AI development, getting the weak corrigible ASI to create an IQ-boosting serum, cloning von Neumann, feeding him a bunch of serum as a baby, and having him build the aligned ASI using new tech.
They are all the same: in the end you have an ASI that does what you want. If you’re programming in random crude targets, you’re not doing so well. What you want the ASI to do is, simply, what you want.
I assume Sam Altman’s plan is: Step 1, world dictatorship; Step 2, maaaybe do some moral philosophy with the AI’s help, or maybe not.
You are more generous than I am. But I also think him “doing moral philosophy” would be a waste of time.
I’m saying that with this assumption (that we can make the ASI do what we want) you’ve assumed away most of the problem.
I agree. What I’m puzzled by is people who assume we’ll solve alignment, but then still think there are a bunch of problems left.
We might solve alignment in Yudkowsky’s sense of “not causing human extinction” or in Drexler’s sense of “will answer your questions and then shut down”.
It may be possible to put a slightly (but not significantly) superhuman AI in a box and get useful work out of it despite it not being fully aligned. It may be possible for an AI to be superhuman in some domains and not others, such that it can’t attempt a takeover or even think of doing so.
I agree that what you are saying is more relevant if I assume we just deploy the ASI, it takes over the world, and then it does more stuff.
I feel like I already addressed this, not in my previous comment but the one before that. We might put a semi-corrigible weak AI in a box and try to extract work from it in the near future, but that’s clearly not the end goal.
Okay cool.
I guess you now have a better understanding of why people are still interested in solving morality and politics and meaning, without delegating these problems to an ASI.
No, I don’t think so.