My point is: will they also have a 0.0002% wish to be your lord, or something like that?
As for the better plan, yeah, that’s a lot to ask. Most of my thoughts these days lean toward “democratic AI”: something whose power is spread out among all the world’s people, across borders, sidestepping governments and existing power structures, or else something centralized that wants its power to be spread out like this.
Of course an approach like this won’t solve all the world’s problems. We’ll still have power struggles between people, and also “crash space” type problems where people modify themselves into something bad; maybe those need some patches by fiat as well. But at least it won’t create the extra problem of huge power concentration, which I really think is underestimated.
It sounds like your plan is pretty much the standard value-aligned AGI that’s aligned to something like human values in general, so that everyone gets what they want on average? Or something in that ballpark?
One big question is how you achieve that technically. That’s where I think it’s harder than the instruction-following variant of corrigibility; I hope it’s not. The second is how you achieve it practically: what person or organization is going to deliberately hand the future to a value-aligned AGI?
One answer: Anthropic seems like they might be considering doing just that. Maybe it works, or at least sort of works, where it’s not an ideal future but at least we survive in some form for a while.
WRT the default plan of IF/corrigible alignment:
Yes, anyone in charge with a negative sadism-empathy balance will lead to a fate worse than death. And someone around zero could produce a fate barely worth living.
But I think most humans have more empathy than sadism. More people give a little to charity than spit on the homeless for fun. I can call Sunday Samday for the rest of eternity if all we need is some ego-stroking in return for tiny amounts of generosity.
The point of my plan is that it’s mostly what people will do anyway, so we can focus on helping them not totally fuck up alignment and get us all killed.
A better plan is a lot to ask. But that’s what I’m trying to come up with, because I want us to live and there’s still time to work.
People who end up in positions of power are not necessarily like most humans.
In your WEIRD bubble, sure. In other times and places, people used to burn cats for fun. And empathy used to be limited to one’s peers.
People still do things in the same ethical ballpark as cat-burning, except on an incomprehensibly large industrial scale and for the sake of marginal food preferences.
We look down on peasants for burning cats today, but the tragic irony is that their society was far better overall on animal welfare than ours in the modern day, though for practical reasons rather than moral ones.
Would you be okay with a future in which young women, including your daughters and granddaughters, would be expected to ritually offer a gift of their virginity to the local Robot Lord on their 18th birthday, a gift which he would almost never choose to “accept”? 😈
Damn straight. People need to understand the implications of this shit. “Oh let’s hope the separate caste which controls the entire universe and which we can’t hope to contest in any possible way is nice to us!!!”
Open. A. History book.
Your scenario is relatively low on the awfulness scale, even.