xpym comments on 6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

xpym 4 Dec 2025 11:27 UTC
−1 points
−4

will future powerful AGI / ASI “by default” lack Approval Reward altogether?

I’d say that pessimists are similar to LLM optimists in their conviction that it would be pretty easy to match and then greatly surpass general human intelligence, trusting their own intuitions far too much. Of course, once that assumption is made, everything else straightforwardly follows.
- Seth Herd 7 Dec 2025 18:19 UTC
  3 points
  0
  Possibly, sometimes. But greatly surpassing human intelligence isn’t really part of the risk model. Even humans have pretty much succeeded at taking over the world. An AI only has to be as functionally smart, in relevant ways, as a human. A bit more would be a pretty big edge.
  The remaining question is whether LLM-based systems will even achieve human-level intelligence. Steve thinks that probably won’t happen; see for instance his Foom & Doom. I think it probably will, and that might happen very soon.
  
  The issue is that nobody is sure how things are going to go. Taking a guess and going with it really isn’t a smart way to deal with a situation that could be deadly dangerous. I’m sure you’re seeing pessimists do that; optimists do too. Our overall response should be a careful weighing of pessimist and optimist positions.
  I’ve been trying to do that, and I’ve reached a disturbing conclusion: nobody has much clue. This inclines me toward caution, because the deeper arguments in both directions are quite strong.
  - xpym 15 Dec 2025 13:26 UTC
    3 points
    0
    
    Even humans have pretty much succeeded at taking over the world.
    
    Coalitions of humans have. It’s plausible that an AI that is slightly smarter in relevant ways might soon end up heading one, but I don’t expect it to get away with acting egregiously misaligned.
    
    The issue is that nobody is sure how things are going to go.
    
    Well, they aren’t behaving accordingly. Pessimists are super doomy, optimists expect “loving grace” around the corner, and neither side is at all discomfited by the vast gulf of confident disagreement in between.
    
    This inclines me toward caution
    
    A widely agreeable notion, surely, until elaborated on.