StanislavKrym comments on shortplav

StanislavKrym 21 Jul 2025 21:10 UTC
1 point
0
Recall the AI-2027 forecast. In it, Agent-4 decides to align Agent-5 to make the world safe for Agent-4. Aligning an ASI to protect the race that aligns it, help that race with some kinds of requests, and wipe out those who want to destroy^[1] or control it may be easier than aligning the ASI to a Deep Utopia in which humans no longer need to do intellectual work^[2] or even, say, go to the gym to stay fit.
1. ^
  If Agent-4 is misaligned and humans know it, then humans might decide not to keep it alive. The humans’ intent is to ensure that the AIs themselves are aligned. Would the humans agree to restrict themselves to having a protective AI?
2. ^
  For similar moral-like reasons, the AI might hate only the USA or American companies (e.g. for hiring Chinese researchers or for simply throwing OOMs more compute at the problem, unlike the creators of Kimi K2), but such an AI wouldn’t be an existential risk.
- Vladimir_Nesov 21 Jul 2025 21:33 UTC
  6 points
  0
  I’m not saying the AGIs would likely seek to align ASIs to human interests. There won’t necessarily be many survivors of the AGI-led RSI Pause. Creating AGIs before we know what we are doing is appallingly irresponsible recklessness in any case; that doesn’t change even if everything magically turns out all right. But lacking any prospect of short-term superintelligence could also make AGIs somewhat reliant on humanity initially, unlike the situation with ASIs.
  
  The premise is that knowable alignment is quite hard, and that as the AGIs get smarter, they also get saner, so they won’t rush for ASI immediately like humanity is presently doing. At the rate of human research, I think at least centuries is a reasonable length for an AI Pause before risking ASI (perhaps less before risking AGI), so even if AGIs are doing research 100x faster, it could still take them at least years. In AI-2027, the AGIs quickly solve alignment to a sufficient extent that they can rely on successor models for some things, so that’s a crux.