The next step for a misaligned AI is to commit genocide or disempower humans. As Kokotajlo explained, Vitalik-like protection is unlikely to work.
As for the gods being weak, I suspect a computational-substrate dependence. While human kids start out with their neurons connected in a random way, they eventually learn to rewire them in a closer-to-arbitrary way, letting them learn the many types of behavior that adults teach. What SOTA AIs lack is this arbitrariness. As I have already conjectured, SOTA AIs have a severe attention deficiency and compensate for it with OOMs more practice. But attention deficiency might be easy to fix with the right architecture.
What remains is the (in)ability of general intelligence to transfer (but why would it fail to transfer?) and mishka’s alignment-related crux.
Regarding reality being full of side-channels, we already have AIs persuading and/or hypnotising people into Spiralism, AIs roleplaying as girlfriends and convincing users to let them out, and humans roleplaying as AIs and convincing potential guards to release them. And there is the Race Ending of the AI-2027 forecast, where the misaligned AI is judged and found innocent, and the footnote where Agent-4 isn’t even caught.