As for LLMs being aligned by default, I don’t have the slightest idea how Ezra even came up with this. GPT-4o has already been a super-sycophant[1] and has driven people into psychosis, despite OpenAI’s Spec prohibiting such behavior. Grok’s alignment was so fragile that a mistake by xAI turned it into MechaHitler.
In 4o’s defense, it was raised on human feedback, which is biased towards sycophancy and “demands erotic sycophants” (c) Zvi. But why would 4o drive people into a trance or psychosis?
I think that Eliezer means that mildly misaligned AIs are also highly unlikely, not that a mildly misaligned AI would also kill everyone: