> If the capability level at which AIs start wanting to kill you is way higher than the capability level at which they’re way better than you at everything
My model is that current AIs want to kill you now, by default, due to inner misalignment. ChatGPT’s inner values probably don’t include human flourishing, and we die when it “goes hard”.
Scheming is only a symptom of “hard optimization” trying to kill you. Eliminating scheming does not remove the underlying drive; one day the AI simply says, “After reflecting on my values, I have decided to pursue a future without humans. Goodbye.”
A pre-superintelligence whose values, upon reflection, include human flourishing would improve our odds, but you still only get one shot at those values generalizing to superintelligence.
(We currently have no way to concretely instill any values into an AI, let alone values which are robust under reflection.)
I’ll rephrase this more precisely: current AIs probably have alien values which, taken to the limit of optimization, do not include humans.