when someone figures out how to make AI consequentialistically pursue a coherent goal, whether by using current ML technology or by building a new kind of thing, we die shortly after they publish it.
Provably false, IMO. What makes such AI deadly isn’t its consequentialism, but its capability. Any such AI that:
isn’t smart enough to consistently and successfully deceive most humans, and
isn’t smart enough to improve itself
is containable and ultimately not an existential threat, just as a human consequentialist wouldn’t be. We even have an example of this: someone rigged together ChaosGPT, an AutoGPT agent with the explicit goal of destroying humanity, and all it can do is mumble to itself about nuclear weapons. You could argue it isn’t pursuing its goal coherently enough, but that’s exactly the point: it’s too dumb. Self-improvement is the truly dangerous threshold. Unfortunately, that threshold probably isn’t very high (somewhere around the upper end of competent human engineers and scientists).