Here’s my attempted phrasing, which might avoid some of the common confusions:
Suppose we have a model M with utility function ϕ, where M is not capable of taking over the world. Assume that thanks to a bunch of alignment work, ϕ is within δ (by some metric) of humanity’s collective utility function. Then in the process of maximizing ϕ, M ends up doing a bunch of vaguely helpful stuff.
Then someone releases model M′ with utility function ϕ′, where M′ is capable of taking over the world. Suppose that our alignment techniques generalize perfectly. That is, ϕ′ is also within δ of humanity’s collective utility function. Then in the process of maximizing ϕ′, M′ gets rid of humans and rearranges their molecules to satisfy ϕ′ better.
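To make the “same δ, very different outcomes” intuition concrete, here is a minimal toy sketch. Everything in it is my own construction, not part of the scenario above: it assumes δ is measured as the mean absolute difference between the two utility functions over “ordinary” states, and it uses made-up one-dimensional utility functions. A weak optimizer that can only reach ordinary states does fine with the proxy; a strong optimizer that can reach extreme states finds exactly the places where a small on-distribution δ constrains nothing.

```python
# Toy sketch (assumed setup): states are real numbers, and δ is measured as the
# mean absolute difference between true and proxy utility over ordinary states.
import numpy as np

rng = np.random.default_rng(0)

def true_utility(x):
    # Stand-in for humanity's collective utility: likes moderate states, hates extremes.
    return np.exp(-x**2)

def proxy_utility(x):
    # Learned proxy ϕ: nearly identical on ordinary states, but keeps rising
    # for extreme x, where it was never compared against the true utility.
    return np.exp(-x**2) + 0.01 * x**2

# δ, evaluated where alignment work actually checked the proxy: ordinary states.
ordinary_states = rng.normal(0.0, 1.0, size=100_000)
delta = np.mean(np.abs(true_utility(ordinary_states) - proxy_utility(ordinary_states)))
print(f"delta on ordinary states: {delta:.3f}")  # ~0.01, i.e. small

def optimize(utility, search_radius, n_samples=100_000):
    # Crude optimizer: sample candidate states within its reach, pick the best by `utility`.
    candidates = rng.uniform(-search_radius, search_radius, size=n_samples)
    return candidates[np.argmax(utility(candidates))]

# M: weak optimizer, can only reach ordinary states.
x_weak = optimize(proxy_utility, search_radius=2.0)
# M': strong optimizer, can reach extreme states (it "can take over the world").
x_strong = optimize(proxy_utility, search_radius=100.0)

print(f"weak optimizer:   true utility of chosen state = {true_utility(x_weak):.3f}")   # ~1.0
print(f"strong optimizer: true utility of chosen state = {true_utility(x_strong):.3f}") # ~0.0
```

The design choice doing the work here is the phrase “by some metric”: if δ is evaluated on the distribution of states that weak models (and humans) actually visit, it can be tiny while saying nothing about the extreme states only M′ can reach, which is where the capability difference between M and M′ bites.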
Does this phrasing seem accurate and helpful?