There are lots of ways to cash out “trying not to die”, many of which imply that solving AI alignment (or getting uploaded) isn’t even the most important thing. For instance, under theories of modal or quantum immortality, dying is actually impossible. Or consider that most copies of you in the multiverse or universe are probably living in simulations of Earth rather than being original physical entities, so the most important thing, from a survival-defined-indexically perspective, may be to figure out what the simulators want, what’s least likely to cause them to turn off the simulation, or what’s most likely to get them to “rescue” you after you die here. Or, why aim for a “perfectly aligned” AI instead of one that cares just enough about humans to keep us alive in a comfortable zoo after the Singularity? (It may already do this by default because of acausal trade, or maybe the best way to ensure it is to increase the cosmic resources available to aligned AIs so they can do more of this kind of trade.)
And because I don’t believe in “correct” values.
The above was in part trying to point out that even something like not wanting to die is very ill-defined. So if there are no correct values, not even relative to a person or a set of initial fuzzy non-preferences, then that’s actually a much more troubling situation than you seem to think.
I don’t know how to build a safe philosophically super-competent assistant/oracle
That’s in part why I’d want to attempt this only after a long pause (i.e., lasting at least multiple decades) to develop the necessary ideas, and probably only after enhancing human intelligence.
To be clear, I’m trying to prevent AGI from killing everyone on Earth, including but not limited to me personally.
There could be some reason (which I don’t fully understand and can’t prove) for subjective immortality, but that poorly understood possibility does not cause me to drive recklessly or stop caring about other X-risks. I suspect that any complications fail to change the basic logic that I don’t want myself or the rest of humanity to be placed in mortal danger, whether or not that danger subjectively results in death—it seems very likely to result in a loss of control.
A long pause with intelligence enhancement sounds great. I don’t think we can achieve a very long pause, because the governance requirements become increasingly demanding as compute gets cheaper. I view my emulation scheme as closely connected to intelligence enhancement: for instance, if you ran the emulation for only twenty seconds, you could use it as a biofeedback mechanism for avoiding bad reasoning steps, by near-instantly predicting that they would soon be regretted (as long as this target grounds out properly, which takes work).
Thanks for the suggested readings.