I have two concerns.
Dangerously poor alignment of individual humans. My first concern with this plan is that some humans are very poorly aligned with each other. Even if you could “upload” these people and get an AI that was flawlessly aligned to their values, you’d still have a dangerously rogue intelligence on the loose.
Some examples:
People joke about CEOs being high in Dark Triad traits. I met one who was charming, good with people, and almost completely amoral. Think Anthony Hopkins as Hannibal Lecter, but without the cannibalism (I assume). He appeared to place zero moral value on other people. Once I saw through the mask, he became one of the creepiest people I’ve ever met, and he had this effect on a lot of people.
I occasionally volunteer for a political party. Most of their elected officials are ordinary, well-meaning people. At least one of them is a notoriously manipulative user who shouldn’t be allowed near power and who should be avoided on a personal level.
I could name any number of billionaires and politicians who are either slipping out of touch with consensus reality in strange ways, or unrepentantly willing to lie and use people to get more power.
Then there are any number of otherwise decent people whose highest moral values include controlling other people’s behavior very strictly. For example, for some of my distant ancestors, it wasn’t enough to be free to worship a God of their choice. They already had that, and they left anyway. What they wanted was to build communities where nobody was allowed to disagree, under threat of government force.
Even if you could perfectly align an AI around any of these people’s values, I would still consider it an existential risk on the same level as (say) Skynet. In the case of my religious ancestors, the risks might be worse than mere extinction. Some of those people might have willingly employed cognitive control strategies that I would consider a fate considerably worse than death. And there have been a few historical preachers who were suspiciously gleeful about the existence of Hell. Somewhere out there, there is at least one human who would lovingly recreate Hell and start damning people to it, if they had the power.
Competitive pressures forcing a leap from human-aligned AGI to essentially alien ASI. Let’s assume that we actually solve “faithful” uploading, and we somehow ban uploading any rich and powerful sociopaths.
Now let’s imagine that Corporation/Government A uses only uploaded humans. Corporation/Government B, however, is willing to build custom minds from the ground up, giving them a working memory with a million items (instead of 7±2), the ability to fork and reintegrate sub-personas, the ability to do advanced math intuitively, the ability to think at super-human speeds, the ability to one-shot complex software with minimal planning (and use output formats that rely on and integrate directly into the million-item working memory), and a hundred other tweaks I’m not smart enough to imagine. They willingly choose to “break compatibility” with human neural architectures in ways that fundamentally change the minds they’re building, in order to get minds that even von Neumann or Feynman would agree are so smart that they’re a bit creepy.
If Corporation/Government A limits itself to human uploads, and Corporation/Government B is willing to sacrifice all “human compatibility” to maximize intelligence, who wins?
The first concern seems like a much smaller risk than the one we currently face from unaligned AI. To be clear, I’m suggesting emulations of a relatively large number of people (more than 10, at least once the technology has been well tested, and eventually perhaps everyone). If some of them turn out to be evil sociopaths, the others will just have to band together and enforce norms, exactly like we do now.
The second concern sounds like gradual disempowerment to me. However, I think there are a lot of ways for Corporation A to win. Perhaps Corporation B is regulated out of existence: reckless modifications should violate some sort of human alignment code. Perhaps we learn how to recursively self-improve as emulations, in such a way that the alignment tax is near zero, and then just ensure that the initial conditions modestly favor Corporation A (most companies adopt reasonable standards, and over time control most of the resources). Or perhaps corporate power is drastically reduced and emulations are able to coordinate once their intelligence is sufficiently boosted. Or perhaps a small team of early emulations performs a pivotal act. Basically, I think this is something our emulations can figure out.
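As a rough way to quantify the "near-zero alignment tax plus favorable initial conditions" intuition: if both blocs' resources compound exponentially and A starts ahead, the time until B overtakes A scales like 1 / (growth rate × alignment tax), so a small enough tax buys A a very long lead. Here is a minimal toy sketch of that arithmetic; the growth rate, initial share, and tax values are illustrative assumptions, not forecasts.

```python
# Toy model: two blocs with compounding resources.
# A pays a small "alignment tax" on its growth rate; B does not.
# All numbers below are made up for illustration.

import math

def years_until_overtake(initial_share_a: float, growth_b: float, tax: float) -> float:
    """Years until bloc B's resources exceed bloc A's.

    A grows at growth_b * (1 - tax); B grows at growth_b.
    Resources compound continuously: R(t) = R(0) * exp(rate * t).
    Overtake happens when B's compounding closes A's initial log-advantage.
    """
    if tax <= 0:
        return math.inf  # with no tax, A never loses its head start
    lead = math.log(initial_share_a / (1 - initial_share_a))  # A's initial log-advantage
    return lead / (growth_b * tax)

# Example: A starts with 70% of resources, B grows at 50%/year.
for tax in (0.10, 0.01, 0.001):
    t = years_until_overtake(0.7, 0.5, tax)
    print(f"alignment tax {tax}: B overtakes A after ~{t:.0f} years")
```

With these made-up numbers, a 10% tax gives A only a couple of decades, but a 0.1% tax gives A over a millennium of lead time in which to coordinate, regulate, or boost its own emulations.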