If you, the reader, or, say, Paul Christiano or Eliezer were uploaded and gained the ability to self-improve, self-modify, and run at much greater processing speed and power, would your goals converge toward damaging humanity as well? If not, what makes it different? How can we transfer this secret sauce to an AI agent?
The Orthogonality Thesis states that values and capabilities can vary independently. The key question, then, is whether my/Paul’s/Eliezer’s values are actually as aligned with humanity as they appear to be, or whether we are already unaligned and would perform a Treacherous Turn once we had the power to get away with it. There are certainly people who are already obviously bad choices, and people who would perform the Treacherous Turn (possibly most people[1]), but I believe there are people who are sufficiently aligned, so let’s assume going forward that we’ve picked one of those. At this point “If not, what makes it different?” answers itself: by assumption, we’ve picked a person for whom the Value Loading Problem is already solved. But we have no idea how to “transfer this secret sauce to an AI agent”: the secret sauce is hidden somewhere in this person’s particular upbringing and, more importantly, in their multi-billion-year evolutionary history.
The adage “power tends to corrupt, and absolute power corrupts absolutely” says, in effect, that treacherous turns are commonplace for humans: we claim to be aligned, and may even believe it ourselves while we are weak, but once we gain power we abuse it. Of course, the existence of this adage does not mean it is universally true. ↩︎
I do not, of course, know your intentions, but this comment really rubbed me the wrong way:
Most importantly, there’s the everybody-knows dynamic (which, unrelatedly, Zvi has written about). Something you happen to know is usually less common knowledge than you think, and even if in this case it actually is mostly common knowledge, you could probably have found a way to write it that sounds nicer (i.e., less of a you’re-an-ignorant-outsider vibe) and/or is better supported (got any stats/links showing how common this kind of scamming really is?).
Less importantly, the ‘modern USA’ phrasing feels to me like it’s taking a dig at something, like (a less extreme version of) whichever of the following feels most unfair to you: “of course this kind of scamming is common—welcome to capitalism”, or “of course this kind of scamming is common—welcome to Biden’s USA”.