Clearly a full Butlerian jihad (where all of the computers are destroyed) suspends AGI development indefinitely, and destroying no computers doesn't slow it down at all. In between there's a curve: the more computers you destroy, the more you both 1) slow down AGI development and 2) disrupt the economy (since people were using those computers to keep their supply chains going, organize the economy, do lots of useful work, play video games, etc.).
But even if you melt all the GPUs, I think you have two obstacles:
CPUs alone can do lots of the same stuff. There's a paper from ~5 years ago where a CPU farm was made competitive with the GPUs of the time; it might have been this paper (whose authors are all from Intel, who presumably have a significant bias), or it might have been the Hogwild-descended work (like this); hopefully someone knows something more up to date.
The chip design ecosystem gets to react to your ubiquitous nanobots and reverse-engineer what features they're looking for to distinguish whitelisted CPUs from blacklisted GPUs; they may be able to design an ML accelerator that fools the nanomachines. (Something that's robust to countermoves might have to eliminate many more current chips.)
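To make the CPU point above concrete: the Hogwild line of work showed that many CPU threads can run SGD on shared parameters with no locking at all, and on sparse-enough problems the occasional racy update barely hurts convergence. Here is a minimal toy sketch of that idea (my own illustration, not code from either paper; the problem, learning rate, and sizes are all made up for the demo):

```python
# Toy Hogwild-style lock-free parallel SGD: several threads update one
# shared weight vector with no locks. Races can clobber individual
# updates, but convergence on this easy problem survives anyway.
import threading

import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 0.5])          # ground-truth weights (made up)
X = rng.normal(size=(3000, 3))
y = X @ true_w + 0.01 * rng.normal(size=3000)

w = np.zeros(3)   # shared parameters, updated by all threads without locking
lr = 0.01

def worker(rows):
    for i in rows:
        grad = (X[i] @ w - y[i]) * X[i]      # gradient of squared error
        w[:] -= lr * grad                    # racy in-place update, no lock

threads = [threading.Thread(target=worker, args=(range(t, 3000, 4),))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(np.round(w, 1))   # ends up near true_w despite the unsynchronized writes
```

The serious versions run this across whole CPU farms rather than four threads, which is why "melt the GPUs" alone doesn't obviously close off large training runs.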
I agree you might need to make additional moves to keep the table flipped, but in a scenario like this you would actually have the capability to make those moves.
Is the plan just to destroy all computers with, say, >1e15 flops of computing power? How does the nanobot swarm know what a "computer" is? What do you do about something like GPT-Neo or SETI@home, where the compute is distributed across many machines?
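The distributed-compute loophole is just arithmetic (the specific numbers here are hypothetical, chosen only to illustrate the point):

```python
# No single node crosses the per-"computer" cutoff, but the pool as a
# whole does, so a per-machine flop threshold never triggers.
threshold = 1e15      # per-computer cutoff the swarm is imagined to enforce
node_flops = 1e13     # one consumer machine (assumed figure)
n_nodes = 1_000       # size of a volunteer-computing pool (assumed figure)

aggregate = n_nodes * node_flops
print(node_flops < threshold)   # True: every node individually passes
print(aggregate > threshold)    # True: the pool in aggregate exceeds the cutoff
```

So the swarm would need some notion of which machines are cooperating on one workload, which is a much harder recognition problem than measuring one box.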
I'm still confused as to why you think the task "build an AI that destroys anything with >1e15 flops of computing power (except humans, of course)" would be dramatically easier than the alignment problem.
Setting back civilization a generation (via catastrophe) seems relatively straightforward. Building a social consensus/religion that destroys anything “in the image of a mind” at least seems possible. Fine-tuning a nanobot swarm to destroy some but not all computers just sounds really hard to me.
I’m still unsure how true I think this is.