nshepperd comments on Varieties Of Doom

nshepperd 19 Nov 2025 4:51 UTC

>>> conscious * each_other * care_other * bored * avoid_wireheading * active_learning

0.4618944000000001

So, basically a coinflip that we will intuitively recognize hypothetical AI successors as valuable, but we did not enumerate literally all the things so the real odds are going to be at least somewhat lower than that. Let’s say 1⁄3, which is still high enough to not really change my mind on the button one vs. button two question. Though we’re not actually done yet, because we can invert these probabilities to get my estimate for the classic paperclip scenario where nothing of value is retained:

>>> (1 - conscious) * (1 - each_other) * (1 - care_other) * (1 - bored) * (1 - avoid_wireheading) * (1 - active_learning)

3.999999999999999e-07

This is wrong. Your care_other is an estimate of P(care others | conscious, fun, others exist, ...), i.e. it is conditioned on the other properties being present. The proper value to multiply here is P(~care others | ~conscious, ~fun, ~others exist, ...), conditioned on those properties being absent, which is not the same as 1 - care_other. In fact, the correct value is clearly 1, since AIs could not care about each other if no others exist.
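A minimal Python sketch of the point, using a hypothetical three-variable toy model. The names P_CONSCIOUS, P_OTHERS and P_CARE and all of the numbers are made up for illustration and are not the values from the post. Each probability is conditioned on the earlier variables, so the joint distribution follows the chain rule. The all-true product is then exact, but the "one minus each factor" inversion is not the all-false probability: that one needs the conditionals given the earlier variables being false.

```python
import math
from itertools import product

# Hypothetical toy numbers for illustration only (NOT the post's estimates).
P_CONSCIOUS = 0.9                   # P(conscious)
P_OTHERS = {True: 0.8, False: 0.8}  # P(others exist | conscious?)
P_CARE = {True: 0.7, False: 0.0}    # P(care about others | others exist?); 0 if no others


def bern(p: float, outcome: bool) -> float:
    """Probability of a binary outcome when P(True) = p."""
    return p if outcome else 1.0 - p


def joint(conscious: bool, others: bool, care: bool) -> float:
    """Joint probability of one outcome, built with the chain rule."""
    return (bern(P_CONSCIOUS, conscious)
            * bern(P_OTHERS[conscious], others)
            * bern(P_CARE[others], care))


# Sanity check: the toy joint distribution sums to 1 over all 8 outcomes.
assert math.isclose(sum(joint(*o) for o in product([True, False], repeat=3)), 1.0)

# All true: multiplying the "given the earlier ones are true" conditionals is exact.
p_all_true = P_CONSCIOUS * P_OTHERS[True] * P_CARE[True]

# The inversion being criticized: one minus each "given true" conditional.
naive_all_false = (1 - P_CONSCIOUS) * (1 - P_OTHERS[True]) * (1 - P_CARE[True])

# Correct all-false value: condition each factor on the earlier ones being FALSE.
# P(~care | ~others) = 1 - 0.0 = 1, since nobody can care about others who don't exist.
correct_all_false = (1 - P_CONSCIOUS) * (1 - P_OTHERS[False]) * (1 - P_CARE[False])

print(f"P(all true)          = {p_all_true:.3f}")
print(f"naive P(all false)   = {naive_all_false:.3f}")
print(f"correct P(all false) = {correct_all_false:.3f}")
```

With these made-up numbers the naive inversion gives 0.006 while the chain-rule value is 0.020. The naive product understates the all-false case by a factor of 1 / (1 - 0.7) because it multiplies by 1 - P(care | others exist) where the correct factor is 1.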