Interesting idea. I think the story doesn’t provide a complete description of what happens, but one plausible reason to not “achieve nirvana” is if you predict the reward after self-modifying using your current data type that doesn’t represent infinity.
This is true, but it occurred to me, perhaps belatedly, that IEEE floats actually do represent infinity (positive and negative, and also not-a-number as a separate value). IEEE 754 defines the arithmetic: positive infinity plus positive infinity is positive infinity, and positive infinity compares greater than every finite value. (NaN, meanwhile, compares unequal to everything, including itself.)
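A quick sketch of what that arithmetic looks like in practice, using Python floats (which are IEEE 754 doubles) as the stand-in:

```python
# IEEE 754 behavior of infinity, demonstrated with Python floats.
pos_inf = float('inf')

print(pos_inf + pos_inf)    # inf: infinity plus infinity stays infinity
print(pos_inf > 1e308)      # True: inf compares greater than any finite float
print(pos_inf == pos_inf)   # True: inf is equal to itself

nan = pos_inf - pos_inf     # inf minus inf is undefined, so it yields NaN
print(nan == nan)           # False: NaN compares unequal to everything, even itself
```

So a reward accumulator stored as a float really could "achieve nirvana" by saturating at infinity and staying there under further addition.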
… and if the type is a fixed-size int, that means you need to actively limit the reward after a while to keep the total from rolling over and actually getting smaller, or even going negative.
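A hypothetical sketch of that rollover, simulating a signed 64-bit accumulator (Python ints are arbitrary-precision, so the fixed-size wraparound is emulated by masking):

```python
# Simulate two's-complement wraparound of a fixed-size signed 64-bit int.
INT64_MAX = 2**63 - 1

def wrap_int64(x):
    """Truncate x to a signed 64-bit value, the way a fixed-size int would."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

total = INT64_MAX               # reward accumulator already at the maximum
total = wrap_int64(total + 1)   # one more unit of reward...
print(total)                    # -9223372036854775808: the total flips negative
```

Hence the need to cap incoming reward before the accumulator reaches the maximum, rather than after.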
So I guess bignums are dangerous and should be avoided. New AI coding best practice. :-)
Doesn’t this argument also work against the idea that they would self-modify in the “normal” finite way? It can’t currently represent the number that it’s building a ton of new storage to help contain, so it can’t make a pairwise comparison to say the latter is better, nor can it simulate the outcome of doing this and predict the reward it would get.
Maybe you say it’s not directly making a pairwise comparison but taking a more abstract step of reasoning, like: “I can’t predict that number, but I know it’s going to be bigger than what I have now. Me-with-augmented-memory will still be aligned with me in the sense of ranking everything the same way I rank it, and will in retrospect think this was a good idea, so I trust it.” But then, analogously, it seems like it can make a similar argument for modifying itself to represent infinite values too.
Or, more plausibly, you say that however the AI is representing numbers, it’s not in this naive way where it can only reason about numbers it can fit inside its head. But then it seems like you’re back to having a representation that allows it to set its reward to whatever number it wants without going out and taking over anything.