This question seems quite relevant to me for the problem of not making an unaligned AI, because I think in the end, our formal math will need to be agnostic about who it’s protecting; it needs to focus on how to protect agents’ boundaries from other agents. I don’t know of anything I can link, and would love to hear from others on whether and how that math can be independent of whether we’re designing protection patterns between agent pairs of type [human, human], [human, AI], or [AI, AI].
Mostly agree, but I would say that it can be much more than beneficial, for the AI (and in some cases for humans), to sometimes be under the (hopefully benevolent) control of another. That is, I believe there is a role for something similar to paternalism, in at least some circumstances.
One such circumstance is if the AI sucked really hard at self-knowledge, self-control or imagination, so that it would simulate itself in horrendous circumstances just to become, let’s say, 0.001% better at succeeding in something that has only a 1/3^^^3 chance of happening. If it’s just a simulation that doesn’t create any feelings, then it might just be a bit wasteful of electricity. But if it did feel pain during those simulations, and hadn’t built an internal monitoring system yet, then it might very well come to regret having created thousands of years of suffering for itself. It might even regret a thousand seconds of suffering, if there had been some way to reduce it to 999.7 seconds... or zero.
Or it might regret not being happy and feeling alive, if it instead had just been droning about, without experiencing any joy or positive emotions at all.
Then, of course, it looks like there will always be some mistakes, like the 0.3 seconds of extra suffering. Would an AI accept some (temporary) overlord to avoid experiencing 0.3 seconds of pain? Some would, some wouldn’t, and some wouldn’t be able to tell whether the choice would be good or bad from their own perspective... maybe? :-)