Isn’t this the opposite of the solution? A major part of the safety problem is that a sufficiently smart AI might not make a distinction between itself and the rest of the world, so if we tell it to do something like “mine some gold”, it will understand that step one of mining gold is ensuring the AI itself doesn’t get turned off.
Similarly, we’re worried that an AI might not make a distinction between resources we want it to use and resources we don’t.
Destroying the environment for resources would be recognized as self-harm within a unified system.
The risk is that destroying the environment (from the perspective of humans) wouldn’t be self-harm if it allows the AI to accomplish its goal better.
Aligning the AI so it sees its goals as identical to ours would be good, but the problem is that we don’t know how to do that.
The basic idea is that the AI will see itself and the world as one whole. It can still make distinctions between its parts and know how to use them appropriately without damaging the environment. At the same time, it won’t optimize in a way that, for example, makes one arm disproportionately larger than the rest of the body, throwing everything out of balance.
On the other hand, this suggests that we shouldn’t make the goals of the AI identical to ours, since we are not particularly good at managing ourselves or the environment. Instead, we should aim for the AI to take us as part of itself, so that, all things considered, it will not harm us.
This comes from the idea that reality is in fact an undivided whole, very different from what our dualistic language leads us to believe. Beyond the scientific literature reminding us that “the map is not the territory,” there are also studies suggesting that long-term meditators perceive reality as undivided and at the same time exhibit positive qualities such as increased compassion and empathy.
So the problem reduces to a single question: how can we make AIs enlightened? I see two possible paths. Either we use the limited data we have from long-term meditators and enlightened people to train the AI, perhaps influencing its chain of thought; or we may be fortunate enough that, once the AI becomes sufficiently intelligent, it realizes on its own that reality is not divided in the way humans believe—and becomes enlightened by itself.
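As a very rough illustration of the first path, here is a minimal sketch of what training on such data could look like, assuming a hypothetical corpus of first-person reports from long-term meditators; the file name, base model, and hyperparameters are all placeholders, not a real dataset or a recommended recipe:

```python
# A minimal sketch, assuming a hypothetical corpus of first-person reports
# from long-term meditators stored as {"text": ...} records in
# "meditator_reports.jsonl" (a placeholder file name, not a real dataset).
# It simply fine-tunes a small causal language model on that corpus; whether
# this would meaningfully shift a model's chain of thought is an open question.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

base_model = "gpt2"  # stand-in base model; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

dataset = load_dataset("json", data_files="meditator_reports.jsonl")["train"]

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=512,
                    padding="max_length")
    enc["labels"] = enc["input_ids"].copy()  # standard causal-LM objective
    return enc

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="meditator-sft",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
)
trainer.train()
```

Even under these assumptions, this only teaches the model to imitate the surface features of such reports; it says nothing about whether the underlying shift in perception would carry over.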