He claims that an arbitrarily intelligent agent, given the utility function {1 if the cauldron is full, 0 if the cauldron is empty}, will keep adding as much water as possible to maximize the probability that the cauldron is full. But what about the probability that the cauldron will be full in the future?
I didn’t watch the video so I might be missing something, but assuming you created an AI with that utility function, a utility function over whether the cauldron is full right now and a utility function over the probability that it will be full in the future are two different utility functions. A sufficiently intelligent AI would know that maximizing its utility now could lead to worse outcomes later, but it doesn’t care, because those later outcomes aren’t part of its utility function.
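To make the distinction concrete, here is a minimal sketch (the scenario, action names, and probabilities are all hypothetical, not taken from the video) contrasting a utility function evaluated on the current state with one evaluated on a predicted future state. An agent maximizing the first has no reason to care about what its actions do to the cauldron later.

```python
# Hypothetical sketch: utility over the current state vs. over a predicted future.

def utility_now(state):
    # 1 if the cauldron is full right now, 0 otherwise.
    return 1 if state["full"] else 0

def expected_utility_later(state, action):
    # Assumed toy model of the future: overfilling floods the workshop and the
    # cauldron ends up emptied, so "keep pouring" scores worse later.
    if action == "keep_pouring":
        return 0.2  # assumed: flooding likely undoes the fill
    else:           # "stop_pouring"
        return 0.9  # assumed: a full cauldron tends to stay full

state = {"full": True}
actions = ["keep_pouring", "stop_pouring"]

# The current-state maximizer is indifferent: both actions leave the cauldron
# full *right now*, so nothing in its utility function tells it to stop.
print([(a, utility_now(state)) for a in actions])

# The future-oriented maximizer prefers to stop pouring.
print(max(actions, key=lambda a: expected_utility_later(state, a)))
```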
Eliezer has a bunch of different arguments because he’s trying to address different levels of familiarity with the problem. My impression is that he expects a sufficiently intelligent but unaligned AI not to be greedy in such an obvious way, but to scheme until it no longer needs us.