Arbitrary Intelligence does not imply a maximally greedy algorithm.
A greedy algorithm is one that always makes decisions based on what yields the most immediate benefit. It essentially eschews the concept of delayed gratification.
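In code, a maximally greedy agent amounts to nothing more than the following (a minimal, illustrative sketch; the function and the toy rewards are mine, not anything from the alignment literature):

```python
# A maximally greedy decision rule: at every step, pick whichever action pays off
# the most right now, with no consideration of later consequences.

def greedy_choice(actions, immediate_reward):
    """Pick the action with the highest immediate payoff, ignoring the future."""
    return max(actions, key=immediate_reward)

# Toy example: "save" pays little now but would enable a larger payoff later;
# the greedy rule still picks "spend" because it only compares immediate rewards.
immediate = {"spend": 10, "save": 1}
print(greedy_choice(immediate.keys(), immediate.get))  # -> "spend"
```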
In discussions about value alignment, where the system to be ‘aligned’ is usually assumed to possess arbitrary intelligence, I often see people reason about that arbitrarily intelligent system as if it were a maximally greedy algorithm.
When someone proposes a specific goal, detractors often conjure a scenario in which a greedy optimizer with that goal would misbehave, then dismiss the proposal without further consideration. Instead, we should expect an arbitrarily intelligent agent to employ far more sophisticated decision criteria.
Eliezer Yudkowsky makes this assumption many times in this lecture. He claims that an agent with arbitrary intelligence, given the utility function {1 if the cauldron is full, 0 if the cauldron is empty}, will keep adding as much water as possible to the cauldron to maximize the probability that the cauldron is full. But what about the probability that the cauldron will be full in the future? Surely an arbitrarily intelligent agent would consider conserving water to ensure future utility, right? Wouldn’t securing Earth’s water supply put the agent into conflict with many other agents in the environment? There’s at least some probability that the other agents will destroy our cauldron filler, leading to zero future utility, right? The list of reasons why an arbitrarily intelligent agent might not want to grab up all the water in the observable universe and pour it into the cauldron is endless.
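To make that concrete, here’s a toy expected-utility calculation (the numbers are made up for illustration; they aren’t from the lecture): a “hoard everything” policy that nearly guarantees the cauldron is full right now, but risks conflict with other agents, can easily lose in expected long-run utility to a moderate policy that keeps the agent alive.

```python
# Toy comparison of two cauldron-filling policies over a 50-step horizon.
# Utility is 1 for every timestep the cauldron is full; all probabilities are
# illustrative assumptions, not anything claimed in the lecture.

def expected_utility(p_full_per_step, p_survive_per_step, horizon):
    """Expected number of timesteps the cauldron is full over the horizon."""
    total, p_alive = 0.0, 1.0
    for _ in range(horizon):
        total += p_alive * p_full_per_step
        p_alive *= p_survive_per_step  # other agents may destroy the cauldron filler
    return total

hoard    = expected_utility(p_full_per_step=0.999, p_survive_per_step=0.90, horizon=50)
moderate = expected_utility(p_full_per_step=0.95, p_survive_per_step=0.999, horizon=50)
print(f"hoard: {hoard:.1f}, moderate: {moderate:.1f}")  # hoard: ~9.9, moderate: ~46.4
```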
You might say, “Sure, but you can’t guarantee that the list of reasons an agent might not do something dangerous will outweigh the list of reasons it might. This is a discussion about safety, after all. Yudkowsky’s job is to show how things could go horribly wrong, not that they will. A detractor would have to prove that the system will be safe, not merely that it might be. Where safety is concerned, the burden of proof lies with whoever claims the system is safe; it’s enough to show that it might not be.” I would largely agree with this sentiment, but taken to the extreme it becomes a form of Pascal’s mugging.
Like the state of the cauldron, we can only reason about the safety of a system in terms of probability. We may want to maximize the probability of a safe outcome and minimize the probability of a dangerous one, but in the real world, the mathematical certainty of proofs simply doesn’t apply. An arbitrarily intelligent system may dream up plans that we could never conceive of, which complicates our ability to weigh the probabilities of negative, positive, and neutral side effects, but the contrarian instinct to fixate on the worst possible outcome while ignoring how far-fetched the path to that outcome is doesn’t serve us well.
When I brew my coffee in the morning, there’s always a non-zero chance that a hypnotist tricked me into thinking I’m grinding coffee beans when I am, in fact, launching a nuclear missile that will start WW III and lead to the extinction of humanity. There’s a chance that I will somehow anger an arbitrarily powerful being who will strike me with a lightning bolt for some reason that’s ultimately unfathomable to my tiny human brain. There are infinite possibilities for infinite tragedy. There has got to be a more rational way to estimate risk.
“He claims that an agent with arbitrary intelligence, given the utility function {1 if the cauldron is full, 0 if the cauldron is empty}, will keep adding as much water as possible to the cauldron to maximize the probability that the cauldron is full. But what about the probability that the cauldron will be full in the future?”
I didn’t watch the video so I might be missing something, but assuming you created an AI with that utility function, “the cauldron is full right now” and “the cauldron will be full in the future” are different utility functions. A sufficiently intelligent AI would know that maximizing its utility function now will hurt it later, but it doesn’t care, because that’s not its utility function.
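To put the distinction in code (a toy sketch; the names are mine):

```python
# "Full right now" and "full in the future" are different objective functions,
# and an agent built around the first one never evaluates the second.

def utility_now(state):
    """1 if the cauldron is full at this instant, else 0 — the stated objective."""
    return 1 if state["cauldron_full"] else 0

def utility_future(trajectory):
    """Fraction of future timesteps with a full cauldron — a different objective."""
    return sum(1 for s in trajectory if s["cauldron_full"]) / len(trajectory)

# The agent only ever scores outcomes with utility_now; however well it could
# predict the trajectory, utility_future is never consulted.
print(utility_now({"cauldron_full": True}))  # -> 1, regardless of what comes later
```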
Eliezer has a bunch of different arguments because he’s trying to address different levels of familiarity with the problem. My impression is that he expects a sufficiently intelligent but unaligned AI to not be greedy like this and to scheme until it no longer needs us.