We should probably install cheaply satisfied preferences within AIs. But why should that preference be myopic reward?
Why not a utility function like: “How much time does a tungsten cube spend on Dario’s desk, discounted at ≈21% per year?”
i.e. utility = ∫₀^∞ e^{-λt} · 𝟙[cube on desk at time t] dt
where λ = ln(2)/3 ≈ 0.231/yr (a per-year discount factor of e^{-λ} ≈ 0.79, i.e. the ≈21% annual rate above), chosen so that half the utility comes from the deployment period (the first 3 years) and half from the rest of history.
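As a quick numerical sanity check on these constants, here is a minimal Python sketch; the cube_on_desk indicator interface and the 300-year truncation horizon are my assumptions for illustration, not part of the proposal:

```python
import numpy as np
from scipy.integrate import quad

# Continuous discount rate: half of all achievable utility should
# accrue during the first 3 years, so lambda = ln(2)/3 per year.
LAM = np.log(2) / 3  # ~0.231/yr; per-year discount factor e^-LAM ~ 0.79

def cube_utility(cube_on_desk, horizon_years=300.0):
    """Discounted utility of a scenario.

    cube_on_desk: function t -> 0.0 or 1.0, the indicator that the
    cube is on Dario's desk at time t (in years). The horizon
    (an assumption here) truncates the improper integral; exponential
    decay makes the truncation error negligible.
    """
    value, _ = quad(lambda t: np.exp(-LAM * t) * cube_on_desk(t),
                    0.0, horizon_years, limit=500)
    return value

# Sanity checks: with the cube permanently on the desk, total utility
# is 1/lambda, and the first 3 years contribute exactly half of it.
total = cube_utility(lambda t: 1.0)
first_three, _ = quad(lambda t: np.exp(-LAM * t), 0.0, 3.0)
print(total, 1 / LAM)        # both ~4.328
print(first_three / total)   # ~0.5
```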
Some advantages of the cube preference:
We don’t have to worry about how satisfying this preference affects training and deployment.
It’s philosophically less messy to pin down what the cube utility of a given scenario would be.
Some disadvantages:
AIs will crave reward anyway, so it’s better to intensify that existing craving than to add a distinct one.
It’s easier to build AIs that intensely crave reward than ones that crave the cube thing. My guess is that this is both true and decisive, but I’d want a clearer sense of what actually goes wrong if we do something like this.