Is there a case for deliberately training in cheaply satisfied AI preferences just so we can satisfy them? I think it’s plausible that we can create AI motivations more easily than we can remove undesired ones.
Yes! I’m quite excited by this proposal and I currently plan to write more about it and study it empirically. The basic idea is to try to make AIs’ reward-hacking more responsive to satiation.
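Purely as an illustrative toy (the functional form, the `k` parameter, and the framing in terms of cumulative reward are my own assumptions, not anything from the post): one way to make craving "responsive to satiation" is a bounded, concave utility of cumulative reward, so the marginal value of extra reward falls toward zero once the agent is satiated.

```python
import math

def satiated_utility(cumulative_reward: float, k: float = 0.1) -> float:
    """Toy satiating utility: bounded in [0, 1) and concave, so the
    marginal value of extra reward, k * exp(-k * r), shrinks as the
    cumulative reward r grows. The form and k are illustrative
    assumptions only, not the post's proposal."""
    return 1.0 - math.exp(-k * cumulative_reward)

# The payoff to further reward-hacking falls off once the agent is satiated:
for r in [0, 10, 50, 100]:
    marginal = satiated_utility(r + 1) - satiated_utility(r)
    print(f"cumulative reward {r:>3} -> marginal value of +1 reward: {marginal:.4f}")
```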
We should probably install cheaply satisfied preferences within AIs; why should the preference we pick be myopic reward?
Why not a utility function like: “the amount of time there is a tungsten cube on Dario’s desk, discounted at 21% per year”?
i.e. utility = ∫₀^∞ e^{−λt} · 𝟙[cube on desk at time t] dt,
where λ = ln(2)/3 ≈ 0.231 per year, chosen so that half the utility comes from the deployment period (the first 3 years) and half from the rest of history.
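A quick numerical sanity check of the figures above (a minimal sketch that just re-derives the ~21% annual rate and the 50/50 split from λ = ln(2)/3):

```python
import math

# Decay rate chosen so that the discounted-utility integral halves at t = 3 years.
lam = math.log(2) / 3  # ≈ 0.231 per year

# Implied annual discount rate: the weight e^{-lam * t} shrinks by a factor
# e^{-lam} each year, i.e. by about 21% (20.6% before rounding).
print(f"annual discount rate: {1 - math.exp(-lam):.1%}")

# Share of total utility earned in the first 3 years:
#   ∫₀³ e^{-lam·t} dt / ∫₀^∞ e^{-lam·t} dt = 1 - e^{-3·lam} = 1 - e^{-ln 2} = 1/2
print(f"share of utility in the first 3 years: {1 - math.exp(-3 * lam):.0%}")
```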
Some advantages of the cube preference:
We don’t have to worry about how satisfying this preference affects training and deployment.
It’s less philosophically messy to work out what the cube utility of a scenario would be.
Some disadvantages:
AIs will crave reward anyway, so it’s better to intensify that existing craving than to add a distinct one.
It’s easier to build AIs that intensely crave reward than AIs that crave the cube. My guess is that this is both true and decisive, but I’d want to have a clearer sense of what actually goes wrong if we do something like this.
Wouldn’t this come at the risk of reducing usefulness? Reward-hacking is not useful to us, but the “hacking” is only measured against what we judge a useful outcome to be; from the AI’s perspective, reward-hacking is just getting reward, since it can’t see our judgement that it has gone too far. So if the AI tries less hard to get reward, it will also try less hard at completing tasks in ways that give us what we want, which would make it less useful.
Part of the post is about this.