Sorry, I don’t understand. If “reward is the optimization target” incorrectly implies that AIs would change their behavior more than they do, then the drug addict example seems orthogonal to that issue?
I didn’t say reward is the optimization target NOW! I said it might be in the future! See the other thread with Violet Hour.
Ah okay, that makes more sense to me. I assumed that you would be talking about AIs similar to current-day systems since you said that you’d updated from the behavior of current-day systems.
I am talking about AIs similar to current-day systems, for some notion of “similar” at least. But I’m imagining AIs that are trained on lots more RL, especially lots more long-horizon RL.