Attempts to control such effects with 3D chess backfire as often as not.
Taken literally, this sounds like a strong knife-edge condition to me. Why do you think this? Even if what you really mean is “close enough to 50⁄50 that the first-order effect dominates,” that also sounds like a strong claim given how many non-first-order effects we should expect there to be (ETA: and given how out-of-distribution the problem of preventing AI risk seems to be).
I guess I was imagining an implied “in expectation”: predictions about second-order effects of a certain degree of speculativeness are inaccurate enough to be basically useless, and so shouldn’t shift the expected value of an action. There are definitely exceptions, and it would depend on how you formulate it. But “maybe my action was relevant to an emergent social phenomenon involving many other people with their own agency, and that phenomenon might be bad for abstract reasons, but it’s too soon to tell” just feels like something you couldn’t have anticipated without being superhuman at forecasting, so you shouldn’t grade yourself on it happening (at least for the purposes of deciding how to motivate future behavior).
Ah sorry, I should have realized that “in expectation” was implied. But the same worry applies: “effects of this sort are very hard to reliably forecast” doesn’t imply “we should set those effects to zero in expectation”. Cf. Greaves’s discussion of complex cluelessness.
To be clear, I don’t think Daniel should beat himself up over this either, if that’s what you mean by “grade yourself”. I’m just saying that insofar as we’re trying to assess the expected effects of an action, the assumption that these kinds of indirect effects cancel out in expectation seems very strong (even if it’s common).