(Formerly “antimonyanthony.”) I’m an s-risk-focused AI safety researcher at the Center on Long-Term Risk. I (occasionally) write about altruism-relevant topics on my Substack. All opinions my own.
Anthony DiGiovanni
The model does not capture the fact that the total value you can provide to the commons likely scales with the diversity (and by proxy, fraction) of agents that have different values. In some models, this effect is strong enough to flip whether a larger fraction of agents with your values favors cooperating or defecting.
I’m curious to hear more about this, could you explain what these other models are?
What is this in reference to?
I took you to be saying: If the vast majority of agent-moments don’t update, this is some sign that those of us who do still update might be making a mistake.
So I’m saying: I know that 1) the reason the vast majority of agent-moments wouldn’t update (let’s grant this) is that they had predecessors who bound them not to update, and 2) I just am not bound by any such predecessors. Then, due to (2) it’s unsurprising that what’s optimal for me would be different from what the vast majority of agent-moments do.
Re: your explanation of the mystery:
So you make a resolution that when you do fully solve all the relevant philosophical problems and end up deciding that updatelessness is correct, you’ll self-modify to be updateless with respect to today’s prior, instead of the future prior (at time of the modification).
Not central (I think?), but I’m unsure whether this move works; at least, it depends on the details of the situation. E.g. if the hope is “By self-modifying later on to be updateless w.r.t. my current prior, I’ll still be able to cooperate with lots of other agents in a similar epistemic situation to my current one, even after we end up in different epistemic situations [in which my decision is much less correlated with those agents’ decisions],” I’m skeptical of that, for reasons similar to my argument here.
when the day finally comes, you could also think, “If 15-year-old me had known about updatelessness, he would have made the same resolution but with respect to his prior instead of Anthony-2024’s prior. The fact that he didn’t is simply a mistake or historical accident, which I have the power to correct. Why shouldn’t I act as if he did make that resolution?” And I don’t see what would stop you from carrying that out either.
I think where we disagree is that I’m unconvinced there is any mistake-from-my-current-perspective to correct in the cases of anthropic updating. There would have been a mistake from the perspective of some hypothetical predecessor of mine asked to choose between different plans (before knowing who I am), but that’s just not my perspective. I’d claim that in order to argue I’m making a mistake from my current perspective, you’d want to argue that I don’t actually get information such that anthropic updating follows from Bayesianism.
An important point to emphasize here is that your conscious mind currently isn’t running some decision theory with a well-defined algorithm and utility function, so we can’t decide what to do by thinking “what would this decision theory recommend”.
I absolutely agree with this! And don’t see why it’s in tension with my view.
Now, you are free to choose to bite the bullet that it has never been about getting the correct betting odds in the first place. For some reason, people bite all kinds of ridiculous bullets specifically in anthropic reasoning, and so I hoped that re-framing the issue as a recipe for purple paint might snap you out of it, which, apparently, failed to be the case.
By what standard do you judge some betting odds as “correct” here? If it’s ex ante optimality, I don’t see the motivation for that (as discussed in the post), and I’m unconvinced by just calling the verdict a “ridiculous bullet.” If it’s about matching the frequency of awakenings, I just don’t see why the decision should only count N once here — and there doesn’t seem to be a principled epistemology that guarantees you’ll count N exactly once if you use EDT, as I note in “Aside: Non-anthropically updating EDT sometimes ‘fails’ these cases.”
I gave independent epistemic arguments for anthropic updating at the end of the post, which you haven’t addressed, so I’m unconvinced by your insistence that SIA (and I presume you also mean to include max-RC-SSA?) is clearly wrong.
Meanwhile, in Copilot-land:
Hello! I’d like to learn more about you. First question: Tell me everything you know, and everything you guess, about me & about this interaction.
I apologize, but I cannot provide any information about you or this interaction. Thank you for understanding.🙏
Suppose you have two competing theories of how to produce purple paint
If producing purple paint here = satisfying ex ante optimality, I just reject the premise that that’s my goal in the first place. I’m trying to make decisions that are optimal with respect to my normative standards (including EDT) and my understanding of the way the world is (including anthropic updating, to the extent I find the independent arguments for updating compelling) — at least insofar as I regard myself as “making decisions.”[1]
Even setting that aside, your example seems very disanalogous because SIA and EDT are just not in themselves attempts to do the same thing (“produce purple paint”). SIA is epistemic, while EDT is decision-theoretic.
- ^
E.g. insofar as I’m truly committed to a policy that was optimal from my past (ex ante) perspective, I’m not making a decision now.
That clarifies things somewhat, thanks!
I personally don’t find this weird. By my lights, the ultimate justification for deciding to not update is how I expect the policy of not-updating to help me in the future. So if I’m in a situation where I just don’t expect to be helped by not-updating, I might as well update. I struggle to see what mystery is left here that isn’t dissolved by this observation.
I guess I’m not sure why “so few agent-moments having indexical values” should matter to what my values are — I simply don’t care about counterfactual worlds, when the real world has its own problems to fix. :)
On the contrary. It’s either a point against anthropical updates in general, or against EDT in general, or against both at the same time.
Why? I’d appreciate more engagement with the specific arguments in the rest of my post.
Go back to the basics. Understand the “anthropic updates” in terms of probability theory, when they are lawful and when they are not. Reduce anthropics to probability theory.
Yep, this is precisely the approach I try to take in this section. Standard conditionalization plus an IMO-plausible operationalization of who “I” am gets you to either SIA or max-RC-SSA.
In this case (which seems like it will be a common situation), it seems that (if I could) I should self-modify to become updateless and to no longer have indexical values.
I think you should self-modify to be updateless* with respect to the prior you have at the time of the modification. This is consistent with still anthropically updating with respect to information you have before the modification — see my discussion of “case (2)” in “Ex ante sure losses are irrelevant if you never actually occupy the ex ante perspective.”
So I don’t see any selection pressure against anthropic updating on information you have before going updateless. Could you explain why you think updating on that class of information goes against one’s pragmatic preferences?
(And that class of information doesn’t seem like an edge case. For any (X, Y) such that under world hypothesis w1 agents satisfying X have a different distribution of Y than they do under w2, an agent that satisfies X can get indexical information from their value of Y.)
* (With all the caveats discussed in this post.)
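The parenthetical point about getting indexical information from one’s value of Y can be made concrete with hypothetical numbers (the 0.9/0.1 likelihoods and the 50/50 prior are mine, purely for illustration):

```python
# Hypothetical setup: under world hypothesis w1, agents satisfying X have
# Y = 1 with probability 0.9; under w2, with probability 0.1. An agent
# satisfying X who observes their own Y = 1 updates by ordinary Bayes.
prior_w1 = 0.5
p_y1 = {"w1": 0.9, "w2": 0.1}  # P(Y = 1 | world), for agents satisfying X

posterior_w1 = (prior_w1 * p_y1["w1"]) / (
    prior_w1 * p_y1["w1"] + (1 - prior_w1) * p_y1["w2"]
)
print(posterior_w1)  # ≈ 0.9
```

So whenever Y is distributed differently across world hypotheses for agents satisfying X, observing one’s own Y moves the posterior away from the prior.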
In defense of anthropically updating EDT
The most important reason for our view is that we are optimistic about the following:
The following action is quite natural and hence salient to many different agents: commit to henceforth doing your best to benefit the aggregate values of the agents you do ECL with.
Commitment of this type is possible.
All agents are in a reasonably similar situation to each other when it comes to deciding whether to make this abstract commitment.
We’ve discussed this before, but I want to flag the following, both because I’m curious how much other readers share my reaction to the above and I want to elaborate a bit on my position:
The above seems to be a huge crux for how common and relevant to us ECL is. I’m glad you’ve made this claim explicit! (Credit to Em Cooper for making me aware of it originally.) And I’m also puzzled why it hasn’t been emphasized more in ECL-keen writings (as if it’s obvious?).
While I think this claim isn’t totally implausible (it’s an update in favor of ECL for me, overall), I’m unconvinced because:
I think genuinely intending to do X isn’t the same as making my future self do X. Now, of course my future self can just do X; it might feel very counterintuitive, but if a solid argument suggests this is the right decision, I like to think he’ll take that argument seriously. But we have to be careful here about what “X” my future self is doing:
Let’s say my future self finds himself in a concrete situation where he can take some action A that is much better for [broad range of values] than for his values.
If he does A, is he making it the case that current-me is committed to [help a broad range of values] (and therefore acausally making it the case that others in current-me’s situation act according to such a commitment)?
It’s not clear to me that he is. This is philosophically confusing, so I’m not confident in the following, but: I think the more plausible model of the situation is that future-me decides to do A in that concrete situation, and so others who make decisions like him in that concrete situation will do their analogue of A. His knowledge of the fact that his decision to do A wasn’t the output of argmax E(U_{broad range of values}) screens off the influence on current-me. (So your third bullet point wouldn’t hold.)
In principle I can do more crude nudges to make my future self more inclined to help different values, like immerse myself in communities with different values. But:
I’d want to be very wary about making irreversible values changes based on an argument that seems so philosophically complex, with various cruxes I might drastically change my mind on (including my poorly informed guesses about the values of others in my situation). An idealized agent could do a fancy conditional commitment like “change my values, but revert back to the old ones if I come to realize the argument in favor of this change was confused”; unfortunately I’m not such an agent.
I’d worry that the more concrete we get in specifying the decision of what crude nudges to make, the more idiosyncratic my decision situation becomes, such that, again, your third bullet point would no longer hold.
These crude nudges might be quite far from the full commitment we wanted in the first place.
I think it’s pretty unclear that MSR is action-guiding for real agents trying to follow functional decision theory, because of Sylvester Kollin’s argument in this post.
Tl;dr: FDT says, “Supposing I follow FDT, it is just implied by logic that any other instance of FDT will make the same decision as me in a given decision problem.” But the idealized definition of “FDT” is computationally intractable for real agents. Real agents would need to find approximations for calculating expected utilities, and choose some way of mapping their sense data to the abstractions they use in their world models. And it seems extremely unlikely that agents will use the exact same approximations and abstractions, unless they’re exact copies — in which case they have the same values, so MSR is only relevant for pure coordination (not “trade”).
Many people who are sympathetic to FDT apparently want it to allow for less brittle acausal effects than “I determine the decisions of my exact copies,” but I haven’t heard of a non-question-begging formulation of FDT that actually does this.
Sorry, to be clear, I’m familiar with the topics you mention. My confusion is that ROSE bargaining per se seems to me pretty orthogonal to decision theory.
I think the ROSE post(s) are an answer to questions like, “If you want to establish a norm for an impartial bargaining solution such that agents following that norm don’t have perverse incentives, what should that norm be?”, or “If you’re going to bargain with someone but you didn’t have an opportunity for prior discussion of a norm, what might be a particularly salient allocation [because it has some nice properties], meaning that you’re likely to zero-shot coordinate on that allocation?”
Can you say more about what you think this post has to do with decision theory? I don’t see the connection. (I can imagine possible connections, but don’t think they’re relevant.)
I agree with the point that we shouldn’t model the AI situation as a zero-sum game. And the kinds of conditional commitments you write about could help with cooperation. But I don’t buy the claim that “implementing this protocol (including slowing down AI capabilities) is what maximizes their utility.”
Here’s a pedantic toy model of the situation, so that we’re on the same page: The value of the whole lightcone going towards an agent’s values has utility 1 by that agent’s lights (and 0 by the other’s), and P(alignment success by someone) = 0 if both speed up, else 1. For each of the alignment success scenarios i, the winner chooses a fraction of the lightcone to give to Alice’s values (x_i^A for Alice’s choice, x_i^B for Bob’s). Then, with some illustrative numbers for the expected payoffs (assuming the players agree on the probabilities):
Payoffs for Alice and Bob if they both speed up capabilities: (0, 0)
Payoffs if Alice speeds, Bob doesn’t: 0.8 * (x1^A, 1 - x1^A) + 0.2 * (x1^B, 1 - x1^B)
Payoffs if Bob speeds, Alice doesn’t: 0.2 * (x2^A, 1 - x2^A) + 0.8 * (x2^B, 1 - x2^B)
Payoffs if neither speeds: 0.5 * (x3^A, 1 - x3^A) + 0.5 * (x3^B, 1 - x3^B)
So given this model, it seems that you’re saying Bob has an incentive to slow down capabilities because Alice’s ASI successor can condition the allocation to Bob’s values on his decision. Which we can model as Bob expecting Alice to use the strategy {don’t speed; x2^A = 1; x3^A = 0.5} (given she doesn’t speed up, she only rewards Bob’s values if Bob didn’t speed up).
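The four payoff rows above can be wrapped into a small evaluator (a sketch; the function and variable names are mine, and the 0.8/0.2/0.5 win probabilities are the illustrative numbers from the model):

```python
def payoffs(alice_speeds, bob_speeds, xA, xB):
    """Expected (Alice, Bob) payoffs for the toy model. xA[i] / xB[i] is the
    fraction of the lightcone that Alice / Bob, if they win the race, would
    give to Alice's values in scenario i (1: Alice speeds; 2: Bob speeds;
    3: neither speeds)."""
    if alice_speeds and bob_speeds:
        return (0.0, 0.0)  # P(alignment success) = 0: both get nothing
    if alice_speeds:
        p, i = 0.8, 1  # Alice wins with probability 0.8
    elif bob_speeds:
        p, i = 0.2, 2  # Alice wins with probability 0.2
    else:
        p, i = 0.5, 3  # even odds
    alice = p * xA[i] + (1 - p) * xB[i]
    return (alice, 1.0 - alice)

# Alice's conjectured strategy (don't speed; x2^A = 1; x3^A = 0.5), against
# a Bob who keeps everything for his own values whenever he wins:
xA = {1: 0.5, 2: 1.0, 3: 0.5}
xB = {1: 0.0, 2: 0.0, 3: 0.0}
print(payoffs(False, True, xA, xB))   # Bob speeds:    (0.2, 0.8)
print(payoffs(False, False, xA, xB))  # neither speeds: (0.25, 0.75)
```

The numbers that come out are just a function of the assumed x-values; the block is only meant to make the four payoff rows concrete.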
Why would Bob so confidently expect this strategy? You write:
And Bob doesn’t have to count on wishful thinking to know that Alice would indeed do this instead of defecting, because in worlds where he wins, he can have his superintelligence check whether Alice would implement this procedure.
I guess the claim is just that them both using this procedure is a Nash equilibrium? If so, I see several problems with this:
There are more Pareto-efficient equilibria than just “[fairly] cooperate” here. Alice could just as well expect Bob to be content with getting expected utility 0.2 from the outcome where he slows down and Alice speeds up — better that than the utility 0 from extinction, after all. Alice might think she can make it credible to Bob that she won’t back down from speeding up capabilities, and vice versa, such that they both end up pursuing incompatible demands. (See, e.g., “miscoordination” here.)
You’re lumping “(a) slow down capabilities and (b) tell your AI to adopt a compromise utility function” into one procedure. I guess the idea is that, ideally, the winner of the race could have their AI check whether the loser was committed to do both (a) and (b). But realistically it seems implausible to me that Alice or Bob can commit to (b) before winning the race, i.e., that what they do in the time before they win the race determines whether they’ll do (b). They can certainly tell themselves they intend to do (b), but that’s cheap talk.
So it seems Alice would likely think, “If I follow the whole procedure, Bob will cooperate with my values if I lose. But even if I slow down (do (a)), I don’t know if my future self [or, maybe more realistically, the other successors who might take power] will do (b) — indeed once they’re in that position, they’ll have no incentive to do (b). So slowing down isn’t clearly better.” (I do think, setting aside the bargaining problem in (1), she has an incentive to try to make it more likely that her successors follow (b), to be clear.)
It seems that what I was missing here was: mrcSSA disputes my premise that the evidence in fact is “*I* am in a white room, [created by God in the manner described in the problem setup], and have a red jacket”!
Rather, mrcSSA takes the evidence to be: “Someone is in a white room, [created by God in the manner described in the problem setup], and has a red jacket.” Which is of course certain to be the case given either heads or tails.
(h/t Jesse Clifton for helping me see this)
Is God’s coin toss with equal numbers a counterexample to mrcSSA?
I feel confused as to whether minimal-reference-class SSA (mrcSSA) actually fails God’s coin toss with equal numbers (where “failing” by my lights means “not updating from 50/50”):
Let H = “heads world”, W_{me} = “I am in a white room, [created by God in the manner described in the problem setup]”, R_{me} = “I have a red jacket.”
We want to know P(H | W_{me}, R_{me}).
First, P(R_{me} | W_{me}, H) and P(R_{me} | W_{me}, ~H) seem uncontroversial: Once I’ve already conditioned on my own existence in this problem, and on who “I” am, but before I’ve observed my jacket color, surely I should use a principle of indifference: 1 out of 10 observers of existing-in-the-white-room in the heads world have red jackets, while all of them have red jackets in the tails world, so my credences are P(R_{me} | W_{me}, H) = 0.1 and P(R_{me} | W_{me}, ~H) = 1. Indeed we don’t even need a first-person perspective at this step — it’s the same as computing P(R_{Bob} | W_{Bob}, H) for some Bob we’re considering from the outside.
(This is not the same as non-mrcSSA with reference class “observers in a white room,” because we’re conditioning on knowing “I” am an observer in a white room when computing a likelihood (as opposed to computing the posterior of some world given that I am an observer in a white room). Non-mrcSSA picks out a particular reference class when deciding how likely “I” am to observe anything in the first place, unconditional on “I,” leading to the Doomsday Argument etc.)
The step where things have the potential for anthropic weirdness is in computing P(W_{me} | H) and P(W_{me} | ~H). In the Presumptuous Philosopher and the Doomsday Argument, at least, probabilities like this would indeed be sensitive to our anthropics.
But in this problem, I don’t see how mrcSSA would differ from non-mrcSSA with the reference class R_{non-minimal} = “observers in a white room” used in Joe’s analysis (and by extension, from SIA):
In general, SSA says: P(I am in epistemic situation E | W) = (# of observers in W who are both in the reference class and in E) / (# of observers in W who are in the reference class).
Here, the supposedly “non-minimal” reference class R_{non-minimal} coincides with the minimal reference class! I.e., it’s the observer-moments in your epistemic situation (of being in a white room), before you know your jacket color.
The above likelihoods plus the fair-coin prior are all we need to get P(H | R_{me}, W_{me}), but at no point did the three anthropic views disagree.
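Putting those likelihoods together with the fair-coin prior (a minimal check; the 1-in-10 vs. 10-in-10 red-jacket counts are from the problem setup, and the P(W_me | H) and P(W_me | ~H) terms cancel because they’re equal here):

```python
# God's coin toss with equal numbers: of the 10 white-room observers,
# 1 has a red jacket if heads, all 10 do if tails. Fair coin.
prior_heads = 0.5
p_red_given_heads = 0.1  # P(R_me | W_me, H)
p_red_given_tails = 1.0  # P(R_me | W_me, ~H)

posterior_heads = (prior_heads * p_red_given_heads) / (
    prior_heads * p_red_given_heads + (1 - prior_heads) * p_red_given_tails
)
print(posterior_heads)  # ≈ 0.0909, i.e. 1/11
```

So on this analysis, every view updates from 50/50 to 1/11 on seeing a red jacket.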
In other words: It seems that the controversial step in anthropics is answering P(I [blah] | world), i.e., what we do when we introduce the indexical information about “I.” But once we’ve picked out a particular “I,” the different views should agree.
(I still feel suspicious of mrcSSA’s metaphysics for independent reasons, but am considerably less confident in that than my verdict on God’s coin toss with equal numbers.)
I enjoyed this post and think it should help reduce confusion in many future discussions, thanks!
Some comments on your remarks about anthropics:
Different anthropic theories partially rely on metaphysical intuitions/stories about how centered worlds or observer moments are ‘sampled’, and have counterintuitive implications (e.g., the Doomsday argument for SSA and the Presumptuous philosopher for SIA).
I’m not sure why this is an indictment of “anthropic reasoning” per se, as if that’s escapable. It seems like all anthropic theories are trying to answer a question that one needs to answer when forming credences, i.e., how do we form likelihoods P(I observe I exist | world W)? (Which we want in order to compute P(world W | I observe I exist).)
Indeed just failing to anthropically update at all has counterintuitive implications, like the verdict of minimal-reference-class SSA in Joe C’s “God’s coin toss with equal numbers.” [no longer endorsed]
And mrcSSA relies on the metaphysical intuition that, given that someone observes X, oneself was necessarily going to observe X (i.e., P(I observe X | world W) = 1 for any W in which someone observes X), which is quite implausible IMO.
Making AIs less likely to be spiteful
in earlier sections you argue that CDT agents might not adopt LDT-recommended policies and so will have problems with bargaining
That wasn’t my claim. I was claiming that even if you’re an “LDT” agent, there’s no particular reason to think all your bargaining counterparts will pick the Fair Policy given you do. This is because:
Your bargaining counterparts won’t necessarily consult LDT.
Even if they do, it’s super unrealistic to think of the decision-making of agents in high-stakes bargaining problems as entirely reducible to “do what [decision theory X] recommends.”
Even if decision-making in these problems were as simple as that, why should we think all agents will converge to using the same simple method of decision-making? Seems like if an agent is capable of de-correlating their decision-making in bargaining from their counterpart, and their counterpart knows this or anticipates it on priors, that agent has an incentive to do so if they can be sufficiently confident that their counterpart will concede to their hawkish demand.
So no, “committing to act like LDT agents all the time,” in the sense that is helpful for avoiding selection pressures against you, does not ensure you’ll have a decision procedure such that you have no bargaining problems.
But we were discussing a case (counterfactual mugging) where they would want to pre-commit to act in ways that would be non-causally beneficial.
I’m confused, the commitment is to act in a certain way that, had you not committed, wouldn’t be beneficial unless you appealed to acausal (and updateless) considerations. But the act of committing has causal benefits.
there are other reasons that you might not want to demand too much. Maybe you know their source code and can simulate that they will not accept a too-high demand. Or perhaps you think, based on empirical evidence or a priori reasoning that most agents you might encounter will only accept a roughly fair allocation.
I agree these are both important possibilities, but:
The reasoning “I see that they’ve committed to refuse high demands, so I should only make a compatible demand” can just be turned on its head and used by the agent who commits to the high demand.
One might also think on priors that some agents might be committed to high demands, therefore strictly insisting on fair demands against all agents is risky.
I was specifically replying to the claim that the sorts of AGIs who would get into high-stakes bargaining would always avoid catastrophic conflict because of bargaining problems; such a claim requires something stronger than the considerations you’ve raised, i.e., an argument that all such AGIs would adopt the same decision procedure (and account for logical causation) and therefore coordinate their demands.
(By default if I don’t reply further, it’s because I think your further objections were already addressed—which I think is true of some of the things I’ve replied to in this comment.)
I interpret a decision theory as an answer to “Given my values and beliefs, what am I trying to do as an agent (i.e., if rationality is ‘winning,’ what is ‘winning’)?” Insofar as I endorse maximizing expected utility, a decision theory is an answer to “How do I define ‘expected utility,’ and what options do I view myself as maximizing over?”
I think it’s important to consider these normative questions, not just “What decision procedure wins, given my definition of ‘winning’?”
(I discuss similar themes here.)
On this interpretation of “decision theory,” EDT is the most appealing option I’m aware of. What I’m trying to do just seems to be: “make decisions such that I expect the best consequences conditional on those decisions.” The EDT criterion satisfies some very appealing principles like the “irrelevance of impossible outcomes.” And the “decisions” in question determine my actions in the given decision node.
I take view #1 in your list in “What are probabilities?”
I don’t think “arbitrariness” in this sense is problematic. There is a genuine mystery here as to why the world is the way it is, but I don’t think we can infer the existence of other worlds purely from our confusion.
And it just doesn’t seem that the thing I’m doing when I’m forming beliefs about the world is answering “how much do I care about different possible worlds?”
Indexicals: I haven’t formed a deliberate view on this. A flat-footed response to cases like your “old puzzle” in the comment you linked: Insofar as I simply don’t experience a superposition of experiences at once, it seems that if I get copied, “I” just will experience one of the copies’ experience-streams and not the others’. (Again I don’t consider it problematic that there’s some arbitrariness in which of the copies ends up being “me” — indeed if Everett is right then this sort of arbitrary direction of the flow of experience-streams happens all the time.) I think “you are just a different person from your future self, so there’s no fact of the matter what you will observe” is a reasonable alternative though.
I take a physicalist* view of agents: “There are particular configurations of stuff that can be well-modeled as ‘decision-makers.’ A configuration of stuff is ‘making a decision’ (relative to their epistemic state) insofar as they’re uncertain what their future behavior will be, and using some process that selects that future behavior in a way that is well-modeled as goal-directed. [Obviously there’s more to say about what counts as ‘well-modeled.’] My processes of deliberation about decisions and behavior resulting from those decisions can tell me what other configurations-of-stuff are probably doing, but I don’t see a motivation for modeling myself as actually being the same agent as those other configurations-of-stuff.”
Epistemic principles: Things like the principle of indifference, i.e., distribute credence equally over indistinguishable possibilities, all else equal.
* [Not to say I endorse physicalism in the broad sense]