Researcher at the Center on Long-Term Risk. All opinions my own.
Anthony DiGiovanni
- I’m curious if you think you could have basically written this exact post a year ago. Or if not, what’s the relevant difference? (I admit this is partly a rhetorical question, but it’s mostly not.) 
- Oops, right. I think what’s going on is:
  - “It’s only permissible to bet at odds that are inside your representor” is only true if the representor is convex. If my credence in some proposition X is, say, P(X) = (0.2, 0.49) ∪ (0.51, 0.7), IIUC it’s permissible to bet at 0.5. I guess the claim that’s true is “It’s only permissible to bet at odds in the convex hull of your representor”.
  - But I’m not aware of an argument that representors should be convex in general.
    - If there is such an argument, my guess is that the way things would work is: we start with the non-convex set of distributions that seem no less reasonable than each other, and then add in whichever other distributions are needed to make it convex. But there would be no particular reason we’d need to interpret these other distributions as “reasonable” precise beliefs, relative to the distributions in the non-convex set we started with.
 
- And, the kind of precise distribution P that would rationalize e.g. working on shrimp welfare seems to be the analogue of “betting at 0.5” in my example above. That is:
  - Our actual “set of distributions that seem no less reasonable than each other” would include some distributions that imply large positive long-term EV from working on shrimp welfare, and some that imply large negative long-term EV.
  - Whereas the distributions like P that imply vanishingly small long-term EV — given any evidence too weak to resolve our cluelessness w.r.t. long-term welfare — would lie in the convex hull. So betting at odds P would be permissible, and yet this wouldn’t imply that P is “reasonable” as precise beliefs. (A minimal numerical sketch of the convex-hull point is below.)
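To make the convex-hull point concrete, here’s a minimal numerical sketch (just my own illustration, using the interval endpoints from the example above):

```python
# Minimal sketch: a representor given as a union of two disjoint intervals,
# its convex hull, and which betting odds each of them permits.

def in_representor(p, intervals):
    """Is p inside the (non-convex) representor itself?"""
    return any(lo < p < hi for lo, hi in intervals)

def in_convex_hull(p, intervals):
    """Is p inside the convex hull of the representor?"""
    lo = min(lo for lo, _ in intervals)
    hi = max(hi for _, hi in intervals)
    return lo < p < hi

representor = [(0.2, 0.49), (0.51, 0.7)]  # P(X) = (0.2, 0.49) ∪ (0.51, 0.7)

for odds in (0.5, 0.6, 0.8):
    print(odds, in_representor(odds, representor), in_convex_hull(odds, representor))
# 0.5 -> False, True   (permissible to bet at, yet not itself in the representor)
# 0.6 -> True,  True
# 0.8 -> False, False
```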
 
 
- Sorry, I don’t understand the argument yet. Why is it clear that I should bet on odds P, e.g., if P is the distribution that the CCT says I should be represented by? 
- Thanks for explaining!
  - An intuitively compelling criterion is: these precise beliefs (which you are representable as holding) are within the bounds of your imprecise credences.
    - I think this is the step I reject. By hypothesis, I don’t think the coherence arguments show that the precise distribution P that I can be represented as optimizing w.r.t. corresponds to (reasonable) beliefs. P is nothing more than a mathematical device for representing some structure of behavior. So I’m not sure why I should require that my representor — i.e., the set of probability distributions that would be no less reasonable than each other if adopted as beliefs[1] — contains P.
  - ^ I’m not necessarily committed to this interpretation of the representor, but for the purposes of this discussion I think it’s sufficient.
 
- Thanks, this was thought-provoking. I feel confused about how action-relevant this idea is, though.
  - For one, let’s grant that (a) “researching considerations + basing my recommendation on the direction of the considerations” > (b) “researching considerations + giving no recommendation”. This doesn’t tell me how to compare (a) “researching considerations + basing my recommendation on the direction of the considerations” vs. (c) “not doing research”. Realistically, the act of “doing research” would have various messy effects relative to, say, doing some neartermist thing — so I’d think (a) is incomparable with (c). (More on this here.)
  - But based on the end of your comment, IIUC you’re conjecturing that we can compare plans based on a similar idea to your example even if no “research” is involved, just passively gaining info. If so:
    - It seems like this wouldn’t tell me to change anything about what I work on in between times when someone asks for my recommendation.
    - Suppose I recommend that someone do more of [intervention that I’ve positively updated on]. Again, their act of investing more in that intervention will presumably have lots of messy side effects, besides “more of the intervention gets implemented” in the abstract. So I should only be clueful that this plan is better if I’ve “positively updated” on the all-things-considered set of effects of this person investing more in that intervention. (Intuitively this seems like an especially high bar.)
 
- What more do you want?
  - Relevance to bounded agents like us, and not being sensitive to an arbitrary choice of language. More on the latter (h/t Jesse Clifton; the invariance theorem at issue is restated after this comment, for reference):
    - The problem is that Kolmogorov complexity depends on the language in which algorithms are described. Whatever you want to say about invariances with respect to the description language, this has the following unfortunate consequence for agents making decisions on the basis of finite amounts of data: For any finite sequence of observations, we can always find a silly-looking language in which the length of the shortest program outputting those observations is much lower than that in a natural-looking language (but which makes wildly different predictions of future data). For example, we can find a silly-looking language in which “the laws of physics have been as you think they are ’til now, but tomorrow all emeralds will turn blue” is simpler than “all emeralds will stay green and the laws of physics will keep working”...
    - You might say, “Well we shouldn’t use those languages because they’re silly!” But what are the principles by which you decide a language is silly? We would suggest that you start with the actual metaphysical content of the theories under consideration, the claims they make about how the world is, rather than the mere syntax of a theory in some language.
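For reference, here is the standard invariance theorem the quoted passage alludes to (my gloss, not part of the quote): for any two universal description languages $L_1$ and $L_2$, there is a constant $c_{L_1,L_2}$ that depends only on the pair of languages (not on the data), such that

$$\bigl|\,K_{L_1}(x) - K_{L_2}(x)\,\bigr| \;\le\; c_{L_1,L_2} \quad \text{for all strings } x.$$

The catch for bounded agents is that nothing bounds $c_{L_1,L_2}$ across the space of languages, so for any finite observation sequence one can pick a “silly” language whose constant is large enough that the gerrymandered hypothesis comes out shorter.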
- Sorry this wasn’t clear: In the context of this post, when we endorsed “use maximality to restrict your option set, and then pick on the basis of some other criterion”, I think we were implicitly restricting to the special case where {permissible options w.r.t. the other criterion} ⊆ {permissible options w.r.t. consequentialism}. If that doesn’t hold, it’s not obvious to me what to do.
  - Regardless, it’s not clear to me what alternative you’d propose in this situation that’s less weird than choosing “saying ‘yeah it’s good’”. (In particular, I’m not sure if you’re generally objecting to incomplete preferences per se, or to some way of choosing an option given incomplete preferences (w.r.t. consequentialism).)
- Ah sorry, I realized that “in expectation” was implied. It seems the same worry applies. “Effects of this sort are very hard to reliably forecast” doesn’t imply “we should set those effects to zero in expectation”. Cf. Greaves’s discussion of complex cluelessness.
  - Tbc, I don’t think Daniel should beat himself up over this either, if that’s what you mean by “grade yourself”. I’m just saying that insofar as we’re trying to assess the expected effects of an action, the assumption that these kinds of indirect effects cancel out in expectation seems very strong (even if it’s common).
- attempts to control such effects with 3d chess backfire as often as not
  - Taken literally, this sounds like a strong knife-edge condition to me. Why do you think this? Even if what you really mean is “close enough to 50/50 that the first-order effect dominates,” that also sounds like a strong claim given how many non-first-order effects we should expect there to be (ETA: and given how out-of-distribution the problem of preventing AI risk seems to be).
- (Replying now bc of the “missed the point” reaction:) To be clear, my concern is that someone without more context might pattern-match the claim “Anthony thinks we shouldn’t have probabilistic beliefs” to “Anthony thinks we have full Knightian uncertainty about everything / doesn’t think we can say any A is more or less likely than any B”. From my experience having discussions about imprecision, conceptual rounding errors are super common, so I think this is a reasonable concern even if you personally find it obvious that “probabilistic” should be read as “using a precise probability distribution”. 
- Sorry, to be clear, I don’t claim LW has overlooked these topics (except unawareness and alternatives to classical Bayesian epistemology, which I do think have been quite severely neglected). The reason I wrote this post was that the following claims seem non-obvious:
  - Thinking further about wisdom concepts these days is not just a distraction from “notkilleveryoneism”.
  - The concepts in the checklist do in fact seem to satisfy conditions (1)+(2) (the definition of “wisdom concepts”). (My impression is that it’s somewhat common for people to think many of the concepts I list admit “objective” answers (i.e., just believe and do what “works” / has the best empirical track record), which all sufficiently intelligent agents will converge to. ETA: Relatedly, it might not be salient to some readers that the answer to “is this decision a catastrophic mistake?” could be sensitive to all these topics.)
  - The sub-questions I list are open questions. (E.g., I expect it to be controversial that agents aren’t necessarily rationally required to avoid diachronic sure losses.)
 
- There are indeed connections between these ideas, but I think it’s very important not to round unawareness off to either of those two. Unawareness is its own epistemic problem with its own implications. (E.g., it’s not the same as non-realizability because there are many hypotheses that are not self-referential of which we’re unaware/coarsely aware.) 
- As a followup: Hopefully this post of mine further clarifies my position, specifically the “Unawareness and superforecasting” section. 
- Thanks! 
- People use “cluelessness” to mean various importantly different things, which is why I de-emphasized that term in this sequence. I think unawareness is a (major) source of what Greaves called complex cluelessness, which is a situation where:
  - (CC1) We have some reasons to think that the unforeseeable consequences of A1 would systematically tend to be substantially better than those of A2;
  - (CC2) We have some reasons to think that the unforeseeable consequences of A2 would systematically tend to be substantially better than those of A1;
  - (CC3) It is unclear how to weigh up these reasons against one another.
  - (It’s a bit unclear how “unforeseeable” is defined. In context / in the usual ways people tend to talk about complex cluelessness, I think it’s meant to encompass cases where the problem isn’t unawareness but rather other obstacles to setting precise credences.)
  - But unawareness itself means “many possible consequences of our actions haven’t even occurred to us in much detail, if at all” (as unpacked in the introduction section). ETA: I think it’s important to conceptually separate this from complex cluelessness, because you might think unawareness is a challenge that demands a response beyond straightforward Bayesianism, even if you disagree that it implies complex cluelessness.
 
- (Note: A mod moved the subsequent posts to drafts for this reason. I’ll repost them spaced out.) 
- “Messy” tasks vs. hard-to-verify tasks
  - (Followup to here.)
  - I’ve found LLMs pretty useful for giving feedback on writing, including writing a fairly complex philosophical piece. Recently I wondered, “Hm, is this evidence that LLMs’ capabilities can generalize well to hard-to-verify tasks? That would be an update for me (toward super-short timelines, for one).”
  - I haven’t thought deeply about this yet, but I think the answer is: no. We should disentangle the intuitive “messiness” of the task of giving writing advice, from how difficult success is to verify:
    - Yes, the function in my brain that maps “writing feedback I’m given” to “how impressed I am with the feedback” seems pretty messy. My evaluation function for writing feedback is more fuzzy than, say, a function that checks if some piece of code works. It’s genuinely cool that LLMs can learn the former, especially without fine-tuning on me in particular.
    - But it’s still easy to verify whether someone has a positive vs. negative reaction[1] to some writing feedback. The model doesn’t need to actually be good at (1) making the thing I’m writing more effective at its intended purposes — i.e., helping readers have more informed and sensible views on the topic — in order to be good at (2) making me think “yeah that suggestion seems reasonable.”
    - Presumably there’s some correlation between the two. Humans rely on this correlation when giving writing advice all the time. But the correlation could still be rather weak (and its sign rather context-dependent), and LLMs’ advice would look just as impressive to me. (A toy simulation of this point follows the footnote below.)
 
  - Maybe this is obvious? My sense, though, is that these two things could easily get conflated.
  - ^ “Reaction” is meant to capture all the stuff that makes someone consider some feedback “useful” immediately when (or not too long after) they read it. I’m not saying LLMs are trying to make me feel good about my writing in this context, though that’s probably true to some extent.
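As a toy simulation of how weak that correlation could be while the advice looks just as impressive to me (entirely made-up numbers, just to illustrate the selection effect):

```python
# Toy simulation: my "reaction" to feedback is correlated with its true usefulness
# only with strength rho. I pick whichever of two feedback items looks better to me;
# how much true usefulness do I actually get, on average?
import random

random.seed(0)

def avg_true_quality_of_selected(rho, n=50_000):
    total = 0.0
    for _ in range(n):
        true_a, true_b = random.gauss(0, 1), random.gauss(0, 1)
        # Observed reaction = signal correlated with the truth + independent noise.
        look_a = rho * true_a + (1 - rho**2) ** 0.5 * random.gauss(0, 1)
        look_b = rho * true_b + (1 - rho**2) ** 0.5 * random.gauss(0, 1)
        total += true_a if look_a > look_b else true_b
    return total / n

for rho in (0.9, 0.3, 0.0):
    print(f"rho = {rho}: avg true usefulness of the feedback I prefer = {avg_true_quality_of_selected(rho):.2f}")
# The selected feedback looks equally impressive in every case (the "look" values have
# the same distribution regardless of rho), but the real benefit shrinks toward zero
# as the correlation weakens.
```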
 
- what would your preferred state of the timelines discourse be?
  - My main recommendation would be, “Don’t pin down probability distributions that are (significantly) more precise than seems justified.” I can’t give an exact set of guidelines for what constitutes “more precise than seems justified” (such is life as a bounded agent!). But to a first approximation:
    - Suppose I’m doing some modeling, and I find myself thinking, “Hm, what feels like the right median for this? 40? But ehh maybe 50, idk…”
    - And suppose I can’t point to any particular reason for favoring 40 over 50, or vice versa. (Or, I can point to some reasons for one number but also some reasons for the other, and it’s not clear which are stronger — when I try weighing up these reasons against each other, I find some reasons for one higher-order weighing and some reasons for another, etc. etc.)
    - This isn’t a problem for every pair of numbers that occurs to us when estimating stuff. If I have to pick between, say, 2030 and 2060 for my AGI timelines median, it seems like I have reason to trust my (imprecise!) intuition[1] that AI progress is going fast enough that 2060 is unreasonable.
 
    - Then: I wouldn’t pick just one of 40 or 50 for the median, or just one number in between. I’d include them all.
- I totally agree that we can’t pin down the parameters to high precision
  - I’m not sure I understand your position, then. Do you endorse imprecise probabilities in principle, but report precise distributions for some illustrative purpose? (If so, I’d worry that’s misleading.) My guess is that we’re not yet on the same page about what “pin down the parameters to high precision” means.
- I think this sort of work is valuable because it introduces new, comprehensive-ish frameworks for thinking about timelines/takeoff
  - Agreed! I appreciate your detailed transparency in communicating the structure of the model, even if I disagree about the formal epistemology.
- communicating the reasoning behind our beliefs in a more transparent way than a non-quantitative approach would
  - If our beliefs about this domain ought to be significantly imprecise, not just uncertain, then I’d think the more transparent way to communicate your reasoning would be to report an imprecise (yet still quantitative) forecast. (A toy sketch of what that could look like is below.)
- ^ I don’t want to overstate this, tbc. I think this intuition is only trustworthy to the extent that I think it’s a compression of (i) lots of cached understanding I’ve gathered from engaging with timelines research, and (ii) conservative-seeming projections of AI progress that pass enough of a sniff test. If I came into this domain with no prior background, just having a vibe of “2060 is way too far off” wouldn’t be a sufficient justification, I think.
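Here is a toy sketch of reporting an imprecise-but-quantitative forecast (the model and all numbers are made up for illustration; I’m reading “40”/“50” as 2040/2050):

```python
# Toy sketch: instead of committing to a single median, carry a *set* of candidate
# medians through the model and report the envelope of outputs. The "model" here is
# just a lognormal-ish CDF over years-from-now, standing in for whatever quantity a
# real timelines model computes.
from math import erf, log, sqrt

def prob_by_year(year, median_year, sigma=0.5, now=2025):
    """P(event by `year`) under a lognormal over (year - now) with the given median."""
    if year <= now:
        return 0.0
    z = (log(year - now) - log(median_year - now)) / (sigma * sqrt(2))
    return 0.5 * (1 + erf(z))

# The imprecise input: every median from 2040 through 2050 is treated as no less
# reasonable than the others, so none of them is singled out.
candidate_medians = range(2040, 2051)

probs_2035 = [prob_by_year(2035, m) for m in candidate_medians]
print(f"P(event by 2035) is in [{min(probs_2035):.2f}, {max(probs_2035):.2f}]")
# The thing reported is the interval, not one precise number.
```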
 
- I think once an AI is extremely good at AI R&D, lots of these skills will transfer to other domains, so it won’t have to be that much more capable to generalize to all domains, especially if trained in environments designed for teaching general skills.
  - This step, especially, really struck me as under-argued relative to how important it seems to be for the conclusion. This isn’t to pick on the authors of AI 2027 in particular. I’m generally confused as to why arguments for an (imminent) intelligence explosion don’t say more on this point, as far as I’ve read. (I’m reminded of this comic.) But I might well have missed something!
- arguing against having probabilistic beliefs about events which are unprecedented
  - Sorry, I’m definitely not saying this. First, in the linked post (see here), I argue that our beliefs should still be probabilistic, just imprecisely so. Second, I’m not drawing a sharp line between “precedented” and “unprecedented.” My point is: Intuitions are only as reliable as the mechanisms that generate them. And given the sparsity of feedback loops[1] and unusual complexity here, I don’t see why the mechanisms generating AGI/ASI forecasting intuitions would be truth-tracking to a high degree of precision. (Cf. Violet Hour’s discussion in Sec. 3 here.)
- the level of precedentedness is continuous
  - Right, and that’s consistent with my view. I’m saying, roughly, the degree of imprecision (/width of the interval-valued credence) should increase continuously with the depth of unprecedentedness, among other things. (A toy illustration of what this could look like is below.)
- forecasters have successfully done OK at predicting increasingly unprecedented events
  - As I note here, our direct evidence only tells us (at best) that people can successfully forecast up to some degree of precision, in some domains. How we ought to extrapolate from this to the case of AGI/ASI forecasting is very underdetermined.
- ^ On the actual information of interest (i.e. information about AGI/ASI), that is, not just proxies like forecasting progress in weaker or narrower AI.
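A toy illustration of the “imprecision increases continuously with unprecedentedness” claim (entirely my own construction; the scaling function and numbers are arbitrary):

```python
# Toy illustration: the width of an interval-valued credence grows continuously with
# how unprecedented the forecasting target is, rather than jumping from a precise
# probability to full Knightian uncertainty.

def credence_interval(point_estimate, unprecedentedness, max_width=1.0):
    """unprecedentedness in [0, 1]: 0 = well-precedented, 1 = maximally novel."""
    width = max_width * unprecedentedness  # continuous in unprecedentedness, no sharp cutoff
    lo = max(0.0, point_estimate - width / 2)
    hi = min(1.0, point_estimate + width / 2)
    return lo, hi

for u in (0.0, 0.3, 0.7, 1.0):
    lo, hi = credence_interval(0.4, u)
    print(f"unprecedentedness {u}: credence in [{lo:.2f}, {hi:.2f}]")
# The interval widens smoothly from [0.40, 0.40] to [0.00, 0.90] as u goes from 0 to 1.
```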
 
- A salient example to me: This post essentially consists of Paul briefly remarking on some mildly interesting distinctions about different kinds of x-risks, and listing his precise credences without any justification for them. It’s well-written for what it aims to be (a quick take on personal views), but I don’t understand why this post was so strongly celebrated.