Researcher at the Center on Long-Term Risk. All opinions my own.
Anthony DiGiovanni
what would your preferred state of the timelines discourse be?
My main recommendation would be, “Don’t pin down probability distributions that are (significantly) more precise than seems justified.” I can’t give an exact set of guidelines for what constitutes “more precise than seems justified” (such is life as a bounded agent!). But to a first approximation:
Suppose I’m doing some modeling, and I find myself thinking, “Hm, what feels like the right median for this? 40? But ehh maybe 50, idk…”
And suppose I can’t point to any particular reason for favoring 40 over 50, or vice versa. (Or, I can point to some reasons for one number but also some reasons for the other, and it’s not clear which are stronger — when I try weighing up these reasons against each other, I find some reasons for one higher-order weighing and some reasons for another, etc. etc.)
This isn’t a problem for every pair of numbers that occurs to us when estimating stuff. If I have to pick between, say, 2030 or 2060 for my AGI timelines median, it seems like I have reason to trust my (imprecise!) intuition[1] that AI progress is going fast enough that 2060 is unreasonable.
Then: I wouldn’t pick just one of 40 or 50 for the median, or just one number in between. I’d include them all.
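To make “include them all” concrete, here is a minimal toy sketch (my own illustration, not part of the original comment) that represents the imprecise forecast as a set of distributions (a “representor”), one for each candidate median between 40 and 50, and reports ranges rather than single numbers. The lognormal family and the sigma value are arbitrary assumptions chosen purely for illustration:

```python
from scipy import stats

# Toy representor: one candidate distribution per median we can't rule out
# (every integer from 40 to 50). The lognormal family and sigma = 0.5 are
# arbitrary choices for illustration only.
sigma = 0.5
representor = [stats.lognorm(s=sigma, scale=m) for m in range(40, 51)]

# A precise forecaster would pick one member and report, say, P(X <= 45)
# as a single number. The imprecise forecast instead reports the range of
# that probability across the whole representor.
probs = [d.cdf(45) for d in representor]
print(f"P(X <= 45) ranges over [{min(probs):.2f}, {max(probs):.2f}]")

# Note: each single member already has a wide 90% interval (ordinary
# uncertainty), but the imprecision is a separate thing: the median itself
# is not pinned down to one number, it ranges over the whole set [40, 50].
lo, hi = representor[0].ppf([0.05, 0.95])
print(f"90% interval under one member: [{lo:.0f}, {hi:.0f}]")
print("medians across the representor:", [round(d.median()) for d in representor])
```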
I totally agree that we can’t pin down the parameters to high precision
I’m not sure I understand your position, then. Do you endorse imprecise probabilities in principle, but report precise distributions for some illustrative purpose? (If so, I’d worry that’s misleading.) My guess is that we’re not yet on the same page about what “pin down the parameters to high precision” means.
I think this sort of work is valuable because it introduces new, comprehensive-ish frameworks for thinking about timelines/takeoff
Agreed! I appreciate your detailed transparency in communicating the structure of the model, even if I disagree about the formal epistemology.
communicating the reasoning behind our beliefs in a more transparent way than a non-quantitative approach would
If our beliefs about this domain ought to be significantly imprecise, not just uncertain, then I’d think the more transparent way to communicate your reasoning would be to report an imprecise (yet still quantitative) forecast.
- ^
I don’t want to overstate this, tbc. I think this intuition is only trustworthy to the extent that I think it’s a compression of (i) lots of cached understanding I’ve gathered from engaging with timelines research, and (ii) conservative-seeming projections of AI progress that pass enough of a sniff test. If I came into this domain with no prior background, just having a vibe of “2060 is way too far off” wouldn’t be a sufficient justification, I think.
I think once an AI is extremely good at AI R&D, lots of these skills will transfer to other domains, so it won’t have to be that much more capable to generalize to all domains, especially if trained in environments designed for teaching general skills.
This step, especially, really struck me as under-argued relative to how important it seems to be for the conclusion. This isn’t to pick on the authors of AI 2027 in particular. I’m generally confused as to why arguments for an (imminent) intelligence explosion don’t say more on this point, as far as I’ve read. (I’m reminded of this comic.) But I might well have missed something!
arguing against having probabilistic beliefs about events which are unprecedented
Sorry, I’m definitely not saying this. First, in the linked post (see here), I argue that our beliefs should still be probabilistic, just imprecisely so. Second, I’m not drawing a sharp line between “precedented” and “unprecedented.” My point is: Intuitions are only as reliable as the mechanisms that generate them. And given the sparsity of feedback loops[1] and unusual complexity here, I don’t see why the mechanisms generating AGI/ASI forecasting intuitions would be truth-tracking to a high degree of precision. (Cf. Violet Hour’s discussion in Sec. 3 here.)
the level of precedentedness is continuous
Right, and that’s consistent with my view. I’m saying, roughly, the degree of imprecision (/width of the interval-valued credence) should increase continuously with the depth of unprecedentedness, among other things.
forecasters have successfully done OK at predicting increasingly unprecedented events
As I note here, our direct evidence only tells us (at best) that people can successfully forecast up to some degree of precision, in some domains. How we ought to extrapolate from this to the case of AGI/ASI forecasting is very underdetermined.
- ^
On the actual information of interest (i.e. information about AGI/ASI), that is, not just proxies like forecasting progress in weaker or narrower AI.
Despite (2), it’s important to try anyways
FWIW this is the step I disagree with, if I understand what you mean by “try”. See this post.
Forecasts which include intuitive estimations are commonplace and often useful (see e.g. intelligence analysis, Superforecasting, prediction markets, etc.).
In this context, we’re trying to forecast radically unprecedented events, occurring on long subjective time horizons, where we have little reason to expect these intuitive estimates to be honed by empirical feedback. Peer disagreement is also unusually persistent in this domain. So it’s not at all obvious to me that, based on superforecasting track records, we can trust that our intuitions pin down these parameters to a sufficient degree of precision.[1] More on this here (this is not a comprehensive argument for my view, tbc; hoping to post something spelling this out more soon-ish!).
- ^
As the linked post explains, “high precision” here does not mean “the credible interval for the parameter is narrow”. It means that your central/point estimate of the parameter is pinned down to a narrow range, even if you have lots of uncertainty.
Assumptions that highly constrain the model should be first and foremost rather than absent from publicly facing write-ups and only in appendices.
Strongly agree — cf. nostalgebraist’s posts making this point on the bio anchors and AI 2027 models. I have the sense this is a pretty fundamental epistemic crux between camps of people making predictions (or suspending judgment!) about AI takeoff.
It sounds like you’re viewing the goal of thinking about DT as: “Figure out your object-level intuitions about what to do in specific abstract problem structures. Then, when you encounter concrete problems, you can ask which abstract problem structure the concrete problems correspond to and then act accordingly.”
I think that approach has its place. But there’s at least another very important (IMO more important) goal of DT: “Figure out your meta-level intuitions about why you should do one thing vs. another, across different abstract problem structures.” (Basically figuring out our “non-pragmatic principles” as discussed here.) I don’t see how just asking Claude helps with that, if we don’t have evidence that Claude’s meta-level intuitions match ours. Our object-level verdicts would just get reinforced without probing their justification. Garbage in, garbage out.
often, we don’t think of commitment as literally closing off choice—e.g., it’s still a “choice” to keep a promise
For what it’s worth, if a “commitment” is of this form, I struggle to see what the motivation for paying in Parfit’s hitchhiker would be. The “you want to be the sort of person who pays” argument doesn’t do anything for me, because that’s answering a different question than “should you choose to pay [insofar as you have a ‘choice’]?” I worry there’s a motte-and-bailey between different notions of “commitment” going on. I’d be curious for your reactions to my thoughts on this here.
Isn’t the “you get what you measure” problem a problem for capabilities progress too, not just alignment? I.e.: Some tasks are sufficiently complex (hence hard to evaluate) and lacking in unambiguous ground-truth feedback that, when you turn the ML crank on them, you’re not necessarily going to select for actually doing the task well. You’ll select for “appearing to do the task well,” and it’s an open question how well this correlates with actually doing the task well. (“Doing the task” here can include something much higher-level, like “being ‘generally intelligent’.”)
Which isn’t to say this problem wouldn’t bite especially hard for alignment. Alignment seems harder to verify than lots of things. But this is one reason I’m not fully sold that once you get human-level AI, capabilities progress will get faster. (I’m hardly an expert on this, so I might well have missed existing discourse on & answers to this question.)
No, at some point you “jump all the way” to AGI
I’m confused as to what the actual argument for this is. It seems like you’ve just kinda asserted it. (I realize in some contexts all you can do is offer an “incredulous stare,” but this doesn’t seem like the kind of context where that suffices.)
I’m not sure if the argument is supposed to be the stuff you say in the next paragraph (if so, the “Also” is confusing).
I worry there’s kind of a definitional drift going on here. I guess Holden doesn’t give a super clean definition in the post, but AFAICT these quotes get at the heart of the distinction:
Sequence thinking involves making a decision based on a single model of the world …
Cluster thinking – generally the more common kind of thinking – involves approaching a decision from multiple perspectives (which might also be called “mental models”), observing which decision would be implied by each perspective, and weighing the perspectives in order to arrive at a final decision. … [T]he different perspectives are combined by weighing their conclusions against each other, rather than by constructing a single unified model that tries to account for all available information.
“Making a decision based on a single model of the world” vs. “combining different perspectives by weighing their conclusions against each other” seems orthogonal to the failure mode you mention. (Which is a failure to account for a mechanism that the “cluster thinker” here explicitly foresees.) I’m not sure if you’re claiming that empirically, people who follow sequence thinking have a track record of this failure mode? If so, I guess I’m just suspicious of that claim and would expect it’s grounded mostly in vibes.
here’s a story where we totally fail on that first thing and the second thing turns out to matter a ton!
I’m confused as to why this is inconsistent with sequence thinking. This sounds like identifying a mechanistic story for why the policy/technical win would have good consequences, and accounting for that mechanism in your model of the overall value of working on the policy/technical win. Which a sequence thinker can do just fine.
working more directly with metrics such as “what are the most expected-value rewarding actions that a bounded agent can make given the evidence so far”
I’m not sure I exactly understand your argument, but it seems like this doesn’t avoid the problem of priors, because what’s the distribution w.r.t. which you define “expected-value rewarding”?
(General caveat that I’m not sure if I’m missing your point.)
Sure, there’s still a “problem” in the sense that we don’t have a clean epistemic theory of everything. The weights we put on the importance of different principles, and how well different credences fulfill them, will be fuzzy. But we’ve had this problem all along.
There are options other than (1) purely determinate credences or (2) implausibly wide indeterminate credences. To me, there are very compelling intuitions behind the view that the balance among my epistemic principles is best struck by (3) indeterminate credences that are narrow in proportion to the weight of evidence and how far principles like Occam seem to go. This isn’t objective (neither are any other principles of rationality less trivial than avoiding synchronic sure losses). Maybe your intuitions differ, upon careful reflection. That doesn’t mean it’s a free-for-all. Even if it is, this isn’t a positive argument for determinacy.
both do rely on my intuitions
My intuitions about foundational epistemic principles are just about what I philosophically endorse — in that domain, I don’t know what else we could possibly go on other than intuition. Whereas, my intuitions about empirical claims about the far future only seem worth endorsing as far as I have reasons to think they’re tracking empirical reality.
it seems pretty arbitrary to me where you draw the boundary between a credence that you include in your representor vs. not. (Like: What degree of justification is enough? We’ll always have the problem of induction to provide some degree of arbitrariness.)
To spell out how I’m thinking of credence-setting: Given some information, we apply different (vague) non-pragmatic principles we endorse — fit with evidence, Occam’s razor, deference, etc.
Epistemic arbitrariness means making choices in your credence-setting that add something beyond these principles. (Contrast this with mere “formalization arbitrariness”, the sort discussed in the part of the post about vagueness.)
I don’t think the problem of induction forces us to be epistemically arbitrary. Occam’s razor (perhaps an imprecise version!) favors priors that penalize a hypothesis like “the mechanisms that made the sun rise every day in the past suddenly change tomorrow”. This seems to give us grounds for having prior credences narrower than (0, 1), even if there’s some unavoidable formalization arbitrariness. (We can endorse the principle underlying Occam’s razor, “give more weight to hypotheses that posit fewer entities”, without a circular justification like “Occam’s razor worked well in the past”. Admittedly, I don’t feel super satisfied with / unconfused about Occam’s razor, but it’s not just an ad hoc thing.)
By contrast, pinning down a single determinate credence (in the cases discussed in this post) seems to require favoring epistemic weights for no reason. Or at best, a very weak reason that IMO is clearly outweighed by a principle of suspending judgment. So this seems more arbitrary to me than indeterminate credences, since it injects epistemic arbitrariness on top of formalization arbitrariness.
(I’ll reply to the point about arbitrariness in another comment.)
I think it’s generally helpful for conceptual clarity to analyze epistemics separately from ethics and decision theory. E.g., it’s not just EV maximization w.r.t. non-robust credences that I take issue with, it’s any decision rule built on top of non-robust credences. And I worry that without more careful justification, “[consequentialist] EV-maximizing within a more narrow ‘domain’, ignoring the effects outside of that ‘domain’” is pretty unmotivated / just kinda looking under the streetlight. And how do you pick the domain?
(Depends on the details, though. If it turns out that EV-maximizing w.r.t. impartial consequentialism is always sensitive to non-robust credences (in your framing), I’m sympathetic to “EV-maximizing w.r.t. those you personally care about, subject to various deontological side constraints etc.” as a response. Because “those you personally care about” isn’t an arbitrary domain, it’s, well, those you personally care about. The moral motivation for focusing on that domain is qualitatively different from the motivation for impartial consequentialism.)
So I’m hesitant to endorse your formulation. But maybe for most practical purposes this isn’t a big deal, I’m not sure yet.
That’s right.
(Not sure you’re claiming otherwise, but FWIW, I think this is fine — it’s true that there’s some computational cost to this step, but in this context we’re talking about the normative standard rather than what’s most pragmatic for bounded agents. And once we start talking about pragmatic challenges for bounded agents, I’d be pretty dubious that, e.g., “pick a very coarse-grained ‘best guess’ prior and very coarse-grained way of approximating Bayesian updating, and try to optimize given that” would be best according to the kinds of normative standards that favor indeterminate beliefs.)
does that require you to either have the ability to commit to a plan or the inclination to consistently pick your plan from some prior epistemic perspective
You aren’t required to take an action (/start acting on a plan) that is worse from your current perspective than some alternative. Let maximality-dominated mean “w.r.t. each distribution in my representor, worse in expectation than some alternative.” (As opposed to “dominated” in the sense of “worse than an alternative with certainty”.) Then, in general you would need[1] to ask, “Among the actions/plans that are not maximality-dominated from my current perspective, which of these are dominated from my prior perspective?” And rule those out.
- ^
If you care about diachronic norms of rationality, that is.
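To make the filtering step concrete, here is a rough sketch (my own toy example with made-up numbers, not from the original comment) of the maximality rule described above: an action is ruled out iff some alternative has higher expected value under every distribution in the representor.

```python
# Toy expected values of each action under each distribution in the
# representor (rows = actions, columns = distributions; numbers made up).
expected_values = {
    "A": [10.0, 2.0, 5.0],
    "B": [9.0, 8.0, 6.0],
    "C": [4.0, 1.0, 3.0],  # beaten by B under every distribution
}

def maximality_dominated(action, evs):
    """True iff some alternative is better in expectation under *every*
    distribution in the representor (not merely better with certainty)."""
    return any(
        all(evs[other][i] > evs[action][i] for i in range(len(evs[action])))
        for other in evs
        if other != action
    )

permissible = [a for a in expected_values
               if not maximality_dominated(a, expected_values)]
print(permissible)  # ['A', 'B']: C is maximality-dominated, A and B both survive
```

In the diachronic case described above, one would then run the same filter a second time over the surviving plans, using the expected values computed with the prior perspective’s representor, and rule out whatever is dominated there.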
mostly problems with logical omniscience not being satisfied
I’m not sure, given the “Indeterminate priors” section. But assuming that’s true, what implication are you drawing from that? (The indeterminacy for us doesn’t go away just because we think logically omniscient agents wouldn’t have this indeterminacy.)
the arbitrariness of the prior is just a fact of life
The arbitrariness of a precise prior is a fact of life. This doesn’t imply we shouldn’t reduce this arbitrariness by having indeterminate priors.
The obvious answer is only when there is enough indeterminacy to matter; I’m not sure if anyone would disagree. Because the question isn’t whether there is indeterminacy, it’s how much, and whether it’s worth the costs of using a more complex model instead of doing it the Bayesian way.
Based on this I think you probably mean something different by “indeterminacy” than I do (and I’m not sure what you mean). Many people in this community explicitly disagree with the claim that our beliefs should be indeterminate at all, as exemplified by the objections I respond to in the post.
When you say “whether it’s worth the costs of using a more complex model instead of doing it the Bayesian way”, I don’t know what “costs” you mean, or what non-question-begging standard you’re using to judge whether “doing it the Bayesian way” would be better. As I write in the “Background” section: “And it’s question-begging to claim that certain beliefs “outperform” others, if we define performance as leading to behavior that maximizes expected utility under those beliefs. For example, it’s often claimed that we make “better decisions” with determinate beliefs. But on any way of making this claim precise (in context) that I’m aware of, “better decisions” presupposes determinate beliefs!”
You also didn’t quite endorse suspending judgement in that case—“If someone forced you to give a best guess one way or the other, you suppose you’d say ‘decrease’.”
The quoted sentence is consistent with endorsing suspending judgment, epistemically speaking. As the key takeaways list says, “If you’d prefer to go with a given estimate as your “best guess” when forced to give a determinate answer, that doesn’t imply this estimate should be your actual belief.”
But if it is decision relevant, and there is only a binary choice available, your best guess matters
I address this in the “Practical hallmarks” section — what part of my argument there do you disagree with?
“Messy” tasks vs. hard-to-verify tasks
(Followup to here.)
I’ve found LLMs pretty useful for giving feedback on writing, including writing a fairly complex philosophical piece. Recently I wondered, “Hm, is this evidence that LLMs’ capabilities can generalize well to hard-to-verify tasks? That would be an update for me (toward super-short timelines, for one).”
I haven’t thought deeply about this yet, but I think the answer is: no. We should disentangle the intuitive “messiness” of the task of giving writing advice, from how difficult success is to verify:
Yes, the function in my brain that maps “writing feedback I’m given” to “how impressed I am with the feedback” seems pretty messy. My evaluation function for writing feedback is more fuzzy than, say, a function that checks if some piece of code works. It’s genuinely cool that LLMs can learn the former, especially without fine-tuning on me in particular.
But it’s still easy to verify whether someone has a positive vs. negative reaction[1] to some writing feedback. The model doesn’t need to actually be good at (1) making the thing I’m writing more effective at its intended purposes — i.e., helping readers have more informed and sensible views on the topic — in order to be good at (2) making me think “yeah that suggestion seems reasonable.”
Presumably there’s some correlation between the two. Humans rely on this correlation when giving writing advice all the time. But the correlation could still be rather weak (and its sign rather context-dependent), and LLMs’ advice would look just as impressive to me.
Maybe this is obvious? My sense, though, is that these two things could easily get conflated.
- ^
“Reaction” is meant to capture all the stuff that makes someone consider some feedback “useful” immediately when (or not too long after) they read it. I’m not saying LLMs are trying to make me feel good about my writing in this context, though that’s probably true to some extent.
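As a toy illustration of the weak-correlation worry above (my own simulation, with made-up numbers; nothing to do with how LLMs are actually trained or evaluated): if feedback is selected for how reasonable it seems to the evaluator, and that proxy is only weakly correlated with how much the feedback actually improves the writing, the selected feedback can look very impressive while barely moving the true metric.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000   # candidate pieces of writing feedback
rho = 0.2     # assumed (weak) correlation between proxy and true quality

# True quality: (1) actually making the writing more effective.
# Proxy: (2) how reasonable the suggestion seems to the author.
true_quality = rng.standard_normal(n)
proxy = rho * true_quality + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Select the top 1% by the proxy, i.e. what optimizing for
# "seems reasonable to me" amounts to.
selected = proxy >= np.quantile(proxy, 0.99)

print(f"mean proxy score of selected feedback: {proxy[selected].mean():.2f}")
print(f"mean true quality of selected feedback: {true_quality[selected].mean():.2f}")
# With rho = 0.2, the selected feedback looks roughly 2.7 sd more impressive
# than average, but is only about 0.5 sd better on the thing that matters.
```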