Thanks Daniel! Quick replies:
On down-weighting low-end vs high-end compute levels: I handled the down-weighting of low-end compute levels in a separate, explicit step because I think there’s a structural difference between the two updates. An update against low-end compute levels makes more sense to do within each hypothesis, because only some orders of magnitude are affected. An update against high-end compute levels, by contrast, can be implemented by simply lowering the probability we assign to the high-compute hypotheses, since there is no specific reason to shave off just a few OOMs at the far right. My probability on the Evolution Anchor hypothesis is 10% and my probability on the Long Horizon Neural Network hypothesis is 15%; these are lower than my probabilities on the Short Horizon Neural Network hypothesis (20%) and the Medium Horizon Neural Network hypothesis (30%) because I think the higher-end hypotheses are less consistent with the holistic balance of evidence.
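To make the structural difference concrete, here’s a toy sketch in Python; all of the specific numbers (the supports, the cutoff, and the down-weighting factors) are made up for illustration rather than taken from the report:

```python
import numpy as np

# Hypothetical mixture weights over the hypotheses discussed above
# (other buckets omitted for brevity).
weights = {
    "short_horizon_nn": 0.20,
    "medium_horizon_nn": 0.30,
    "long_horizon_nn": 0.15,
    "evolution_anchor": 0.10,
}

# Each hypothesis carries its own distribution over log10(training FLOP);
# uniform over an illustrative support here, just to keep the sketch simple.
ooms = np.arange(24, 42)  # log10 FLOP bins
dists = {h: np.full(len(ooms), 1 / len(ooms)) for h in weights}

def update_against_low_end(p, ooms, cutoff=30, factor=0.1):
    """The low-end update happens *within* a hypothesis: only the bins
    below the cutoff are affected, so rescale those and renormalize."""
    p = p.copy()
    p[ooms < cutoff] *= factor
    return p / p.sum()

dists = {h: update_against_low_end(p, ooms) for h, p in dists.items()}

# The high-end update happens *across* hypotheses: there's no principled
# place to shave OOMs off the far right, so instead lower the mixture
# weights on the high-compute hypotheses and renormalize.
for h in ("long_horizon_nn", "evolution_anchor"):
    weights[h] *= 0.5
total = sum(weights.values())
weights = {h: w / total for h, w in weights.items()}
```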
On the GPT scaling trend: I think the way to express the view that GPT++ would constitute TAI is to heavily weight the Short Horizon Neural Network hypothesis, potentially along with shifting and/or narrowing the range of effective horizon lengths in that bucket so it’s more concentrated on the low end (e.g. 0.1 to 30 subjective seconds rather than 1 to 1000 subjective seconds).
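As a hedged illustration of that shift and narrowing (the choice to treat each range as the central ~80% mass of a lognormal is my assumption for the sketch, not something from the report):

```python
import numpy as np
from scipy.stats import norm

def lognormal_params(lo, hi, central_mass=0.80):
    """(mu, sigma) of a lognormal over effective horizon length, in
    subjective seconds, whose central `central_mass` lies in [lo, hi]."""
    z = norm.ppf(0.5 + central_mass / 2)  # ~1.28 for 80% central mass
    mu = (np.log(lo) + np.log(hi)) / 2    # midpoint in log space
    sigma = (np.log(hi) - np.log(lo)) / (2 * z)
    return mu, sigma

# Default short-horizon bucket: 1 to 1000 subjective seconds.
mu0, sigma0 = lognormal_params(1.0, 1000.0)

# "GPT++ is TAI" view: shift left and narrow to 0.1 to 30 subjective
# seconds, then put most of the mixture weight on this bucket.
mu1, sigma1 = lognormal_params(0.1, 30.0)
```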
On getting transformative abilities with 1e15-parameter models trained for 30 subjective years: I think this is pretty unlikely but, as you said, not crazy; the way to express this view would be to up-weight the Lifetime Anchor hypothesis, on which my weight is currently 5%. Additionally, all the Neural Network hypotheses already assign substantial probability to relatively small models (e.g. 1e12 FLOP/subj sec) and to shallower scaling than we’ve seen demonstrated so far (e.g. an exponent of 0.25).
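For a rough sense of scale (the ~10 FLOP per parameter per subjective second figure below is a placeholder assumption of mine, not a number from the report):

```python
# Back-of-the-envelope: 1e15-parameter model trained on 30 subjective
# years of experience.
params = 1e15
flop_per_param_per_subj_sec = 10    # assumed placeholder, not from the report
subj_sec = 30 * 365.25 * 24 * 3600  # ~9.5e8 subjective seconds
training_flop = params * flop_per_param_per_subj_sec * subj_sec
print(f"~{training_flop:.0e} FLOP")  # ~9e+24 FLOP
```

Under those assumptions the total lands within an order of magnitude of a human-lifetime compute estimate of ~1e24 FLOP (roughly 1e15 FLOP/s over ~1e9 seconds), which is why the Lifetime Anchor hypothesis is the natural home for this view.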
Thanks, I just cut the link!