I think that your paraphrasing
I don’t think MIRI’s efforts are valuable because I think that AI in general has made no progress on AGI for the last 60 years, but aside from that MIRI isn’t doing anything wrong in particular, and it would be an admittedly different story if I thought that AI in general was making progress on AGI.
is pretty close to my position.
I would qualify it by saying:
I’d replace “no progress” with “not enough progress for there to be a known research program with a reasonable chance of success.”
I have high confidence that some of the recent advances in narrow AI will contribute (whether directly or indirectly) to the eventual creation of AGI (contingent on this event occurring), just not necessarily in a foreseeable way.
If I discover that there’s been significantly more progress on AGI than I had thought, then I’ll have to reevaluate my position entirely. I could imagine updating in the direction of MIRI’s FAI work being very high value, or I could imagine continuing to believe that MIRI’s FAI research isn’t a priority, for reasons different from my current ones.
A few nitpicks on choice of “Brier-boosting” as a description of CFAR’s approach:
Predictive power is maximized when Brier score is minimized
Brier score is the sum of squared differences between the probabilities assigned to events and indicator variables that are 1 or 0 according to whether the event did or did not occur. Good calibration therefore corresponds to minimizing Brier score rather than maximizing it, whereas “Brier-boosting” suggests maximization.
What’s referred to as “quadratic score” is essentially the same as the negative of Brier score, and so maximizing quadratic score corresponds to maximizing predictive power.
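To see concretely that calibration minimizes Brier score, here is a short sketch (the function name is mine, chosen for illustration): for a single binary event with true probability p, the expected Brier contribution p(1 − q)² + (1 − p)q² of a forecast q is minimized exactly at q = p.

```python
# Sketch: expected Brier-score contribution for one binary event with
# true probability p, as a function of the forecast q.
# X ~ Bernoulli(p), so E[(q - X)^2] = p*(1 - q)^2 + (1 - p)*q^2.

def expected_brier(p, q):
    """Expected squared error of forecast q when the event has probability p."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

p = 0.7
forecasts = [i / 100 for i in range(101)]
best = min(forecasts, key=lambda q: expected_brier(p, q))
print(best)  # the minimizing forecast is the true probability, q = p = 0.7
```

Searching over a grid of forecasts, the minimizer coincides with the true probability, which is the sense in which Brier score rewards calibration.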
Brier score fails to capture our intuitions about assignment of small probabilities
A more substantive point is that even though the Brier score is minimized by being well-calibrated, the way in which it varies with the probability assigned to an event does not correspond to our intuitions about how good a probabilistic prediction is. For example, suppose four observers A, B, C and D assigned probabilities 0.5, 0.4, 0.01 and 0.000001 (respectively) to an event E occurring, and the event turns out to occur. Intuitively, B’s prediction is only slightly worse than A’s prediction, whereas D’s prediction is much worse than C’s prediction. But the difference between the increases in B’s and A’s Brier scores is 0.36 − 0.25 = 0.11, which is much larger than the corresponding difference for D and C, which is approximately 0.02.
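The arithmetic above can be checked with a few lines of Python (a throwaway sketch; the variable names are mine):

```python
# Brier-score increments for the four observers above, given that the
# event occurs (outcome indicator = 1): each increment is (1 - p)^2.
probs = {"A": 0.5, "B": 0.4, "C": 0.01, "D": 0.000001}
increments = {name: (1 - p) ** 2 for name, p in probs.items()}

print(increments["B"] - increments["A"])  # 0.36 - 0.25 = 0.11
print(increments["D"] - increments["C"])  # approximately 0.02
```

So the penalty gap between the badly wrong D and the merely wrong C is far smaller than the gap between the nearly indistinguishable A and B.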
Brier score is not constant across mathematically equivalent formulations of the same prediction
Suppose that a basketball player is to take three free throws. Observer A predicts that the player makes each one with probability p. Observer B accepts observer A’s estimate and notes that it implies that the probability of the player making all three free throws is p^3, and so makes that prediction.
Then if the player makes all three free throws, observer A’s Brier score increases by
3*(1 - p)^2
while observer B’s Brier score increases by
(1 - p^3)^2
But these two expressions are not equal in general, e.g. for p = 0.9 the first is 0.03 and the second is 0.073441. So changes in Brier score depend on the formulation of a prediction rather than on the prediction itself.
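The two increments for p = 0.9 can be verified directly (an illustrative sketch, using the formulas above):

```python
# Brier-score increments when the player makes all three free throws.
p = 0.9

# Observer A made three separate predictions, each assigning p to "make":
a_increment = 3 * (1 - p) ** 2

# Observer B made one prediction assigning p**3 to "makes all three":
b_increment = (1 - p ** 3) ** 2

print(a_increment)  # approximately 0.03
print(b_increment)  # approximately 0.073441
```

The same underlying belief, expressed as three predictions versus one, is scored very differently.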
======
The logarithmic scoring rule handles small probabilities well, and is invariant under changes in the representation of a prediction, and so is preferable. I first learned of this from Eliezer’s essay A Technical Explanation of a Technical Explanation.
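The invariance claim can be illustrated on the free-throw example (a sketch, taking log score to mean negative log probability of the observed outcome, so that lower is better):

```python
import math

# Log-score increments when the player makes all three free throws.
p = 0.9

# Observer A: three separate predictions, each assigning p to "make".
a_log_score = 3 * -math.log(p)

# Observer B: one prediction assigning p**3 to "makes all three".
b_log_score = -math.log(p ** 3)

print(a_log_score, b_log_score)  # equal, up to floating-point error
```

Because log(p^3) = 3 log(p), the two formulations of the same prediction receive the same score, unlike with the Brier score.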
Minimizing logarithmic score is equivalent to maximizing the likelihood function for logistic regression / binary classification. Unfortunately, the phrase “likelihood boosting” has one more syllable than “Brier boosting” and doesn’t have the same alliterative ring to it, so I don’t have an actionable alternative suggestion :P.