The running 11-year average of global temperature has not flattened since 1990; it has continued upward at almost the same pace, with only a moderate decrease in slope since the outlier year 1998. The mean global temperature over the 11 years 2000-2010 is significantly higher than over the 10 years 1990-2000.
That is not “flat since the 90s”. The only way to get “flat since the 90s” is to compare 1998 with various more recent years, noting that it was nearly as hot as 2005 and 2010 and slightly hotter than other years in the 2000s, as if one year mattered as much as ten in a noisy data set.
If he had said “flat since 1998”, that might be technically true in a way, but it would be a little like saying the stock market has been flat since 2007.
That doesn’t even consider using climate knowledge to account for some of the variance: for instance, El Niño years are hotter, and 1998 was the biggest El Niño year on record.
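To see how a single spike year can make a steadily rising series look “flat since 1998” while multi-year means keep climbing, here is a minimal Python sketch on made-up numbers. The 0.02/year trend, the +0.25 spike in 1998, and the helper names `toy_anomalies` and `mean_over` are all hypothetical, chosen only for illustration; none of this is real temperature data.

```python
# Toy numbers only: NOT real temperature data.
# Assumed model: a steady trend of 0.02 degrees/year plus one +0.25 spike in 1998.

def toy_anomalies(start=1980, end=2010, slope=0.02, spike_year=1998, spike=0.25):
    """Return {year: anomaly} for a linear trend with a single outlier year."""
    series = {}
    for year in range(start, end + 1):
        series[year] = slope * (year - start)
        if year == spike_year:
            series[year] += spike
    return series


def mean_over(series, first, last):
    """Mean anomaly over the inclusive year range first..last."""
    values = [series[y] for y in range(first, last + 1)]
    return sum(values) / len(values)


temps = toy_anomalies()

# Cherry-picked view: which later years were cooler than the 1998 spike?
cooler_than_1998 = [y for y in range(1999, 2011) if temps[y] < temps[1998]]

# Decade-scale view: compare multi-year means instead of single years.
recent = mean_over(temps, 2000, 2010)
earlier = mean_over(temps, 1990, 2000)

print(cooler_than_1998)                     # every year 1999-2010
print(round(recent, 3), round(earlier, 3))  # 0.5 0.323
```

In this toy series the underlying trend never changes, yet no year from 1999 through 2010 beats 1998, while the 2000-2010 mean is clearly above the 1990-2000 mean.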
I wouldn’t necessarily read too much into your calibration question, given that it’s just one question, and there was something of a gotcha.
One thing I learned from doing calibration exercises is that I tended to be much too tentative with my 50% guesses.
When I answered the calibration question, I used my knowledge of other math that either must have come before him or couldn’t have, to narrow the possible window of his birth down to about 200 years. Random chance would then give me about a 20% shot. I thought I had somewhat better information than random chance within that window, so I put my confidence (IIRC) at 30%. Alas, I was wrong, but I’m pretty confident that I would get around 30% of problems with a similar profile correct. If this problem was tricky, then it is more likely than average to be one that people get wrong in a large set, but that will be balanced by problems that are straightforward.
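The base-rate arithmetic here, and what getting “around 30% of problems with a similar profile correct” would look like in practice, can be sketched in Python. This assumes an answer counted as correct within ±20 years of the true date (an inference from the 20% figure, not something stated in the question as quoted here). `base_rate` and `calibration_table` are hypothetical helper names, and the answer record is invented.

```python
from collections import defaultdict


def base_rate(window_years, tolerance_years):
    """Chance that a uniform guess over the window lands within +/- tolerance
    of the true year (ignoring edge effects at the ends of the window)."""
    return min(1.0, 2 * tolerance_years / window_years)


def calibration_table(predictions):
    """Map each stated confidence to (observed hit rate, number of questions).

    `predictions` is a list of (stated_confidence, was_correct) pairs.
    """
    buckets = defaultdict(list)
    for confidence, correct in predictions:
        buckets[confidence].append(correct)
    return {c: (sum(hits) / len(hits), len(hits)) for c, hits in sorted(buckets.items())}


print(base_rate(200, 20))  # 0.2

# Hypothetical record: ten questions answered at 30% confidence, three correct.
record = [(0.3, True)] * 3 + [(0.3, False)] * 7
print(calibration_table(record))  # {0.3: (0.3, 10)}
```

Being well calibrated means the observed hit rate in each confidence bucket matches the stated confidence; a single wrong answer at 30% is exactly what that bucket predicts 70% of the time.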
Not to suggest that this result isn’t evidence of LW’s miscalibration. In fact, it’s strong enough evidence to make me seriously doubt the last survey’s finding that we were better calibrated than a normal population. OTOH, neither bit of evidence is terribly strong. A set of 5-10 different problems would make for much stronger evidence one way or the other.
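One way to see why 5-10 problems would help: if a problem’s quirks (a gotcha, say) shift everyone’s hit rate together, then the calibration gap measured on one problem carries all of that problem-level noise, while the average over k independent problems shrinks it by a factor of sqrt(k). A minimal Python sketch, assuming independent problem difficulties; the 0.15 difficulty spread and the name `gap_standard_error` are hypothetical.

```python
import math


def gap_standard_error(difficulty_sd, n_problems):
    """Standard error of the average calibration gap across independent problems.

    Assumes each problem shifts the group's hit rate by an independent random
    amount with standard deviation `difficulty_sd` (a hypothetical parameter),
    and ignores the much smaller respondent-level sampling noise.
    """
    return difficulty_sd / math.sqrt(n_problems)


# Hypothetical: problem-to-problem difficulty swings of 15 percentage points.
for n in (1, 5, 10):
    print(n, round(gap_standard_error(0.15, n), 3))
# 1 0.15
# 5 0.067
# 10 0.047
```

Under these assumptions, ten problems cut the noise in the measured gap to roughly a third of what a single problem gives.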