Glad to hear that you aren’t recommending strategy research in general—because that’s what it looked like.
And yes, I think it’s incredibly hard to make sure we’re not putting effort into work with negative expected value, and I think that attention hazards are critical, and are the biggest place where strategy research has the potential to increase risks rather than ameliorate them. (Which is exactly why I’m confused that anyone would suggest that more such research should be done publicly and/or shared. And it’s why I don’t think that a more detailed object-level discussion makes sense here, in public.)
No, it just means you need an actual system model that is at least somewhat predictive in order to make decisions, and therefore a better grasp on the expected value of your investments than “let’s try something, who knows, let’s just take risks.”
the more money you have, the higher the variance on weird projects you should be funding.
Only if you’re sure the mean is positive—and there’s no reason to think that. In fact, it’s arguable that in a complex system, a priori, we should consider significant changes destabilizing and significantly net negative unless we have reason to think otherwise.
I’m very confused why you think that such research should be done publicly, and why you seem to think it’s not being done privately.
Also, regarding the following:
Strategy research would not be valuable if it was completely intractable. We believe some actors and attempts at strategy research can succeed, but it is hard to predict success beforehand.
Given the first sentence, I’m confused as to why you think that “strategy research” (writ large) is going to be valuable, given our fundamental lack of predictive ability in most of the domains where existential risk is a concern.
It seems strange to try to draw sharp boundaries around communities for the purposes of this argument, and given the obvious overlap and fuzzy boundaries, I don’t really understand what the claim that the “rationality community didn’t have an organisation like CEA” even means. This is doubly true given that, as far as I have seen, all of the central EA organizations are full of people who read or used to read LessWrong.
On point 2, which is the only one I can really comment on: yes, this seems like a useful paper, and I buy the argument that such an approach is critical for some purposes, including some of what we discussed on Goodhart’s Law - https://arxiv.org/abs/1803.04585 - where one class of misalignment can be explicitly addressed by your approach. Also see the recent paper here: https://arxiv.org/abs/1905.12186, which explicitly models causal dependencies (like in figure 2) to show a safety result.
Yes, and yes, I’m hoping to be there.
Note: I briefly tried a similar approach, albeit using polynomial functions with random coefficients rather than ANNs, and in R rather than Python, but couldn’t figure out how to say anything useful with it.
If this is of any interest, it is available here: https://gist.github.com/davidmanheim/5231e4a82d5ffc607e953cdfdd3e3939 (I also built simulations for bog-standard Goodhart)
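Since the gist is in R, here’s roughly what the bog-standard version looks like as a minimal Python sketch; the degree-4 polynomials, the 0.5 error scale, and the grid are arbitrary choices for illustration, not what the gist actually does:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_poly(degree=4):
    # Random polynomial on [-1, 1] with standard-normal coefficients.
    return np.polynomial.Polynomial(rng.normal(size=degree + 1))

x = np.linspace(-1, 1, 1001)
shortfalls = []
for _ in range(1000):
    goal = random_poly()                  # the thing we actually care about
    proxy = goal + 0.5 * random_poly()    # a correlated but imperfect metric
    x_proxy_opt = x[np.argmax(proxy(x))]  # the point a proxy-optimizer selects
    shortfalls.append(np.max(goal(x)) - goal(x_proxy_opt))

print("average loss in true-goal value from optimizing the proxy:", np.mean(shortfalls))
```

The average shortfall is positive, which is just the selection effect; the hard part was saying anything beyond that.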
I am unclear how much of my feeling that this approach is fairly useless reflects my not having kept pursuing such models and figuring out what can be said, or my diversion to other work that was more fruitful, rather than a fundamental difficulty in saying anything clear based on these types of simulations. I’d like to claim it’s the latter, but I’ll clearly note that this is heavily motivated reasoning.
I really like the connection between optimal learning and Goodhart failures, and I’d love to think about / discuss this more. I’ve mostly thought about it in the online case, since we can sample from human preferences iteratively and build human-in-the-loop systems, as I suggested here: https://arxiv.org/abs/1811.09246 “Oversight of Unsafe Systems via Dynamic Safety Envelopes” (which I think parallels, but is less developed than, one part of Paul Christiano’s approach). But I see why that’s infeasible in many settings, which is a critical issue that the offline case addresses.
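For concreteness, the online case I have in mind looks roughly like the toy loop below. This is only an illustration with an invented utility function and noise levels, not the dynamic safety envelope mechanism from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden human utility over a one-dimensional policy parameter (unknown to the system).
def true_utility(a):
    return -(a - 0.3) ** 2

# One noisy pairwise query to the human in the loop: "do you prefer a to b?"
def human_prefers(a, b):
    return true_utility(a) + rng.normal(0, 0.02) > true_utility(b) + rng.normal(0, 0.02)

# Online loop: propose a small perturbation, keep it only if the human prefers it.
current = 0.9
for _ in range(300):
    proposal = float(np.clip(current + rng.normal(0, 0.05), 0.0, 1.0))
    if human_prefers(proposal, current):
        current = proposal

print("policy parameter after online human feedback:", round(current, 2))  # drifts toward 0.3
```

The point is just that the system never needs a full offline model of the utility function, because it can keep querying; that is exactly what isn’t available in the settings the offline case addresses.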
I also want to note that this approach addresses issues of extremal Goodhart due to model insufficiency, and to an extent regressional Goodhart, but not regime change or causal Goodhart.
As an example of the former for human values, I’d suggest that “maximize food intake” is a critical goal in starving humans, but there is a point at which the goal becomes actively harmful, and if all you see are starving humans, you need a fairly complex model of human happiness to notice that. The same regime change applies to sex, and to most other specific desires.
As an example of the latter, causal Goodhart would be where an AI system optimizes for systems that are good at reporting successful space flights, rather than optimizing for actual success—any divergence leads to a system that will kill people and lie about it.
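As a toy version of the food-intake example (the functional form and numbers below are invented purely for illustration): a proxy fit only on data from starving humans extrapolates “more food is better” far past the point where it becomes actively harmful.

```python
import numpy as np

# Hypothetical "true happiness" as a function of food intake: increasing while
# starving, then sharply harmful past satiation (the regime change).
def true_happiness(intake):
    return intake - 2.0 * np.maximum(intake - 1.0, 0.0) ** 2

# A proxy fit only on starving humans (intake well below satiation) sees a
# monotone relationship and happily extrapolates it.
observed = np.linspace(0.0, 0.8, 50)
slope, intercept = np.polyfit(observed, true_happiness(observed), 1)

grid = np.linspace(0.0, 5.0, 501)
proxy = slope * grid + intercept

print("intake maximizing the proxy:     ", grid[np.argmax(proxy)])                  # runs to the edge of the range
print("intake maximizing true happiness:", grid[np.argmax(true_happiness(grid))])
print("true happiness at proxy optimum: ", true_happiness(grid[np.argmax(proxy)]))  # strongly negative
```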
Based on the discussions below, it seems clear to me that there are (at least) two continuous dimensions of legibility and coercion, which are often related but conceptually distinct. I think they are positively correlated in most good writing, so they are easily conflated, but clarifying them seems useful.
The first is Legible <--> Illegible, in Venkatesh Rao’s terms, as others suggested. This is typically the same as the serial-access vs random-access distinction, but has more to do with structure; trees are highly legible, but may not require a particular order. Rough notes from a lecture are illegible (even if they are typed rather than hand-written), but usually need to be read in order.
The second is Coercive <--> Non-coercive, mostly in the negative sense people disliked. Most of the time, the level of coercion is fairly low even in what we think of as coercive writing. For example, any writing that pushes a conclusion is attempting to change your mind, and hence is coercive. Structures that simply review or present evidence are non-coercive.
I think it takes effort to make something legible but non-coercive, and things that are illegible and non-coercive are either very high effort or badly structured. And since I’ve brought up Venkatesh Rao and mentioned two dimensions, I believe I’m morally required to construct a 2x2. I can’t upload a drawing in a comment, but I will “take two spectra (or watersheds) relevant to a complex issue, simplify each down to a black/white dichotomy, and label the four quadrants you produce.” Given his advice, I’ll use a “glossary of example ‘types’ to illustrate diversity and differentiation within the soup of ambiguity.”
Paternalistic non-fiction writing is legible but coercive; it assumes it knows best, but allows navigation. The Sequences are a good example; well-structured textbooks are often a better one. Note that being correct doesn’t change the level of coercion! There are plenty of coercive anti-evolution/religious biology “textbooks,” but the ones that teach actual science are no less coercive.
Unstructured wikis are illegible and non-coercive; the structure isn’t intended to make a point or convince you, but it also makes no effort to present things logically or clearly at a higher level. (Individual articles can be more or less structured or coercive, but the wiki format is not.)
Blueprints and diagrams are legible but non-coercive, since by their structure they only present information, rather than leading to a conclusion. Novels and other fiction are (usually) legible, but are often non-coercive. Sometimes there is an element of coercion, as in fables, Lord of the Flies, HP:MoR, and everything C.S. Lewis ever wrote, but the main goal is (or should be) to be immersive or entertaining rather than coercive or instructive.
Conversations, and almost any multi-person forum (including most LessWrong writing), are coercive and illegible. Tl;drs are usually somewhat illegible as well. The structure of a conversation is hard to follow: there are relevant posts and comments that aren’t clearly organized. At the same time, everyone is trying to push their reasoning.
It also fails to account for the fact that health care is, in a sense, an ultimate superior good—there is no level of income at which people don’t want more health, and their demand scales with more income. This combines with the fact that we don’t have good ways to exchange money for being healthier. (The same applies for intelligence / education.) I discussed this in an essay on Scott’s original post:
That’s all basically right, but if we’re sticking to causal Goodhart, the “without further assumptions” may be where we differ. I think that if the uncertainty is over causal structures, acting on the “correct” structure will be more likely to increase all of the metrics than acting on most of the incorrect ones.
(I’m uncertain how to do this, but) it would be interesting to explore this over causal graphs, where a system has control over a random subset of nodes, and a metric correlated to the unobservable goal is chosen. In most cases, I’d think that leads to causal Goodhart quickly; but if the set of nodes potentially used for the metric includes some that directly cause the goal, and others that can be intercepted to create causal Goodhart, uncertainty over the metric would lead to less causal Goodharting, since targeting the actual cause should improve the correlated metrics, while the reverse is not true.
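I haven’t worked this out properly, but here is a minimal sketch of the kind of simulation I mean; the particular graph and coefficients are arbitrary assumptions, just to illustrate the asymmetry:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(push_cause=0.0, push_gameable=0.0, n=100_000):
    # Toy linear structural model: C -> G (the unobservable goal), C -> M1, X -> M1, G -> M2.
    # The agent can intervene on C (an actual cause of the goal) or on X
    # (a node that inflates the metric M1 without touching the goal).
    C = rng.normal(size=n) + push_cause
    X = rng.normal(size=n) + push_gameable
    G = C + rng.normal(scale=0.5, size=n)              # the goal we care about
    M1 = 0.5 * C + X + rng.normal(scale=0.5, size=n)   # correlated metric, gameable via X
    M2 = G + rng.normal(scale=0.5, size=n)             # metric causally downstream of the goal
    return {name: round(val.mean(), 2) for name, val in [("G", G), ("M1", M1), ("M2", M2)]}

print("intervene on the cause C:   ", simulate(push_cause=1.0))    # G, M1, and M2 all rise
print("game the correlated node X: ", simulate(push_gameable=1.0)) # only M1 rises; the goal does not
```

If the agent doesn’t know whether M1 or M2 will be the scored metric, pushing on C improves both, while pushing on X only helps in one of the two cases, which is the intuition for why uncertainty over the metric should reduce causal Goodharting.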
It’s not exactly the same, but I would argue that the issues with “Dog” versus “Cat” for the picture are best captured with that formalism—the boundaries between categories are not strict.
To be more technical, there are a couple of places where fuzziness can exist. First, the mapping in reality is potentially fuzzy, since someone could, in theory, bio-engineer a kuppy or cat-dog. These would be partly members of the cat set, and partly members of the dog set, perhaps in proportion to their genetic resemblance to each of the parent categories.
Second, the process that leads to the picture, involving a camera and a physical item in space, is a mapping from reality to an image. That is, reality may have a sharp boundary between dogs and cats, but the space of possible pictures of a given resolution is far smaller than the space of physical configurations that can be photographed, so the mapping from reality->pictures is many-to-one, creating a different irresolvable fuzziness—perhaps 70% of the plausible configurations that lead to this set of pixels are cats, and 30% are dogs, so the picture has a fuzzy set membership.
Lastly, there is mental fuzziness, which usually captures the other two implicitly, but has the additional fuzziness created because the categories were made for man, not man for the categories. That is, the categories themselves may not map to reality coherently. This is different from the first issue, where “sharp” genetic boundaries like that between dogs and cats do map to reality correctly, but items can be made to sit on the line. This third issue is that the category may not map coherently to any actual distinction, or may be fundamentally ambiguous, as Scott’s post details for “Man vs. Woman” or “Planet vs. Planetoid”; items can partly match one or more than one category, and be fuzzy members of the set.
Each of these, it seems, can be captured fairly well as fuzzy sets, which is why I’m proposing that your usage has a high degree of membership in the fuzzy set of things that can be represented by fuzzy sets.
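To make that concrete, here is a minimal sketch using the standard fuzzy-set operations (the 0.7/0.3 membership numbers are just the illustration from above):

```python
# Fuzzy intersection and union: the standard min/max operations on membership degrees.
def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

# Graded membership of one ambiguous picture in the "cat" and "dog" sets.
membership = {"cat": 0.7, "dog": 0.3}

# Unlike a probability over two crisp outcomes, the picture is simultaneously
# mostly-a-cat and somewhat-a-dog, so "cat AND dog" is not forced to zero.
print("cat AND dog:", fuzzy_and(membership["cat"], membership["dog"]))  # 0.3
print("cat OR dog: ", fuzzy_or(membership["cat"], membership["dog"]))   # 0.7
```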
Also, I keep feeling bad that we’re perpetuating the practice of crediting Goodhart rather than Campbell, since Campbell was clearly first - https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2018.01205.x - and Goodhart explicitly said he was joking in a recent interview.
See my much shorter and less developed note to a similar effect: https://www.lesswrong.com/posts/QJwnPRBBvgaeFeiLR/uncertainty-versus-fuzziness-versus-extrapolation-desiderata#kZmpMGYGfwGKQwfZs - and I agree that regressional and extremal Goodhart cannot be fixed purely with his solution.
I will, however, defend some of Stuart’s suggestions as they relate to causal Goodhart in a non-adversarial setting. (I’m also avoiding the can of worms of game theory.) In that case, both randomization and mixtures of multiple metrics can address Goodhart-like failures, albeit in different ways. I had been thinking about this in the context of policy - https://mpra.ub.uni-muenchen.de/90649/ - rather than AI alignment, but some of the arguments still apply. (One critical argument that doesn’t fully apply is that “good enough” mitigation raises the cognitive costs of cheating to a point where aligning with the true goal is cheaper. I also noted in the paper that satisficing is useful for limiting the misalignment from metrics, and quantilization seems like one promising approach to satisficing for AGI.)
The argument for causal Goodhart is that randomization and mixed utilities are both effective in mitigating the causal-structure errors that lead to causal Goodhart in the one-party case. That’s because the failure occurs when uncertainty or mistakes about causal structure lead to choosing metrics that are merely correlated with the goal, rather than causes of it. However, if even some significant fraction or probability of the metric is causally connected to the goal in ways that cannot be gamed, that greatly mitigates this class of failure.
To more clearly apply this logic to human utility: if we mistakenly think that endorphins in the brain are 100% of human goals, an AGI might want to tile the universe with rats on happy drugs, or the moral equivalent. If we assign this only 50% weight, or have a 50% probability that it will be the scored outcome, and we define something that requires a different way of creating what we actually think of as happiness / life satisfaction, it does not just shift the optimum to 50% of the universe tiled with rat brains. This is because the alternative class of hedonium will involve a non-trivial amount of endorphins as well; as long as other solutions have anywhere close to as much endorphins, they will be preferred. (In this case, admittedly, we got the endorphin goal so wrong that 50% of the universe tiled in rats on drugs is likely anyway; bad enough utility functions can’t be fixed with either randomization or weighting. But if a causal mistake can be fixed with either a probabilistic or a weighting solution, it seems likely it can be fixed with the other.)
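A toy version of the arithmetic, with numbers invented purely for illustration:

```python
# Two candidate "world designs" scored on the two components of the objective.
worlds = {
    "tile the universe with rats on endorphins": {"endorphins": 1.0, "satisfaction": 0.0},
    "genuinely satisfied people":                {"endorphins": 0.7, "satisfaction": 1.0},
}

def score(world, w_endorphins, w_satisfaction):
    return w_endorphins * world["endorphins"] + w_satisfaction * world["satisfaction"]

for name, world in worlds.items():
    print(name)
    print("  endorphin-only objective:", score(world, 1.0, 0.0))
    print("  50/50 mixed objective:   ", score(world, 0.5, 0.5))

# Because the satisfied-people design still produces substantial endorphins,
# the 50/50 mixture prefers it outright rather than tiling half the universe with rats.
```

Of course, if the endorphin component is wrong enough, no weighting saves you, which is the caveat in the parenthetical above.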
I really like this formulation, and it greatly clarifies something I was trying to note in my recent paper on multiparty dynamics and failure modes - https://www.mdpi.com/2504-2289/3/2/21/htm. The discussion about the likelihood of mesa-optimization due to human modeling is close to the more general points I tried to make in the discussion section of that paper. As argued here about humans, other systems are optimizers (even if they are themselves only base optimizers), and therefore any successful machine-learning system in a multiparty environment is implicitly forced to model the other parties. I called this the “opponent model,” and argued that such models are dangerous because they are always approximate, arguing directly from that point to the claim that there is great potential for misalignment. The implication from this work, though, is that they are also dangerous because modeling other parties encourages machine-learning systems in multi-agent environments to become mesa-optimizers, and that mesa-optimization is a critical enabler of misalignment even when the base optimizer is well aligned.
I would add to the discussion here that multiparty systems can display the same dynamics, and therefore have risks similar to those of systems which require human models. I also think, less closely connected to the current discussion but directly related to my paper, that mesa-optimizer misalignments pose new and harder-to-understand risks when they interact with one another. I also strongly agree with the point that current examples are not really representative of the full risk. Unfortunately, peer reviewers strongly suggested that I include more concrete examples of failures. But as I said in the paper, “the failures seen so far are minimally disruptive. At the same time, many of the outlined failures are more problematic for agents with a higher degree of sophistication, so they should be expected not to lead to catastrophic failures given the types of fairly rudimentary agents currently being deployed. For this reason, specification gaming currently appears to be a mitigable problem, or as Stuart Russell claimed, be thought of as “errors in specifying the objective, period.””
As a final aside, I think that the concept of mesa-optimizers is very helpful in laying out the argument against that last claim; misalignment is more than just misspecification. I think that this paper will be very helpful in showing why.
Actually, I assumed fuzzy was intended here to be a precise term, contrasted with probability and uncertainty, as it is used in describing fuzzy sets versus uncertainty about set membership. https://en.wikipedia.org/wiki/Fuzzy_set
I missed the proposal when it was first released, but I wanted to note that the original proposal addresses only one (critical) class of Goodhart error, and proposes a strategy based on addressing one problematic result of that class, the nearest-unblocked-neighbor problem. The strategy is more widely useful for misspecification than just nearest-unblocked neighbor, but it is still only addressing some Goodhart effects.
The misspecification discussed is more closely related to, but still distinct from, extremal and regressional Goodhart. (Causal and adversarial Goodhart are somewhat far removed, and don’t seem as relevant to me here. Causal Goodhart is due to mistakes, albeit fundamentally hard-to-avoid mistakes, while adversarial Goodhart happens via exploiting other modes of failure.)
I notice I am confused about how different strategies being proposed to mitigate these related failures can coexist if each is implemented separately, and/or how they would be balanced if implemented together, as I briefly outline below. Reconciling or balancing these different strategies seems like an important question, but I want to wait to see the full research agenda before commenting or questioning further.
Extremal Goodhart is somewhat addressed by another post you made, which proposes to avoid ambiguous distant situations - https://www.lesswrong.com/posts/PX8BB7Rqw7HedrSJd/by-default-avoid-ambiguous-distant-situations. It seems that the strategy proposed here is to attempt to resolve fuzziness, rather than to avoid areas where it becomes critical. These seem to be at least somewhat at odds, though this is partly reconcilable by pursuing neither fully: neither completely resolving ambiguity nor entirely avoiding distant ambiguity.
And regressional Goodhart, as Scott G. originally pointed out, is unavoidable except by staying in-sample, interpolating rather than extrapolating. Fully pursuing that strategy is precluded by injecting uncertainty into the model of the human-provided modification to the utility function. Again, this is partly reconcilable, for example, by trying to bound how far we let the system stray from the initially provided blocked strategy, and how much fuzziness it is allowed to infer without an external check.
Yes, that does seem to be a risk. I would think that applying Schelling fences to reinforce current values reduces the amount of expected drift in the future, and I’m unclear whether you are claiming that using Schelling fences will do the opposite, or claiming that they are imperfect.
I’d also like to better understand what specifically you think commits the error of making it difficult to re-align with current values, rather than reducing the degree of drift, and how it could be handled differently.
That’s a very good point, I was definitely unclear.
I think that the critical difference is that in epistemically healthy communities, when such a failure is pointed out, some effort is spent on identifying and fixing the problem, instead of pointedly ignoring it despite efforts to solve the problem, or spending time actively defending the inadequate status quo from even Pareto-improving changes.