To start, the claim that it was found 2 miles from the facility is an important mistake, because WIV is 8 miles from the market. For comparison, in New York that’s the distance from the World Trade Center to Columbia University, or to Newark Airport. Wuhan’s downtown is around 16 miles across; 8 miles away just means it was in the same city.

And you’re over-reliant on the evidence you want to pay attention to. For example, even restricting ourselves to “nearby coincidence” evidence, the Huanan market is the largest in central China; so what are the odds that a natural spillover event occurs immediately surrounding the largest animal market? If the disease actually emerged from WIV, what are the odds that the cases centered around the Huanan market, 8 miles away, instead of the Baishazhou live animal market, 3 miles away, or the Dijiao market, also 8 miles away?

So I agree that an update can be that strong, but this one simply isn’t.
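To make the shape of that location argument concrete, here’s a toy version; every number in it is a placeholder I’m making up for illustration, not an estimate I’m defending:

```python
# Toy illustration only: every number is a made-up placeholder, not an estimate.
# Under "zoonosis," a spillover cluster is disproportionately likely to center on
# the largest wildlife market; under "lab leak," there's no obvious reason to
# prefer Huanan (8 mi away) over Baishazhou (3 mi) or Dijiao (8 mi).
p_huanan_given_zoonosis = 0.5  # placeholder: largest animal market in central China
p_huanan_given_lableak = 0.1   # placeholder: one of several comparable locations

bayes_factor = p_huanan_given_zoonosis / p_huanan_given_lableak
print(f"Location evidence alone favors zoonosis by ~{bayes_factor:.0f}x")
```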
Yeah, but I think it’s more than just not taking it literally: the exercise is fundamentally flawed when used as an argument, rather than very narrowly for honest truth-seeking, which is almost never possible in a discussion without unreasonably high levels of trust and confidence in others’ epistemic reliability.
What is the relevance of the “posterior” that you get after updating on a single claim, chosen post hoc as the one you want to use as an example?
Using a weak prior biases you towards treating whatever information you update on as strong evidence. How did you decide on that particular prior? You should presumably have some reference class for it. (If you can’t do that, you should at least have equipoise between all the reasonable hypotheses being considered. Instead, you’re updating “Yes Lableak” versus “No Lableak”; but in fact, “from a Bayesian perspective, you need an amount of evidence roughly equivalent to the complexity of the hypothesis just to locate the hypothesis in theory-space. It’s not a question of justifying anything to anyone.”)
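To spell out why the choice of prior matters so much, write the update in odds form:

$$\underbrace{\frac{P(H_1 \mid E)}{P(H_2 \mid E)}}_{\text{posterior odds}} = \underbrace{\frac{P(H_1)}{P(H_2)}}_{\text{prior odds}} \times \underbrace{\frac{P(E \mid H_1)}{P(E \mid H_2)}}_{\text{Bayes factor}}$$

Starting from 1:1 rather than, say, 1:100 silently hands the hypothesis the equivalent of a 100x Bayes factor (about 6.6 bits) before any evidence has been examined at all.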
How confident are you in your estimate of the Bayes factor here? Do you have calibration data for roughly similar estimates you have made? Should you be adjusting for less-than-perfect confidence?
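A minimal sketch of what that adjustment could look like, with entirely made-up numbers: if your point estimate of a Bayes factor is 100 but your calibration suggests real uncertainty about it, averaging the posterior over that uncertainty, instead of plugging in the point estimate, gives a noticeably weaker update.

```python
import numpy as np

# Toy sketch, all numbers assumed: point estimate of the Bayes factor is 100
# (log10 BF = 2), but calibration on similar estimates suggests a standard
# deviation of 1 in log10 terms.
rng = np.random.default_rng(0)
log10_bf_samples = rng.normal(loc=2.0, scale=1.0, size=100_000)

prior_odds = 1.0  # 1:1 prior, for simplicity
posterior_probs = 1.0 / (1.0 + 1.0 / (prior_odds * 10.0**log10_bf_samples))

print(f"plug-in posterior:  {100 / 101:.3f}")               # ~0.990
print(f"averaged posterior: {posterior_probs.mean():.3f}")  # noticeably lower
```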
Thank you for writing this.

I think most points here are good points to make, but I also think it’s useful as a general caution against this type of exercise being used as an argument at all! So I’d obviously caution against anyone taking your response itself as a reasonable attempt at an estimate of the “correct” Bayes factors, because this is all very bad epistemic practice! Public explanations and arguments are social claims, and usually contain heavily filtered evidence (even if unconsciously). Don’t do this in public.

That is, this type of informal Bayesian estimate is useful as part of a ritual for changing your own mind, when done carefully. That requires a significant degree of self-composure, a willingness to change one’s mind, and a high degree of justified confidence in your own mastery of unbiased reasoning.
Here, though, it is presented as an argument, which is not how any of this should work. And in this case, it was written by someone who already had a strong view of what the outcome should be, repeated publicly and frequently, which makes it doubly hard to take at face value the implicit but necessary claim that the exercise started from an unbiased point! At the very least, we would need strong evidence that it was not an exercise in motivated reasoning, that the bottom line wasn’t written before the evaluation started; that statement is completely missing, though to be fair, it would be unbelievable even if it had been made.
I agree that releasing model weights is “partially open sourcing,” in much the same way that freeware is “partially open sourcing” software, or that a restrictive licence with code availability is.

But that’s exactly the point: you don’t get to call something X because it’s kind-of-like X; it needs to actually fulfill the requirements in order to get the label. What is being called Open Source AI doesn’t actually do the thing that it needs to.
Thanks. I agree that this discusses the licenses, which would be enough to make LLaMA not qualify, but I think there’s a strong claim, which I put forward in the full linked piece, that even if the model weights were released under a GPL license, those “open” model weights wouldn’t make it open in the sense that Open Source means elsewhere.
I agree that the reasons someone wants the dataset generally aren’t the same reasons they’d want to compile from source code. But there’s a lot of utility for research in having access to the dataset even if you don’t recompile. Checking whether there was test-set leakage for metrics, for example, or assessing how much of LLM ability is stochastic parroting of specific passages versus recombination. And if the model were actually open, these things would not be hidden from researchers.
And supply chain is a reasonable analogy, but many open-source advocates make sure that their code doesn’t depend on closed or proprietary libraries. It’s not actually “libre” if you need a closed-source component, or need to pay someone, to make the thing work. Some advocates, including people who built or control quite a lot of the open-source ecosystem, also put effort into ensuring that the entire toolchain needed to compile their code is open, because replicability shouldn’t be contingent on companies that can restrict usage or hide things in the code. It’s not strictly required, but it’s certainly relevant.
The vast majority of uses of software are via changing configuration and inputs, not modifying code and recompiling. (Lots of Software as a Service doesn’t even let you change the configuration directly.) But software is not open in this sense unless you can recompile it, because otherwise you don’t have full access to what was used to build it.

The same is the case for what Facebook calls open-source LLMs: they don’t actually give you full access to what was used to build them.
Thanks. RedPajama definitely looks like it fits the bill, but it shouldn’t need to bill itself as making “fully-open, reproducible models,” since that’s what “open source” is already supposed to mean. (Unfortunately, the largest model they have is 7B.)
Yes, agreed—as I said in the post, “Open Source AI simply means that the models have the model weights released—the equivalent of software which makes the compiled code available. (This is otherwise known as software.)”
“Freely remixable” models generally don’t have open datasets used for training. If you know of one, that’s great, and it would be closer to open source. (Not Mistral. And Phi-2 is trained on synthetic data from other LLMs; I don’t know what they released about the methods used to generate or select the text, but it’s not open.)
But the entire point is that weights are not the source code for an LLM; they are the compiled program. Yes, it’s modifiable via LoRA and similar, but that’s not open source! Open source would mean I could replicate it from the ground up. For Facebook’s models, at least, the details of the training methods, the RLHF training they do, and where they get the data are all secrets. But they call it “Open Source AI” anyway.
Good point, and I agree that it’s possible that what I see as essential features might go away. “Floppy disks” turned out to be a bad name when they ended up inside hard plastic covers, and “deepware” could end up the same, but I am skeptical that it will.

I agree that early electronics were buggy until we learned to build them reliably, and perhaps we can solve this for gradient-descent-based learning, though many are skeptical of that, since many of the problems have been shown to be pretty fundamental. I also agree that any system is inscrutable until you understand it; but unlike early electronics, no one understands these massive lists of numbers that produce text, and human brains can’t build them; humans just program a process to grow them. (Yes, composable NNs could solve some of this, as you point out when mentioning separable systems, but I still predict they won’t be well understood, because the components individually are still deepware.)
You talk about “governance by Friendly AGI” as if it’s a solved problem we’re just waiting to deploy, rather than speculation that might simply not be feasible even if we solve AGI alignment, which is itself plausibly unsolvable in the near term. You also conflate AI safety research with AI governance regimes. And note that the problems with governance generally aren’t due to a lack of intelligence among those in charge; they’re largely due to conflicting values and requirements. With that said, you talk about modern liberal governments as if they are the worst thing we’ve experienced, “riddled with brokenness,” as if that’s the fault of the people in charge rather than the deeply conflicting mandates the populace gives them. And to the extent that the systemic failure is the fault of the untrustworthy incentives of those in charge, why would controllable or aligned AGI fix that?
Yes, stasis isn’t safe by default, but undirected progress isn’t a panacea either, and governance certainly isn’t any closer to being solved just because we have AI progress.
Thanks. I was unaware of the law, and yes, that does seem to be strong evidence that the agencies in question don’t have any evidence specific enough to come to a conclusion. That, or they are foolishly risking pissing off Congress, which can subpoena them and seems happy to do exactly that in other situations; and they would be doing so knowing that it’s eventually going to come out that they withheld evidence?!?
Again, it’s winter and people get sick; that’s very weak Bayesian evidence of an outbreak, at best. On priors, how many people at an institute that size get influenza in any given winter month?
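As a back-of-envelope check, with assumed placeholder numbers rather than anything specific to WIV:

```python
# Back-of-envelope base rate; both numbers are assumptions, not WIV data.
staff = 300                      # assumed institute headcount
monthly_winter_flu_rate = 0.02   # assumed chance per person per winter month
expected_sick = staff * monthly_winter_flu_rate
print(f"Expected flu-like illnesses per winter month: ~{expected_sick:.0f}")
# ~6, so a handful of sick researchers in a winter month is roughly the default.
```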
And the fact that it was only 3 people, months earlier, seems to indicate moderately strongly that it wasn’t the source of the full COVID-19 outbreak: given the lack of precautions against spread at the time, if it had already infected 3 different people, it would likely have spread much more widely within China starting at that time.
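A rough growth sketch makes the tension visible; the parameters are illustrative assumptions in the range commonly reported for the ancestral strain, not fitted values:

```python
# Rough exponential-growth sketch with assumed, illustrative parameters.
r0 = 2.5             # assumed reproduction number with no precautions
serial_interval = 5  # assumed days per transmission generation
days = 60            # "months earlier" than the recognized outbreak
initial_cases = 3

generations = days / serial_interval
cases = initial_cases * r0**generations
print(f"Implied cumulative infections after {days} days: ~{cases:,.0f}")
# ~3 * 2.5**12, on the order of 200,000: hard to square with a quiet two months.
```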
Sorry, I’m having trouble following. You’re saying that 1) it’s unlikely to be a lab leak known to US Intel because it would have been known to us via leaks, and 2) you think that Intel agencies have evidence about WIV employees having COVID and that it’s being withheld?
First, I think you’re overestimating both how much information from highly sensitive sources would leak, and how much Chinese leaders would know if it were a lab leak. This seems on net to be mostly uninformative.
Second, if they have evidence about WIV members having COVID (and not, you know, any other respiratory disease in the middle of flu and cold season), I still don’t know why you think you would know that it was withheld from Congress. Intel agencies share classified information with certain members of Congress routinely, but you’d never know what was or wasn’t said. You think a lack of a leak is evidence that information would have been illegally withheld from Congress; but it’s not illegal for intel agencies to keep information secret, in a wide variety of cases.
And on that second point, even setting aside the above arguments, not having seen such evidence publicly leaked can’t plausibly be more likely in a world where a lab leak was hidden than in a world where there was no lab leak and the evidence you’re not seeing simply doesn’t exist!
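In symbols: let $E$ be “no such evidence has publicly surfaced.” If the evidence doesn’t exist, it can’t leak, so

$$P(E \mid \text{no lab leak}) = 1 \geq P(E \mid \text{lab leak, covered up}),$$

which means the likelihood ratio from the non-observation can never favor the lab-leak-plus-cover-up hypothesis.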
The State Department isn’t part of “US intelligence agencies and military,” and faces very, very different pressures. And despite this, as you point out, there are limits to internal pressures in intel agencies; which at least makes it clear that the intel agencies don’t have strong and convincing non-public evidence for the leak hypothesis.
I’m not saying it’s impossible; I’m saying it’s implausible. (So if this is a necessary precondition for believing in a lab leak, it is clear evidence against it.)
“(and likely parts of the US intelligence agencies and military) desperately wanted this to not be a lab leak.”
As I said in another comment, that seems very, very hard to continue to believe, even if it might have seemed plausible on priors.