BSL isn’t the thing that defines “appropriate units of risk”; that’s pathogen risk-group levels, and I agree that those are a problem because they focus on pathogen lists rather than actual risks. I actually think BSLs are good at what they do, and the problem is regulation and oversight, which is patchy, as well as transparency, of which there is far too little. But those are issues with oversight, not with the types of biosecurity measures that are available.
Davidmanheim
If you’re appealing to OpenPhil, it might be useful to ask one of the people who was working with them on this as well.
And you’ve now equivocated between “they’ve induced an EA cause area” and a list of the range of risks covered by biosecurity (not what their primary concerns are), citing this as “one of them.” I certainly agree that biosecurity levels are one of the things biosecurity is about, and that “the possibility of accidental deployment of biological agents” is a key issue, but that’s incredibly far removed from the original claim that the failure of BSL levels induced the cause area!
I mean, I’m sure something more restrictive is possible.
But what? Should we insist that the entire time someone’s inside a BSL-4 lab, we have a second person who is an expert in biosafety visually monitoring them to ensure they don’t make mistakes? Or should their air supply not use filters and completely safe PAPRs, and instead feed them outside air through a tube that restricts their ability to move around?
Or do you have some new idea that isn’t just a ban with more words?
“lists of restrictions” are a poor way of managing risk when the attack surface is enormous
Sure, list-based approaches are insufficient, but they have relatively little to do with the biosafety levels of labs; they have to do with risk groups, which are distinct but often conflated. (So Ebola or smallpox isn’t a “BSL-4” pathogen, because there is no such thing.)
I just meant “gain of function” in the standard, common-use sense—e.g., that used in the 2014 ban on federal funding for such research.
That ban didn’t go far enough, since it only applied to 3 pathogen types, and it wouldn’t have banned what Wuhan was doing with novel viruses, since that work wasn’t on SARS or MERS but on other species of virus. So sure, we could enforce a broader version of that ban, but getting a good definition that’s both extensive enough to prevent dangerous work and doesn’t ban obviously useful research is very hard.
Having written extensively about it, I promise you I’m aware. But please, tell me more about how this supports the original claim which I have been disagreeing with, that this class of incidents was or is the primary concern of the EA biosecurity community, the one that led to it being a cause area.
The OP claimed a failure of BSL levels was the single thing that induced biorisk as a cause area, and I said that was a confused claim. Feel free to find someone who disagrees with me here, but the proximate causes of EAs worrying about biorisk have nothing to do with BSL lab designations. It wasn’t a failure of BSL levels that allowed things like the Soviet bioweapons program, or that led to the underfunded and largely unenforceable BWC, or that explains how newer technologies are lowering the barriers for terrorists and others seeking to pursue bioweapons.
I did not say that they didn’t want to ban things; I explicitly said “whether to allow certain classes of research at all.” And when I said “happy to rely on those levels,” I meant that the idea that we should have “BSL-5” is the kind of silly thing novice EAs propose that doesn’t make sense, because there literally isn’t anything significantly more restrictive other than just banning it.
I also think that “nearly all EA’s focused on biorisk think gain of function research should be banned” is obviously underspecified, and wrong because of the details. Yes, we all think that there is a class of work that should be banned, but tons of work that would be called gain of function isn’t in that class.
BSL levels, which have failed so consistently and catastrophically they’ve induced an EA cause area,
This is confused and wrong, in my view. The EA cause area around biorisk is mostly happy to rely on those levels, and unlike for AI, the (very useful) levels predate EA interest and give us something to build on. The questions are largely instead about whether to allow certain classes of research at all, the risks from those who intentionally do things that are forbidden, and how new technology changes the risk.
and then the 2nd AI pays some trivial amount to the 1st for the inconvenience
Completely as an aside, coordination problems among ASIs don’t go away, so this is a highly non-trivial claim.
I thought that the point was that either managed-interface-only access, or API access with rate limits, monitoring, and an appropriate terms of service, can prevent use of some forms of scaffolding. If it’s staged release, this makes sense to do, at least for a brief period while confirming that there are not security or safety issues.
These days it’s rare for a release to advance the frontier substantially.
This seems to be one crux. Sure, there’s no need for staged release if the model doesn’t actually do much more than previous models, and doesn’t have unpatched vulnerabilities of types that would be identified by somewhat broader testing.
The other crux, I think, is around public release of model weights. (Often referred to, incorrectly, as “open sourcing.”) Staged release implies not releasing weights immediately—and I think this is one of the critical issues with what companies like X have done that make it important to demand staged release for any models claiming to be as powerful or more powerful than current frontier models. (In addition to testing and red-teaming, which they also don’t do.)
It is funny, but it also showed up on April 2nd in Europe and anywhere farther east...
I think there are two very different cases of “almost works” that are being referred to. The first is where the added effort is going in the right direction, and the second is where it is slightly wrong. For the first case, if you have a drug that doesn’t quite treat your symptoms, it might be because it addresses all of them somewhat, in which case increasing the dose might make sense. For the second case, you could have one that addresses most of the symptoms very well, but makes one worse, or has an unacceptable side effect, in which case increasing the dose wouldn’t help. Similarly, we could imagine a muscle that is uncomfortable. The second case might then be a stretch that targets almost the right muscle. That isn’t going to help if you do it more. The first case, on the other hand, would be a stretch that targets the right muscle but isn’t doing enough, and obviously it could be great to do more often, or for a longer time.
A Dozen Ways to Get More Dakka
Again, I think it was a fine and enjoyable post.
But I didn’t see where you “demonstrate how I used very basic rationalist tools to uncover lies,” which could have improved the post, and I don’t think this really explored any underappreciated parts of “deception and how it can manifest in the real world”—which I agree is underappreciated. Unfortunately, this post didn’t provide much clarity about how to find it, or how to think about it. So again, it’s a fine post, good stories, and I agree they illustrate being more confused by fiction than reality, and other rationalist virtues, but as I said, it was not “the type of post that leads people to a more nuanced or better view of any of the things discussed.”
I disagree with this decision, not because I think it was a bad post, but because it doesn’t seem like the type of post that leads people to a more nuanced or better view of any of the things discussed, much less a post that provided insight or better understanding of critical things in the broader world. It was enjoyable, but not what I’d like to see more of on Less Wrong.
(Note: I posted this response primarily because I saw that lots of others also disagreed with this, and think it’s worth having on the record why at least one of us did so.)
“Climate change is seen as a bit less of a significant problem”
That seems shockingly unlikely (5%). Even if we have essentially eliminated all net emissions (10%), we will still be seeing continued warming (99%) unless we have widely embraced geoengineering (10%). If we have, it is a source of significant geopolitical contention (75%) due to uneven impacts (50%) and pressure from environmental groups (90%) worried that it is promoting continued emissions and/or causing other harms. Progress on carbon capture is starting to pay off (70%) but is not (90%) deployed at anything like the scale needed to stop or reverse warming.
Adaptation to climate change has continued (99%), but it is increasingly obvious how expensive it is and how badly it is impacting the developing world. The public still seems to think this is the fault of current emissions (70%), and carbon taxes or similar legal limits are in place for a majority of G7 countries (50%) but for less than half of other countries (70%).
To start, the claim that it was found 2 miles from the facility is an important mistake, because WIV is 8 miles from the market. For comparison to another city people might know better, in New York, that’s the distance between World Trade Center and either Columbia University, or Newark Airport. Wuhan’s downtown is around 16 miles across. 8 miles away just means it was in the same city.
And you’re over-reliant on the evidence you want to pay attention to. For example, even restricting ourselves to “nearby coincidence” evidence, the Huanan market is the largest in central China, so what are the odds that a natural spillover event occurs immediately surrounding the largest animal market? If the disease actually emerged from WIV, what are the odds that the cases centered around the Huanan market, 8 miles away, instead of the Baishazhou live animal market, 3 miles away, or the Dijiao market, also 8 miles away?
So I agree that an update can be that strong, but this one simply isn’t.
Yeah, but I think it’s more than just not being taken literally; the exercise is fundamentally flawed when used as an argument rather than very narrowly for honest truth-seeking, which is almost never possible in a discussion without unreasonably high levels of trust and confidence in others’ epistemic reliability.
What is the relevance of the “posterior” that you get after updating on a single claim that’s being chosen, post-hoc, as the one that you want to use as an example?
Using a weak prior biases towards thinking the information you have to update with is strong evidence. How did you decide on that particular prior? You should presumably have some reference class for your prior. (If you can’t do that, you should at least have equipoise between all reasonable hypotheses being considered. Instead, you’re updating “Yes Lableak” versus “No Lableak”—but in fact, “from a Bayesian perspective, you need an amount of evidence roughly equivalent to the complexity of the hypothesis just to locate the hypothesis in theory-space. It’s not a question of justifying anything to anyone.”)
How confident are you in your estimate of the Bayes factor here? Do you have calibration data for roughly similar estimates you have made? Should you be adjusting for less than perfect confidence?
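To make the point about priors and Bayes factors concrete, here is a minimal sketch of the odds form of Bayes’ rule (posterior odds = prior odds × Bayes factor). The function name and the numbers are hypothetical illustrations, not anyone’s actual estimates; the only point is that the same Bayes factor produces very different posteriors depending on the prior, so both the prior and the confidence in the Bayes factor need justification.

```python
def posterior_probability(prior: float, bayes_factor: float) -> float:
    """Update a prior probability by a Bayes factor using the odds form of Bayes' rule.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if bayes_factor <= 0.0:
        raise ValueError("bayes_factor must be positive")
    prior_odds = prior / (1.0 - prior)           # convert probability to odds
    posterior_odds = prior_odds * bayes_factor   # multiply by the likelihood ratio
    return posterior_odds / (1.0 + posterior_odds)  # convert odds back to probability


if __name__ == "__main__":
    # Illustrative numbers only: the same Bayes factor of 10 applied to three priors.
    for prior in (0.5, 0.1, 0.01):
        print(f"prior={prior:.2f} -> posterior={posterior_probability(prior, 10.0):.3f}")
```

With these illustrative numbers, a Bayes factor of 10 moves a 50% prior to about 91%, a 10% prior to about 53%, and a 1% prior to only about 9%, which is why the choice of reference class for the prior matters as much as the estimated strength of the evidence.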
That doesn’t seem like “consistently and catastrophically,” it seems like “far too often, but with thankfully fairly limited local consequences.”