I like the policy of voting on presence of strengths rather than absence of weaknesses, but disagree here because, as I said in my review, it’s valuable and in part pretty clearly correct, but even though "it’s not overall completely off base… it does seem to go in the wrong direction, or at least fail to embody the virtues of rationality I think the best work on LessWrong is supposed to uphold."
(That said, I think this is a more general Duncan Sabien vs. current LessWrong policy question, as your reply about why you disagree makes clear—and I’m mostly on Duncan’s side about what standards we should have, or at least aspire to.)
Having read the post, and debates in the comments, and Vanessa Kosoy’s review, I think this post is valuable and important, even though I agree that there are significant weaknesses in various places, certainly with respect to the counting arguments and the measure of possible minds—as I wrote about here in intentionally much simpler terms than Vanessa has done.
The reason I think it is valuable is that weaknesses in one part of their specific counterargument do not obviate the variety of valid and important points in the post, though I’d be far happier if there were something in between "include this" and "omit this" for the 2024-in-review series—because a partial rewrite or a note about the disputed claims would entirely address my concern with including it.
I have very mixed views about this, as someone who is myself religious. First, I think it’s obviously the case that in many instances religion is helpful for individuals, and even helps their rationality. The tools and approaches developed by religion are certainly valuable, and should be considered and judiciously adopted by anyone who is interested in rationality. This seems obvious once pointed out, and if that were all the post did, I would agree. (There’s a related atheist-purity mindset issue here, where people don’t want to admit "The Worst Person You Know Just Made a Great Point.")
But the argument here is far stronger—not that the tools work, or that some people benefit, but "that these traditions, whose areas of convergence could together be referred to as the perennial philosophy, are trustworthy." And that seems to be going too far, failing to have the critical crisis of faith. And the next post in the series shows why—it takes far too many claims at face value to reach a convenient conclusion. So it’s not overall completely off base, but it does seem to go in the wrong direction, or at least fail to embody the virtues of rationality I think the best work on LessWrong is supposed to uphold.
I think the distinction is between "smarter and more capable than any human" and "smarter and more capable than humanity as a whole."
The former is what you refer to, which could still be “Careful Moderate Superintelligence” in the view of the post.
"there’s an extremely strong selection effect at labs for an extreme degree of positivity and optimism regardless of whether it is warranted."
Absolutely agree with this—and that’s a large part of why I think it’s incredibly noteworthy that despite that bias, there are tons of very well informed people at the labs, including Boaz, who are deeply concerned that things could go poorly, and many don’t think it’s implausible that AI could destroy humanity.
Now that there are additional posts, I’d love to hear if you still have this objection.
“Actual LessWrong readers also sometimes ask me how I deal emotionally with the end of the world.
I suspect a more precise answer may not help. But Raymond Arnold thinks I should say it, so I will say it.
I say again, I don’t actually think my answer is going to help.”
It’s not a common trope, certainly, but if it is one, it’s also one that Eliezer is happy to play out. (And there are lots of good tropes that people play out which they shouldn’t avoid just because they are tropes—like falling in love, or being a good friend to others when they are sad, or being a conscientious ethical objector, or being someone who can let go of things while having fun, etc.)
Agree that it’s not just about being dramatic / making the problem about you. But that was only one of the points Eliezer made about why people could fail at this in ways that are worth trying to fix. And in your case, yes, dealing with the excessive anxiety seems helpful.
Good question, good overview!
Minor note on the last point, which seems like a good idea, but human oversight failures take a number of forms. The proposed type of red-teaming probably catches a lot of them, but will focus on easy-to-operationalize / expected failure modes, and will ignore the institutional incentives that will make oversight fail even when it could succeed, including unwillingness to respond due to liability concerns and slow response to correctly identified failures. (See our paper and poster at AIGOV 2026 at AAAI.)
Is this anxiety of the typical form that makes it harder for you to do other things? Because yes, we all agree that it’s a very bad outcome, but a critical point of the post is that you might want to consider ways to not do the thing that makes your life worse and doesn’t help.
In retrospect, the post holds up well—it’s not a brilliant insight, but I’ve referred back to it, and per the comments, so have at least some others.
I would love for there to be more attention to practical rationality techniques and useful strategies, not just on (critically important) object-level concerns, and hope that more work in that direction is encouraged.
Designing funding institutions that scale to handle 10x to 100x the number of dollars, and also the number of "principals" (since I expect, as opposed to OP having a single Dustin, Anthropic will produce something like 50-100 folks with 10Ms-100Ms to donate)
Seems plausible that a decent part of Coefficient Giving’s new strategy exactly supports this model.
I’d be especially interested in angel investors funding early-stage, EA-aligned, high-risk moonshots that will need Series A funding in a year if successful—but it likely requires risk-neutral, low-regret funders, or people funding an entire portfolio, both of which are rare.
I agree that many of the worldviews being promoted are unrealistic—expecting companies in the current competitive race to do things that would be a competitive disadvantage.
But I also think that there are worlds where Anthropic or OpenAI as companies care enough to ensure that they can be trusted to keep their promises. And there are industries (financial auditing, many safety-critical industries) where this is already the case—where companies know that their reputation as careful and honest actors is critical to their success. In those industries, breaking that trust is a quick path to bankruptcy.
Clearly, nothing like that type of trustworthiness is currently demanded in the AI industry. Moreover, coordinating a change in the status quo might be infeasible. So again, yes, this is an unrealistic standard.
However, I would argue that high trust is another viable equilibrium, one where key firms were viewed as trustworthy enough that anyone using less-trustworthy competitors would be seen as deeply irresponsible. Instead, we have a world stuck in low-trust competition in AI, a world where everyone agrees that uploading sensitive material to an LLM is a breach of trust, and uploading patient information is a breach of confidentiality. The only reason to trust the firms with such data is that they likely won’t care or check, and certainly not that they can be trusted not to do so. And people are right to say that the firms have not made themselves trustworthy enough for such uses—and that is part of the reason the firms are not trying to rigorously prove themselves trustworthy.
And if AI is going to control the future, as seems increasingly likely, I’m very frustrated that attempts to move towards actually being able to trust AI companies are, as you said, “based on unrealistic and naive world views.”
Regardless of whether you think the company is net positive, or working for it is valuable, are you willing to explicitly disagree with the claim that, as an entity, the company cannot be trusted to reliably fulfill all the safety and political claims which it makes, or has made? (Not as in inviolably never doing anything different despite changes, but in the same sense that you trust a person not to break a promise without, e.g., explaining to those it was made to why it thinks the original promise isn’t binding, or why the specific action isn’t breaking their trust.)
I think that an explicit answer to this question would be more valuable than the reasonable caveats given.
I agree that treating corporations or governments or countries as single coherent individuals is a type error, since it’s important to be able to decompose them into factions and actors to build a good gears-level model that is predictive, and treating them as unitary makes it easy to miss that. I strongly disagree that treating them as actors which can be trusted or distrusted is a type error. You seem to be making the second claim, and I don’t understand it; the company makes decisions, and you can either trust it to do what it says, or not—and this post says the latter is the better model for Anthropic.
Of course, the fact that you can’t trust a given democracy to keep its promises doesn’t mean you can’t trust any of the individuals in it, and the fact that you can’t trust a given corporation doesn’t necessarily mean that about the individuals working for the company either. (It doesn’t even mean you can’t trust each of the individual people in charge—clearly, trust isn’t necessarily conserved over most forms of preference or decision aggregation.)
But as stated, the claims made seem reasonable, and in my view, the cited evidence shows they are basically correct about the company as an entity and its trustworthiness.
He implied there that in the short term the advantage will be asymmetric, even if he’s hopeful that there will eventually be a defensive advantage. (I’m agnostic on the latter, and even if he’s right, I think the timescale needed for it to emerge might be longer than the window in which it will matter.) But I should have linked to his recent piece, where he says this explicitly, not that older one: https://www.schneier.com/crypto-gram/archives/2025/1015.html#cg18
I’ll fix that now.
You said, “I vote on posts for presence of strengths more than for absence of weaknesses.” I agree the post has strengths, but you agree that the problems are there as well; given the failings, I disagree with the claim that this contribution is net positive.