not a lot of people (maybe literally 0) have had sufficient reason to saw off their own hand for altruistic reasons. i’ve donated a kidney, donate blood often, and gave more than the GWWC pledge when my income was high. any falsifiable claims you’d like to check while we’re speculating about my values?
i agree that i would rather die instantly than live for 100 years of torture. i don’t think that proves as much as you think. i also think it’s fine for some people to make morbid utility calculations like these, and for others to say “i don’t want to think about that and i’m not going to answer”
not OP, but that seems like a pretty reasonable conclusion. if i had to sacrifice my own life to save every person i didn’t personally know (ie. 8.1 billion people), i would absolutely do it in a heartbeat. i would also do it to just save a fraction of those people (8M people). only once it gets down to much smaller numbers (saving somewhere between 100 and 3 random people) does it start seeming like a hard tradeoff.
are you sure this isn’t just the evolution of your own information diet and circle of friends? if you asked a random american “do you know who nate silver is? → do you think he got it mostly right or mostly wrong in the last few elections?”, do you think they’d say “he was mostly right” or “nate silver is always wrong because he’s [too woke]/[not woke enough]”?
prediction markets are allegedly a way to bring empiricism to fields that had none before, and your best defense of them is “the vibes feel better now”
sure, but it’s just weird? about as weird as commenting on the post saying “i actually think AI is super overblown and we’re going to be ok”, and equally weird that it would be widely upvoted. my model of rat culture is that “Preventing extinction from ASI on a $50M yearly budget” is actually slightly more of a clickbait title than “$50 million a year for a 10% chance to ban ASI”. and if you’re saying “it’s dumb to use made up probabilities to express your confidence”, then that’s a debate that has been had for decades on this forum, at this point.
The only argument for the headline claim “$50 million a year for a 10% chance to ban ASI” is a footnote that says “The probabilities are produced mostly by gut feeling.”
rationalists use gut-level probability forecasts as a way to specify confidence all the time. that’s like the number 2 shared characteristic of rat culture, behind “believing that ASI will kill everyone”
and the DoW / Anthropic standoff started brewing then. how much did the DoW know about mythos at that time? was the demand “if you have a model this powerful, then the US military must have unrestricted access to it”?
when the DoW / anthropic / OpenAI standoff was going down, how much did the parties know about claude mythos / glasswing? the standoff seemed so weird on a few levels: 1) how could this simple agreement/communication about “all legal use” or “domestic surveillance” or whatever result in such staggering consequences (DoW threatening to destroy anthropic)? 2) why the sudden extreme, public blowup? 3) why would the DoW resort to such extreme tactics (rather than, as was pointed out repeatedly, just having normal business negotiations and changing contractors)?
instead, if it was something like “you said you have a model that can exploit every OS. we need it” and anthropic said “it’s not ready / not safe / not ethical”… that explains a whole lot. but it would also imply that OpenAI said to DoW “we also have a model that’s good enough at X, Y, and Z”.
though for all that this would explain, it also seems unlikely that one of the most high-pressure, heavily scrutinized news events in recent memory went a whole month without anyone leaking about this. and the weird things about the drama are also explained by regular human personalities, stupidity and luck.
I’m warning the wise would-be investors, including governments and the public.
you’re ascribing too much consequentialism to people who are generally not consequentialists. the mindset is more like:
AI is personally annoying to me and is obviously a scam/illusion. the only reason people are excited about it is because there are scammers flooding the zone with hype. me and my friends can’t do anything about it, but it’s at least gratifying to complain about in solidarity; the world may be insane, but i may as well point out the insanity. the wheel keeps turning and in a few years some other scam will come along.
forecasting is so intractable-in-practice that metaculus doesn’t even use it for internal decisions, despite attempting to do so multiple times (source: i was there)
XBOW’s own “Top 1” announcement is dated June 24, 2025--about 8 months before this LessWrong post (Feb 19, 2026), not “almost one year.”
this is borderline pedantry, and also the timing of the announcement is not significant to the thesis of the post. if your goal is to tell us “here’s what the extension is like, also be aware that some of the corrections are wrong/unhelpful” then fine, but if it’s a sales pitch you should choose a better example.
this is a neat article and all, but what are the actual sources for any of this? what is the author’s relation to this story? they slide smoothly from statistics and checkable facts into anecdata of “this is the general sort of thing that happens in places like these”
But there’s an additional problem here. People overestimate foreign aid, and these polling methods exaggerate how much they overestimate. In the KFF data, the majority of respondents said it was below 20%, and the largest decile by far was 0-10%. KFF said that average was 26%, but their own data show that this isn’t what most people actually believe; it’s an artifact of using the wrong statistic.
the KFF data shows that the median respondent thinks foreign aid is somewhere in the 11-20% range of the budget, so the reported average of 26% is roughly 1.3 to 2.4 times the median belief. but reporting the average is what most people do, because most people are not very numerically literate, which is what the poll itself shows!
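to make the arithmetic checkable, here is a minimal sketch using only the two numbers quoted in this thread: the 26% reported average and the 11-20% bucket that holds the median. the function name `overestimate_ratios` is made up for illustration, and the exact median inside the bucket is unknown, so the result is a range rather than a single figure.

```python
# Numbers quoted in the thread; nothing here comes from the raw KFF dataset.
reported_mean = 26.0           # KFF's reported average, in percent of the budget
median_bucket = (11.0, 20.0)   # decile bucket containing the median respondent


def overestimate_ratios(mean: float, bucket: tuple[float, float]) -> tuple[float, float]:
    """Return (smallest, largest) ratio of the mean to a median inside the bucket."""
    low, high = bucket
    # A median at the top of the bucket gives the smallest ratio,
    # a median at the bottom gives the largest.
    return mean / high, mean / low


smallest, largest = overestimate_ratios(reported_mean, median_bucket)
print(f"{smallest:.1f}x to {largest:.1f}x")  # prints "1.3x to 2.4x"
```

so depending on where in the bucket the median actually sits, the mean overstates it by somewhere between 30% and about 140%.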
you’re missing the point by splitting hairs about whether the geometric average is more useful than the arithmetic mean, because the poll really does show that the public overestimates the foreign aid budget. the bigger issue at play is that opinion polling the public produces crazy results, like [the lizardman constant](https://slatestarcodex.com/2013/04/12/noisy-poll-results-and-reptilian-muslim-climatologists-from-mars/). i would also guess that people are not thinking that hard, and that if you offered them $10 if their answer was close to the real result, the median would become more accurate.
Jim, Garfield’s owner,
Garfield’s owner’s name is Jon Arbuckle. the cartoonist/author of Garfield is Jim Davis
i notice the OP didn’t actually mention examples of legible or illegible alignment problems. saying “leaders would be unlikely to deploy an unaligned AGI if they saw it had legible problem X” sounds a lot like saying “we would never let AGI onto the open internet, we can just keep it in a box”, in the era before we deployed sydney as soon as it caught the twinkle of a CEO’s eye.
your entire analysis is broken in that you assume that an elo rating is something objective like an atomic weight or the speed of light. in reality, an elo rating is an estimation of playing strength among a particular pool of players.
the problem that elo was trying to solve was, if you have players A and B, who have both played among players C through Q, but A and B have never played each other, can you concretely say whether A is stronger than B? the genius of the system is that you can, and in fact, the comparison of 2 scores gives you a probability of whether A will beat B in a game (if i recall correctly, a difference of +200 points implies an expected score of +0.75, where 1.0 is winning, 0 is losing, and 0.5 is a draw).
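for reference, a minimal sketch of the expected-score calculation, assuming the standard logistic elo formula with a 400-point scale (individual federations and sites tweak the details, so treat this as the textbook version rather than any particular implementation). the function name is my own.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B under the standard logistic Elo formula.

    1.0 means a certain win, 0.0 a certain loss, 0.5 an even game.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


even = expected_score(1500, 1500)
plus_200 = expected_score(1700, 1500)
print(f"{even:.2f} {plus_200:.2f}")  # prints "0.50 0.76"
```

so a +200 gap comes out at about 0.76, close to the 0.75 recalled above. note that the number only depends on the difference between the two ratings, which is why it only means something for players rated against the same pool.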
the elo system does not work, however, if there are 2 pools of non-overlapping players like C through M and N through Z, and A has only played in pool 1, and B only in pool 2. i’m fairly certain you could construct a series of ~200 exploitable chess bots, where A always beats B, B always beats C, etc, getting elo rankings almost arbitrarily high.
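a toy simulation of the chain idea, as a sketch only: it assumes the textbook elo update with a fixed K-factor of 32, a closed pool where bot i only ever plays bot i+1 and always wins, and a default chain of 20 bots instead of 200 to keep it quick. all names and parameters here are made up for illustration and don't model any real rating site.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard logistic Elo expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_chain(n_bots: int = 20, rounds: int = 200,
                   k: float = 32.0, start: float = 1000.0) -> list[float]:
    """Ratings after bot i beats bot i+1 once per round, for every adjacent pair."""
    ratings = [start] * n_bots
    for _ in range(rounds):
        for i in range(n_bots - 1):
            # The winner scored 1.0, so it gains K * (1 - expected score);
            # the loser gives up the same amount (zero-sum update).
            delta = k * (1.0 - expected_score(ratings[i], ratings[i + 1]))
            ratings[i] += delta
            ratings[i + 1] -= delta
    return ratings


ratings = simulate_chain()
print(f"top: {ratings[0]:.0f}, bottom: {ratings[-1]:.0f}")
```

the top bot never loses a game, so its rating only ever goes up the longer the pool keeps playing, even though nothing about its strength against outside players has been measured. that is the sense in which ratings from a closed pool can't be compared with ratings from another pool.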
so a major problem with your analysis was that you cited Random as having an elo of 477, and indexed your other answers based on that, when actually, that bot had an elo of 477 against other terrible (humorous) bots. if you put Random into FIDE tournaments, i expect its elo would be much lower.
i think you’re mis-applying the moral of this comic. the intended reading IMO is “a person believes misinformation, and perhaps they even go around spreading the misinformation to others. when they’ve been credibly corrected, instead of scrutinizing their whole ideology, they go ‘yeah but something like it is probably true enough’.” OP doesn’t point to any names or say “this is definitely happening”, they’re speculating about a scenario which may have already happened or may happen soon, and what we should do about it.
Though, notably, Metaculus lists Jan 2027 as a “community prediction” of “weakly general AI”. Sure, someone could argue that weakly general AI doesn’t imply human-level AGI soon after
it does imply that, but i’m somewhat loath to mention this at all, because i think the predictive quality you get from one question to another varies astronomically, and this is not something the casual reader will be able to glean
The True Believers hypothesis rings false because that would be a frankly ridiculous belief to hold. Sometimes people profess ridiculous things, but very few of them put their money where their mouth is on prediction markets. [1]
I’ve seen some pretty mispriced markets. At one point in 2019, PredictIt had Andrew Yang at 16% to win the Democratic presidential primary. And in 2020, Donald Trump was about 16% to become president even after he had lost the election. But the sorts of people who bet on prediction markets are not the sorts of fundamentalist Christians who think that Jesus Christ has a high chance of returning this year.
yes, no one would put a large amount of money (say, $10,000) on let’s say a 1-year time horizon, “joe biden going to prison”, “barack obama going to prison”, “nancy pelosi, bill clinton, and hillary clinton going to prison”, “trump being put in office prior to the 2024 election”, and if someone did make such a bet, they wouldn’t be motivated by listening to a christian minister who regularly makes political / religious prophecies. surely no one would do that.
i don’t know why anyone who posts on a forum devoted to outright fringe beliefs and atypical personality traits (i say with all love and kindness, not to say that any of us are bad or incorrect for these beliefs, merely that they are objectively abnormal) is going to come out and make bold claims that there exists no such weirdo who is willing to do X for Y reasons.
the main point about the time value of money is interesting enough on its own, but the interesting, nerd-crack explanation is probably just not true. there are probably just crazy people who bet on the return of jesus christ.
the button problem suffers from not having salient enough analogies to real world problems, and so does your article; before i read it i already reached the conclusions “this button situation has no upsides for anyone either way” and “rewording the problem and the parameters changes people’s behaviors in ways that make them seem irrational/unprincipled”. i don’t really agree that AI racing is a threshold problem: perhaps blue-coordination/safe-ai saves everyone, but red-pushing/AI-racing kills everyone, including the red-pushers. red-pushers/AI-racers might benefit from the AI in the short term, but blue-pushers/safe-ai benefit from the AI after the pause is over.
compare, though, relating newcomb’s problem to nuclear mutually assured destruction: if you’re in charge of the US arsenal and you’re 100% sure that soviet bombs are dropping, do you launch your own nukes against them? similar arguments apply: “the money is already there/the bombs have already dropped, so my choices can’t change that” vs. “if you can credibly pre-commit to actions, that changes how people interact with you.” some people arguing newcomb’s problem probably get lost in weeds and abstractions that the nuclear analogy clears up