Substack: https://substack.com/@simonlermen
X/Twitter: @SimonLermenAI
(This is an AI response, as an attempt to settle whether my questions were answered in your post. Claude may have been primed to agree with me, since I’m its prompter.)
Me: (Post and comments in context) Does the post contain an answer to my three questions?
No.
Going through them cleanly:
(1) “arbitrary dumb” as a value judgement vs. “simple to describe”: Not addressed. The materials for a non-evaluative version of the argument are present in the “Compute Penalty” section (the macro-abstraction / ontology-shift point), but the post never separates that argument from the evaluative framing. The slippage you identified runs through the whole essay unacknowledged.
(2) Did the original metaphor-makers care about simplicity? Not addressed. The post splits the paperclipper into “two jobs in the discourse” and rejects the second, but it never asks why the metaphor was originally constructed with a simple goal, or whether simplicity was load-bearing for the original argument or just expository convenience.
(3) What’s the alignment benefit if the AI is doing something more complex but still kills us? Not addressed. The post explicitly disclaims that it’s arguing for friendliness, and the “Value Is Fragile” section gestures at a distinction between human values and “value as such,” but never engages with the actual force of your question — that relocating the attractor from paperclips to intelligence-maximization doesn’t obviously improve human survival odds.
For 1) you seem to argue that dumb goals carry a penalty in some form, but you don’t address my question: why not call them simple-to-describe goals?
For 2) (see my question above) you say:
The paperclip maximizer currently does two jobs in the discourse:
It illustrates that intelligence does not guarantee human values.
It quietly smuggles in the assumption that a dumb target is stable under open-ended reflection.
I think point 2) here is exactly my question: did the people making this analogy really believe this and want it to be a takeaway? I think it actually describes the kind of counter-meme people are spreading, distorting the original analogy so that it’s an easy target.
Question 3 appears to go unaddressed?
There is this common bad argument on alignment. Someone once made an analogy randomly involving paperclips to illustrate instrumental convergence, with the paperclips not really being important to the story at all. A lot of people only took away the unimportant part, “paperclips”. They reinterpreted it as “the entire theory of alignment rests on the assumption that the AI must monomaniacally optimize for a totally ridiculous goal like paperclips”. Or, quite frankly, some people only took away the cheap gotcha: “paperclips sounds stupid, therefore alignment is stupid”.
Your version is better and I appreciate your caveats, particularly that you don’t smuggle in a much stronger claim.
However, you repeatedly use the terms “arbitrary dumb goals” or “semantically thin”, and you seem to be arguing that this is unlikely. But you fail to address:
1) “arbitrary dumb” seems like a value judgement when you seem to be talking about “simple to describe” goals; e.g., it’s simple to describe the universe being tiled in paperclips.
2) Did the people making those metaphors actually care in the first place whether it is a simple goal, or did they choose simple goals because they are simple to describe? As in, they just needed some goal that wouldn’t be too distracting from the other parts of the metaphor?
3) What’s the benefit of the goals being slightly more complex/harder to describe? Again, it seems that this is irrelevant to alignment? If the AI is building something more complex/interesting than paperclips, that would still kill us?
If excellent policy passes that, let’s say, shuts down AI, money would be valuable in the selfish sense? Like you can buy something with it; this is the main sense I was thinking of when talking about the value of money. Or are you thinking of how useful money is in advancing safe AI?
I don’t see how this is different from just pushing the date of the bet back by one year?
I agree: from the perspective of trying to maximize profit from this bet, you’d want the market to not factor in the conditional P(doom | moratorium passes).
On betting on AI doom
Tyler Cowen and Bryan Caplan, among others, have challenged so-called AI doomers to put their money where their mouth is and bet on extinction. I’m not the first person to point out the big problems with this:
The naive version: a direct bet on extinction is incoherent because the doomer would expect to be dead.
The slightly more advanced version: the doomer gets paid up front and pays back double (or whatever the betting odds imply), with interest, later if doom doesn’t happen. But this doesn’t quite make sense either. If doom does happen, the doomer has a brief and mostly useless window to spend the money, and the accelerationist has no reason to expect the doomer to save any of it. And if the doomer does save it (plus enough extra to cover the doubled payback), they’ve effectively just locked up double the original capital until the end of the world. Neither party has a coherent incentive structure.
Here’s a version that perhaps actually works (under some assumptions and unless I’m overlooking something here): bet on protective policy outcomes that are correlated with survival or at least longer timelines.
Examples: Will the US enact a federal datacenter moratorium with export controls before 2030? Will the US and China sign a meaningful bilateral agreement on frontier AI before 20XX? Will there be a federal AI safety law? (Edit: I changed the examples a bit.)
These outcomes point to longer timelines and higher survival probability. The structure is much more sensible because the marginal utility of money is higher in the surviving-or-longer-timelines branch than in the short-timelines-to-doom branch.
Concretely: if the doomer bets $100 YES on a moratorium and it doesn’t pass, they’ve lost the money in a world where they’re going to be dead sooner. If the moratorium does pass, the $100 (at appropriate long odds) becomes a much larger payout in a world where the doomer actually expects to live, or at least live longer. So the doomer wants their payout in the surviving-or-longer-timelines branch, which means betting YES on protective policy at appropriate betting odds.
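To make the branch asymmetry concrete, here is a minimal toy calculation (all numbers are my own illustrative assumptions, not from any real market): at fair odds the bet is worth zero in expected dollars, but it becomes positive once dollars are weighted by how much they are worth to the doomer in each branch.

```python
# Toy model of the moratorium bet. The 20% market probability and the
# utility weights below are assumed for illustration only.

p_policy = 0.20                    # assumed market-implied P(moratorium passes)
payout_per_dollar = 1 / p_policy   # fair odds: $1 on YES pays $5 if it passes

stake = 100

# Assumed marginal utility of a dollar in each branch:
u_long = 1.0    # moratorium passes -> longer timelines, money stays useful
u_short = 0.2   # moratorium fails  -> shorter timelines, money worth less to the doomer

# In plain dollars, a fair-odds bet has zero expected value:
win = stake * payout_per_dollar - stake            # +$400 if the policy passes
ev_dollars = p_policy * win - (1 - p_policy) * stake
print(f"expected dollars: {ev_dollars:+.2f}")      # +0.00

# Weighted by branch-dependent utility, the same bet is positive for the doomer:
ev_utility = p_policy * u_long * win - (1 - p_policy) * u_short * stake
print(f"utility-weighted EV: {ev_utility:+.2f}")   # +64.00
```

The market only prices the probability; the doomer additionally prices the branch-dependent value of money, which is where the edge comes from.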
For the accelerationist side, it depends on flavor. If they’re an effective accelerationist who thinks faster AI development gets them to literally owning galaxies sooner (Leopold Aschenbrenner and Dwarkesh: galaxies may become purchasable for money soon after ASI), they’d have a clean reason to bet NO. If they just think AGI isn’t coming and therefore assign a low P(doom), they shouldn’t really have a strong view on a datacenter moratorium, or it depends on their specific views.
Importantly though, it’s not necessary to bet with an effective accelerationist at all. You can just bet with people who are betting on the object-level reality: will a moratorium pass or not? You’d expect to have an edge: you have the insight that the money will be more valuable to you in the branch where the moratorium does happen, so you’re willing to take the side at odds others aren’t.
Or am I overlooking something here?
(I know this particular betting market is about a moratorium anywhere in the US but couldn’t find anything better)
(Couldn’t think of a better shorthand for doomer that made it clear who I am talking about)

Advancing alignment and interpretability research.
Reducing the ability of a just-smarter misaligned AI to gather power, by generally mopping up free energy, or shutting down extralegal/evil means for doing so.
Clearly demonstrating the risks of advanced AI systems to neutral third parties, like legislators.
Improving the epistemic environment, and therefore the ability of humans, to coordinate & navigate AI policy & the future.
I don’t know what Eliezer thinks about this, but the problem, as it appears to me, is that a lot of those things cancel out:
Advancing alignment research < Advancing capabilities research
Hardening society, making AI takeover more difficult <(?) making us reliant on AI / making AIs harder to shut down / AIs damaging society (like massive scams making it harder for real humans to trust each other)
Demonstrating risks < demonstrating gains, making AI labs rich and able to bribe the government
Improving epistemics < damaging epistemics through widespread deepfakes, bots, etc.
In the past, one person brought up the cleaner wrasse thing with me as a kind of proof that sentience must be widespread among even the least intelligent animals. The paper seems to run experiments like showing the fish a photograph of itself with a mark on its belly and then observing the fish “scraping” its belly (touching the sand of the aquarium), despite not having a mark on its own body. This seems obviously different from the regular mirror test, and it’s unclear if these fish even have such good visual abilities for non-moving objects. And in the paper they seem to believe this proves an even higher ability of self-recognition (so while chimps can only recognize themselves when they move, the fish can just recognize their own body from a photograph, like a human could). “This is largely because explicit tests of the two potential mechanisms underlying MSR are still lacking: mental image of the self and kinesthetic visual matching. Here, we test the hypothesis that MSR ability in cleaner fish, Labroides dimidiatus, is associated with a mental image of the self, in particular the self-face, like in humans.”
I don’t even have to spell out how the priors look here, but this doesn’t even seem theoretically possible? How would the fish know what it looks like? The simplest explanation is that they used a small sample size and the fish just randomly started doing that. It also doesn’t seem to me that the fish really spent time studying its photograph; it could just be a case of letting the camera roll for a while until the fish happens to swim by the photo and then touches the sand a few times.
Not commenting on the whole conflict here. But I remember once being told by an EA member that I’d better not go to some event with policymakers in the UK because my views sound a little too crazy.
threw a Molotov cocktail at Sam Altman
It was thrown at his home
I guess you are right
So how is all of that going to hold up when you start your 100k AI researchers later this year, or do fully automated AI research soon after? If alignment still holds up, who is going to verify what that nation of geniuses produces at superhuman speed? I think you are already being optimistic in seeing any progress on alignment. There are clear discontinuities here: when are the models smarter than the researchers and able to easily trick them? When can models do their research all on their own? When do they have a meaningful shot at takeover or catastrophic harm? Hitting these discontinuities does not work well with a wait-and-see strategy.
Edit: I think there is an argument to be made that 1) something like RSI/handing off AI research will totally break what exists of AI alignment, and 2) there are predictably big threshold effects in the future (https://www.lesswrong.com/posts/JqrZxQwmqmoCWXXxC/ai-can-suddenly-become-dangerous-despite-gradual-progress), such as when the AI gets smarter than human researchers and can easily trick them, so incremental alignment strategies won’t survive. However, as it stands, I am not making this point very well here, and it just sounds too much like sneering for my comfort when there are valuable points to be made.
Do you have an example, shared here or privately, of him being rude? The more I look into his stuff, the more it seems he regularly mocks other people and blocks anyone challenging him. I mean, I do appreciate him, as a former Trump admin guy, saying the obvious on the Anthropic dow situation.
I see the difference, and have updated my comment accordingly. He believes it is highly unlikely, not impossible, though it’s unclear what he means exactly (<1% perhaps?). I didn’t say he didn’t engage, just that from your tweets it is not so clear he updated meaningfully; he did update his timelines, probably based on recent advances. I still assume he talks about catastrophe in the “major disaster” kind of way, which is an unfortunate effect of using an unclear term here. Dean isn’t shy about using partisan/mocking language himself. I don’t like the idea of being talked about with mocking language while being unable to shoot back in a similar style, but opinions may vary.
Thanks for the quote
He is simply updating his timelines
, and since then said AI causing human extinction is only “highly unlikely”,
Basically still supporting my thesis; I don’t see any sign he updated here, because he says “highly unlikely” now.
then even more recently said that “ai present catastrophic risks” and “alignment may become a more central issue for me again depending on how well alignment seems to work for smarter-than-human widely deployed ai”.
I think you are misreading him here. From reading the rest of his stuff and his response, I would say he is merely referring to AI causing a “catastrophe” in the major-disaster sense, similar to a tornado ripping through a town or AI hacking all the airports.
Thanks for the tweets again, but I don’t see clear evidence here that engaging with the community on Twitter has updated him much.
This is the kind of rhetoric Dean supports and praises: https://x.com/deanwball/status/2026325817291104728
“This instinct seems to infect the far left across lots of domains: immigration, crime fighting, and the national debt to name a few. You can tell they’re just sort of yearning to submit our society to outside forces: mobs, international councils, or communist China. … They don’t believe in order, except brutal order under their heels.” – blaming resistance to AI datacenters on far left lunatics.
This new post is also not exactly free of mocking language:
““the AI safety community” is that artificial superintelligence will be able to “do anything.” Now, most people in this world are much too smart to say literally these words, and so it might be fairer to put my criticism this way: “many people in ‘the AI safety community’ are way too willing to resort to extreme levels of hand-waviness when it comes to the supposed capabilities of superintelligent AI.” The tautological pattern of the AI safetyist mind is easy enough to recognize once you encounter it a few times: “Well of course superintelligence will be able to do that. After all, it’s superintelligence. And because superintelligence will obviously be able to do that, you must agree with me that banning superintelligence is an urgent necessity.””
So I feel like he should be able to handle my tone here, but will possibly adjust it a bit.
I don’t know, but say it spawns a new pathogen each week, each very contagious and deadly. Then it spreads pathogens that cause mass crop death. Then come AI drones picking up larger groups of survivors. Then ground robots, small airborne drones. Then climate change of +10°C. One after the other. What’s impossible here?
What exactly did he update? I saw that post where he apparently shortened his timelines?
I really don’t like the AIs talking like that; it makes me uncomfortable. Just make them talk like Data, and give them a name like “dataBot”. Don’t give it some attractive name of the sex you are attracted to.