Logan Zoellner

Karma: 1,437

Logan Zoellner 17 Apr 2026 22:53 UTC
2 points
−8
on: You can only build safe ASI if ASI is globally banned
On your way to figuring out how to build controllable ASI, you will have figured out how to build unsafe ASI, because unsafe ASI is vastly easier to build than controlled ASI, and is on the same tech path.
This is only true if you are building some kind of cartoon ASI that self-replicates without regard for its creators’ intentions. If you (a human being) are trying to build ASI to achieve any purpose at all you basically have to solve AI safety along the way. This is empirically demonstrated. GPT 3.5 wasn’t vastly more intelligent than GPT 3, but it was vastly more useful because RLHF was used to aim it at goals. We see the exact same trend today. Far from paying an “alignment tax”, Anthropic is able to build the most powerful AI models because they are obsessed with the question “how do I control the AI?”

Logan Zoellner 5 Feb 2026 12:21 UTC
0 points
0
in reply to: Max Harms’s comment on: Bentham’s Bulldog is wrong about AI risk
>Arguably that wasn’t the point of the book
Why did you title the book “If anyone builds it everyone dies” if the point of the book was not to convince people “If anyone builds it everyone dies”? If this really was some obscure philosophical project that has no bearing on the real question, why not give it some obscure title like “On the Electrodynamics of Moving Bodies” to clearly indicate “this isn’t meant to be persuasive or even comprehensible to 99% of human beings”?

Logan Zoellner 3 Feb 2026 2:41 UTC
4 points
0
in reply to: Max Harms’s comment on: Bentham’s Bulldog is wrong about AI risk
“my prior is low,” not “the evidence isn’t convincing,”
I still don’t follow.
You wrote an entire book and it didn’t move Bentham’s priors. If that’s not a clear-cut example of “the evidence [in the book] isn’t convincing,” I don’t know what is.
In fact, if someone wrote an entire book (in which I would assume they would naturally collect the best arguments for a position) and I found no convincing evidence in it, I would actively consider that evidence against the position. Because “I haven’t done much research but the evidence looks poor” is a less definitive conclusion than “I have read the foremost expert’s book on the topic and the evidence looks bad.”

Logan Zoellner 2 Feb 2026 0:31 UTC
2 points
1
in reply to: Max Harms’s comment on: Bentham’s Bulldog is wrong about AI risk
Suppose that, in the years before telescopes, I came to you and said “the planets are other worlds, like ours, and a bunch of them have moons.”
Suppose you should believe, without evidence, such a theory, as opposed to one of the many equally plausible but wrong theories that were going around at the time, such as: “other planets will have different kinds of men on them” or “other planets have vegetation and life on them” or “other planets have rocky surfaces and air on them”.
And suppose that subsequently, evidence should be discovered proving that you, and you alone were correct.
Then you will be lauded throughout the world. People will declare you a thought-leader, an influencer, a visionary of the future. Undoubtedly, wealth and fame will attract themselves to you. History books will sing your praises for centuries to come as “the man who knew other planets had moons.”
In one small, dark corner of the internet, however, you will encounter a strange group of people. These people have beliefs like “claims should be based off of evidence.” And those people will use a different word to describe you: lucky.

Logan Zoellner 31 Jan 2026 10:23 UTC
8 points
6
on: Bentham’s Bulldog is wrong about AI risk
To criticize an idea on the grounds that the evidence for that idea isn’t conclusive is insane — that’s a problem with your body of evidence, not the ideas themselves!
What does this sentence even mean? The problem isn’t the idea, it’s that there’s not enough evidence for it… sounds like the problem is with the idea.

Logan Zoellner 9 Sep 2025 7:43 UTC
2 points
−2
on: Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro
There are no new ideas only new datasets
Currently all LLMs are terrible at computer-use. Part of this is an ergonomics problem (GPT agent is frequently blocked from viewing websites and I still don’t trust it enough to e.g. give it my street address and credit card number). But when I give it a graphically demanding task that is 100% doable in the browser, it still falls absolutely flat on its face.
What is needed for RL to succeed is something like: an internet-scale dataset of graphically demanding tasks with objective success criteria. Sooner or later someone is going to put together a dataset like “here are all 150k games on Steam with a simple yes/no that tells us whether or not the AI beat the game.” And when that happens, I strongly suspect RL will suddenly start working.
Alternatively, companies like Figure are planning to deploy 1000s of robots in the real world with more or less the same idea: create a huge training set of actual physical reality (as opposed to just text + multimedia).
Once a proper dataset is in place, I expect we will not see the slow, gradual progress indicated by the METR chart, but rather a huge all-at-once leap (on par with when we first started properly applying RL to math).

Logan Zoellner 1 Sep 2025 13:21 UTC
2 points
−8
on: AI Induced Psychosis: A shallow investigation
Most Americans use ChatGPT. If AI were causing psychosis (and the phenomenon weren’t just already-psychotic people using ChatGPT), it would be showing up in statistics, not anecdotes. SA concludes that the prevalence is ~1/100k people. This would make LLMs 10x safer than cars. If your concern is saving lives, you should be focusing on accelerating AI (self-driving), not worrying about AI psychosis.

Logan Zoellner 25 Jun 2025 21:42 UTC
2 points
0
on: Foom & Doom 1: “Brain in a box in a basement”
tend to say things like “probably 5 to 25 years”.
Just to be clear, your position is that 25 years from now, when LLMs are trained using trillions of times as much compute and are routinely doing tasks that take humans months to years, they will still be unable to run a business worth $1B?

Logan Zoellner 23 Jun 2025 22:12 UTC
5 points
0
in reply to: Buck’s comment on: Making deals with early schemers
thank you for clarifying.

Logan Zoellner 22 Jun 2025 19:39 UTC
LW: -2 AF: -3
−19
AF
in reply to: Buck’s comment on: Making deals with early schemers
It’s easy to imagine a situation where an AI has a payoff table like:

|         | defect | don’t defect |
|---------|--------|--------------|
| succeed | 100    | 10           |
| fail    | X      | n/a          |

where we want to make X as low as possible (and commit to doing so).

For example, a paperclip-maximizing AI might be able to make 10 paperclips while cooperating with humans, or 100 by successfully defecting against humans.
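
To make the role of X concrete, here is a minimal sketch of the arithmetic behind that table. It assumes (my assumption, not something stated above) that the AI is a simple expected-value maximizer, that cooperating yields the 10 with certainty, and that defection succeeds with some probability `p_success`. The function names are hypothetical.

```python
def defect_ev(p_success, fail_payoff, success_payoff=100.0):
    """Expected payoff of defecting: succeed with p_success, otherwise get X."""
    return p_success * success_payoff + (1.0 - p_success) * fail_payoff


def break_even_p(fail_payoff, success_payoff=100.0, cooperate_payoff=10.0):
    """Success probability above which defecting beats cooperating.

    Solves p * success + (1 - p) * X = cooperate for p.
    Assumes X < cooperate_payoff < success_payoff.
    """
    return (cooperate_payoff - fail_payoff) / (success_payoff - fail_payoff)


if __name__ == "__main__":
    # Illustrative values of X only; lower X means defection needs better odds.
    for x in (0.0, -90.0, -1000.0):
        print(f"X = {x:8.1f} -> defect only if p_success > {break_even_p(x):.3f}")
```

Under these assumptions, pushing X lower raises the success probability the AI needs before defecting is worth the gamble, which is the point of committing to a low X in advance.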

Logan Zoellner 22 Jun 2025 9:02 UTC
LW: 9 AF: 3
−10
AF
on: Making deals with early schemers
seems to violate not only the “don’t negotiate with terrorists” rule, but even worse the “especially don’t signal in advance that you intend to negotiate with terrorists” rule.

Logan Zoellner 5 Jun 2025 12:15 UTC
2 points
0
in reply to: Garrett Baker’s comment on: Why I am not a successionist
Those all sound like fairly normal beliefs.

Like… I’m trying to figure out why the title of the post is “I am not a successionist” and not “like many other utilitarians I have a preference for people who are biologically similar to me, I have things in common with, or I am close friends with. I believe when optimizing utility in the far future we should take these things into account”

Even though I can’t comment on OP’s views, you seemed to have a strong objection to my “we’re merely talking price” statement (i.e. when calculating total utility we consider tradeoffs between different things we care about).

Edit:

To put it another way, if I wrote a post titled “I am a successionist” in which I said something like: “I want my children to have happy lives and their children to have happy lives, and I believe they can define ‘children’ in whatever way seems best to them”, how would my views actually differ from yours (or the OP’s)?

Logan Zoellner 5 Jun 2025 8:38 UTC
2 points
0
in reply to: Garrett Baker’s comment on: Why I am not a successionist
I genuinely want to know what you mean by “kind”.

If your grandchildren adopt an extremely genetically distant human, is that okay? A highly intelligent, social and biologically compatible alien?

You’ve said you’re fine with simulations here, so it’s really unclear.

I used “markov blanket” to describe what I thought you might be talking about: a continuous voluntary process characterized by you and your descendants making free choices about their future. But it seems like you’re saying “markov blanket bad”, and moreover that you thought the distinction should have been obvious to me.

Even if there isn’t a bright-line definition, there must be some cluster of traits/attributes you are associating with the word “kind”.

Logan Zoellner 4 Jun 2025 1:32 UTC
−8 points
−1
on: Why I am not a successionist
from a preference toward my own kind.
What is it about your kind that you care about? Is it DNA? Shared culture? Merely there being a continuous Markov blanket connecting you and them? If you’re okay with your grandchildren replacing you, you are in a certain sense a successionist. We’re merely talking price.

I suppose you could be opposed to some scheme like: we will completely annihilate the universe and create a new one with no logical connection to our own, but I don’t think anybody is planning that. It’s more that AI will be “children of the mind” rather than biological children.

Logan Zoellner 11 May 2025 22:41 UTC
−1 points
−16
in reply to: Eric Neyman’s comment on: Consider not donating under $100 to political candidates
alas, this isn’t really enforceable in the USA given the 1st amendment.

Logan Zoellner 30 Apr 2025 11:28 UTC
2 points
0
in reply to: avturchin’s comment on: Most arguments for AI Doom are either bad or weak
but we eventually die.

Dying is a symmetric problem: it’s not like we can’t die without AGI. If you want to calculate p(human extinction | AGI) you have to consider ways AGI can both increase and decrease p(extinction). And the best methods currently available to humans to aggregate low-probability statistics are expert surveys, groups of super-forecasters, or prediction markets, all of which agree on pDoom <20%.

Logan Zoellner 28 Apr 2025 11:40 UTC
2 points
0
in reply to: Odd anon’s comment on: Most arguments for AI Doom are either bad or weak
this experiment has been done before.

If you have a framing of the AI Doom argument that can cause a consensus of super-forecasters (or AI risk skeptics, or literally any group that has an average pDoom<20%) to change their consensus, I would be exceptionally interested in seeing that demonstrated.

Such an argument would be neither bad nor weak, which is precisely the type of argument I have been hoping to find by writing this post.

> Please notice that your position is extremely non-intuitive to basically everyone.
Please notice that Manifold thinks both that AGI is coming soon and that pDoom is low.

Logan Zoellner 28 Apr 2025 11:24 UTC
2 points
0
in reply to: avturchin’s comment on: Most arguments for AI Doom are either bad or weak
I think this cumulative argument works:

1. there are dozens of ways AI can prevent a mass extinction event at different stages at its existence.
2. …
If you make a list of 1000 bad things and I make a list of 1000 good things, I have no reason to think that you are somehow better at making lists than prediction markets or expert surveys.

Logan Zoellner 23 Apr 2025 21:28 UTC
−3 points
−7
on: Why Should I Assume CCP AGI is Worse Than USG AGI?
Are you genuinely unfamiliar with what is happening to the Uyghurs, or is this a rhetorical question?

Logan Zoellner 5 Apr 2025 13:51 UTC
2 points
0
in reply to: Daniel Kokotajlo’s comment on: METR: Measuring AI Ability to Complete Long Tasks
Why do I expect the trend to be superexponential? Well, it seems like it sorta has to go superexponential eventually. Imagine: We’ve got to AIs that can with ~100% reliability do tasks that take professional humans 10 years. But somehow they can’t do tasks that take professional humans 160 years?
I don’t think this means the real thing has to go hyper-exponential, just that “how long does it take humans to do a thing?” is a good metric when AI is sub-human but a poor one when AI is superhuman.

If we had a metric “how many seconds/turn does a grandmaster have to think to beat the current best chess-playing AI”, it would go up at a nice steady rate until shortly after Deep Blue, at which point it shoots to infinity. But if we had a true measurement of chess quality, we wouldn’t see any significant spike at the human level.
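
A toy sketch of why a human-time metric can blow up while the underlying quality measure stays smooth. Everything here is a made-up illustration, not real chess data: it assumes a grandmaster’s strength saturates toward a hypothetical `ceiling` rating as thinking time grows, and that AI rating climbs by a fixed amount per step. The function name, the saturation curve, and all the numbers are my assumptions.

```python
import math


def seconds_needed(ai_elo, base=2500.0, ceiling=2900.0, tau=60.0):
    """Seconds/turn a human needs to match ai_elo in the toy model.

    Toy assumption: human strength after t seconds of thought is
    ceiling - (ceiling - base) * exp(-t / tau), so it never exceeds ceiling.
    """
    if ai_elo <= base:
        return 0.0          # the human already matches the AI without extra thought
    if ai_elo >= ceiling:
        return math.inf     # no amount of thinking time is enough
    return -tau * math.log((ceiling - ai_elo) / (ceiling - base))


if __name__ == "__main__":
    # AI rating rises linearly (the "true" quality measure has no spike) ...
    for step, ai_elo in enumerate(range(2400, 3200, 100)):
        # ... while the human-time metric diverges as ai_elo reaches the ceiling.
        print(step, ai_elo, seconds_needed(ai_elo))
```

In this toy model the rating column advances by the same amount every step, while the seconds-per-turn column goes from zero to finite values to infinity, so the spike comes from the choice of metric and not from the AI’s progress.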