Ben Pace

Karma: 34,617

I’m an admin of LessWrong. Here are a few things about me.

I generally feel more hopeful about a situation when I understand it better.
I have signed no contracts nor made any agreements whose existence I cannot mention.
I believe it is good to take responsibility for accurately and honestly informing people of what you believe in all conversations; and also good to cultivate an active recklessness for the social consequences of doing so.
It is wrong to directly cause the end of the world. Even if you are fatalistic about what is going to happen.

(Longer bio.)

Ben Pace May 30, 2025, 11:26 PM
7 points
0
in reply to: Czynski’s comment on: Czynski’s Shortform
I heard someone (who I respect) say why they don’t post on LessWrong more. They said when they talk about their thoughts and ideas with their friends, the friends won’t question their basic frame / sanity, and so won’t undermine their trust in themself. They said that this is acceptable on LessWrong, which is a more uncomfortable experience.
Ever since then I’ve tried a bit harder to make sure to question my friends’ basic frames and sanity, so that I’m not encouraging self-blinding in the people around me, and so they will know that they’re welcome to do the same to me.

Ben Pace May 30, 2025, 5:06 AM
6 points
2
on: CFAR is running an experimental mini-workshop (June 2-6, Berkeley CA)!
Yay! I’m happy to see this happening. I would strongly consider going if I wasn’t running LessOnline immediately before it (after which I will both be exhausted AND on call for festival-season work).

Ben Pace May 29, 2025, 2:53 PM
29 points
17
in reply to: Joseph Miller’s comment on: Zac Hatfield Dodds’s Shortform
I’d guess it looks more stable to investors. Unlike if you have a bunch of EAs on the OpenAI board and they confusingly try to fire the CEO for as unimportant a crime as lying; that’s quite hard for investors to predict.

Lighthaven Sequences Reading Group #36 (Tuesday 5/27)

Jozdien, Garrett Baker, Ben Pace, Ronny Fernandez and Aella

May 26, 2025, 11:52 PM

8 points

0 comments · 1 min read · LW link

Ben Pace May 26, 2025, 3:48 AM
LW: 15 AF: 6
10
AF
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
I think the main thing I want to convey is that I think you’re saying that LWers (of which I am one) have a very low opinion of the integrity of people at Anthropic, but what I’m actually saying is that their integrity is no match for the forces that they are being tested with.
I don’t need to be able to predict a lot of fine details about individuals’ decision-making in order to be able to have good estimates of these two quantities, and comparing them is the second-most important question relating to whether it’s good to work on capabilities at Anthropic. (The first one is a basic ethical question about working on a potentially extinction-causing technology that is not much related to the details of which capabilities company you’re working on.)

Ben Pace May 25, 2025, 8:05 PM
LW: 2 AF: 2
0
AF
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
What’s an example decision or two where you would want to ask yourself whether they should get more or less open-ended power? I’m not sure what you’re thinking of.

Ben Pace May 25, 2025, 7:06 PM
LW: 11 AF: 5
9
AF
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
Not the main thrust of the thread, but for what it’s worth, I find it somewhat anti-helpful to flatten things into a single variable of “how much you trust Anthropic leadership to make decisions which are good from your perspective”, and then ask how optimistic/pessimistic you are about this variable.
I think I am much more optimistic about Anthropic leadership on many axes relative to an overall survey of the US population or Western population – I expect them to be more libertarian, more in favor of free speech, more pro economic growth, more literate, more self-aware, higher IQ, and a bunch of things.
I am more pessimistic about their ability to withstand the pressures of a trillion dollar industry to shape their incentives than the people who are at Anthropic.
I believe the people working there are siloing themselves intellectually into an institution facing incredible financial incentives for certain bottom lines like “rapid AI progress is inevitable” and “it’s reasonably likely we can solve alignment” and “beating China in the race is a top priority”, and aren’t allowed to talk to outsiders about most details of their work, and this is a key reason that I expect them to screw up their decision-making.
I am optimistic about their relative-ability to have a sensible conversation about the next 5 years and what alignment failures look like, relative to most people on earth. This is not the standard I require to expect people to not do ML training runs that lead to human extinction, but nonetheless I predict they will do relatively quite well on this axis.
I don’t have a single variable here, I have a much more complicated model than this. It looks to me that collapsing questions of trust about people or groups into a single variable of how optimistic I am about them making decisions which are good from my values has been a common question-substitution in the Effective Altruism scene, where I think people have been repeatedly hoodwinked by sociopaths due to not moving toward a more detailed model that predicts exactly where and when someone will make good vs bad decisions.

Ben Pace May 21, 2025, 4:14 PM
4 points
0
in reply to: leogao’s comment on: leogao’s Shortform
(I affirm that I don’t believe you were being knowingly dishonest or deceptive at any point in this thread.)

Ben Pace May 20, 2025, 6:39 AM
4 points
0
in reply to: Veedrac’s comment on: leogao’s Shortform
But not in full generality! This is a fine question to raise in this context, but in general the correct thing to do in basically all situations is to consider the object level, and then also let yourself notice if people are unusually insane around a subject, or insane for a particular reason. Sometimes that is the decisive factor, but for all questions, the best first pass is to think about how that part of the world works, rather than to think about the other monkeys who have talked about it in the past.

Ben Pace May 20, 2025, 6:38 AM
2 points
0
in reply to: leogao’s comment on: leogao’s Shortform
I am confused why you think my claims are only semi related. to me my claim is very straightforward, and the things i’m saying are straightforwardly conveying a world model that seems to me to explain why i believe my claim. i’m trying to explain in good faith, not trying to say random things. i’m claiming a theory of how people parse information, to justify my opening statement,
Thank you for all this. I still think your quick take is wrong on the matter of epistemology.
I acknowledge that you make a fine point about persuasion, that someone who is primarily running the heuristic that “claims about the end of the world are probably crack-pots or scammers” will not be persuaded by someone arguing that actually 20:1 against and 20:1 in favor of a claim are equally extreme beliefs.
A version of the quick take that I would’ve felt was just fine would read:
Some people have basically only heard claims of human extinction coming from crackpots and scammers, and will not have thought much about the AI extinction idea on the object level. To them, this sort of argument I’ve discussed is unpersuasive at moving beyond the “is this a crackpot/scam” part of the dialogue. In this quick take I’ll outline my model of how they’re thinking about it, and give recommendations for how you should argue instead.
But your quick take doesn’t confine itself to discussing those people in those situations. It flatly says it’s true as a matter of epistemology that you should “use bigness of claim as a heuristic for how much evidence you need before you’re satisfied”, that you should “use reference classes that have consistently made good decisions irl” and that the crackpots/scammers one is the correct one to use here otherwise you’ll risk “getting pwned ideologically”.
These aren’t always the right heuristics (e.g. on this issue they are not for you and for me) and you shouldn’t say that they are just so that some people on Twitter will stop using rhetoric that isn’t working.
I believe you’re trying to do your best to empathize with people who are unpersuaded by an unsuccessful rhetorical move, a move that people who believe your position are making in public discourse. That is commendable. I think you are attempting to cause other people who hold your position to stop using that rhetorical move, by telling them off for using it, but to achieve this aim you are repeatedly saying the people who do not hold your position are doing normatively correct epistemology, and you’re justifying it with Occam’s razor and reference class forecasting, and this is all wrong. In some situations for some people it is reasonable to primarily use these heuristics, and in other situations for other people it is not. I’m not arguing that the people unpersuaded are being unreasonable, but (for example) your opening sentence makes fully-general statements about how to reason about this issue that I believe are false. Rule number one of good discourse: don’t make false statements about epistemology in order to win an object level point.
Yep, seems fine to drop this here; I make no bid of you to reply further.

Lighthaven Sequences Reading Group #35 (Tuesday 5/20)

Garrett Baker, Aella, Ronny Fernandez, Ben Pace and Jozdien

May 19, 2025, 8:58 PM

8 points

0 comments · 1 min read · LW link

Ben Pace May 19, 2025, 1:35 AM
2 points
0
in reply to: leogao’s comment on: leogao’s Shortform
You’re writing lots of things here but as far as I can tell you aren’t defending your opening statement, which I believe is mistaken.
I claim it is a lot more reasonable to use the reference class of “people claiming the end of the world” than “more powerful intelligences emerging and competing with less intelligent beings” when thinking about AI x-risk. further, we should not try to convince people to adopt the latter reference class—this sets off alarm bells, and rightly so (as I will argue in short order) - but rather to bite the bullet, start from the former reference class, and provide arguments and evidence for why this case is different from all the other cases.
Firstly, it’s just not more reasonable. When you ask yourself “Is a machine learning run going to lead to human extinction?” you should not first say “How trustworthy are people who have historically claimed the world is ending?”, you should of course primarily bring your attention to questions about what sorts of machine is being built, what sort of thinking capacities it has, what sorts of actions it can take in the world, what sorts of optimization it runs, how it would behave around humans if it were more powerful than them, and so on. We can go back to discussing epistemology 101 if need be (e.g. “Hug the Query!”).
Secondly, insofar as someone believes you are a huckster or a crackpot, you should leave the conversation, communication here has broken down and you should look for other communication opportunities. However, insofar as someone is only evaluating this tentatively as one of many possible hypotheses about you then you should open yourself up to auditing / questioning by them about why you believe what you believe and your past history and your memetic influences. Being frank is the only way through this! But you shouldn’t say to them “Actually, I think you should treat me like a huckster/scammer/serf-of-a-corrupt-empire.” This feels analogous to a man on a date with a woman saying “Actually I think you should strongly privilege the hypothesis that I am willing to rape you, and now I’ll try to provide evidence for you that this is not true.” It would be genuinely a bad sign about a man that he thinks that about himself, and also he has moved the situation into a much more adversarial frame.
I suspect you could write some more narrow quick-take such as “Here is some communication advice I find helpful when talking with friends and colleagues about how AI can lead to human extinction”, but in generalizing it all the way to making dictates about basic epistemology you are making basic mistakes and getting it wrong.
Please either (1) defend and/or clarify the original statement, or (2) concede that it was mistaken, rather than writing more semi-related paragraphs about memetic immune systems.

Ben Pace May 19, 2025, 1:09 AM
2 points
0
in reply to: Veedrac’s comment on: leogao’s Shortform
Thanks for the comment. (Upvoted.)
a. I expect there is a slightly more complicated relationship between my value-function and the likely configuration states of the universe than literally zero-correlation, but most configuration states do not support life and we are all dead, so in one sense a claim that in the future something very big and bad will happen is far more likely on priors. One might counter that we live in a highly optimized society where things being functional and maintained is an equilibrium state and it’s unlikely for systems to get out of whack enough for bad things to happen. But taking this straightforwardly is extremely naive, tons of bad things happen all the time to people. I’m not sure whether to focus on ‘big’ or ‘bad’ but either way, the human sense of these is not what the physical universe is made out of or cares about, and so this looks like an unproductive heuristic to me.
b. On the other hand, I suspect the bigger claims are more worth investing time to find out if they’re true! All of this seems too coarse-grained to produce a strong baseline belief about big claims or small claims.
c. I don’t get this one. I’m pretty sure I said that if you believe that you’re in a highly adversarial epistemic environment, then you should become more distrusting of evidence about memetically fit claims.
I don’t know what true points you think Leo is making about “the reference class”, nor which points you think I’m inaccurately pushing back on that are true about “the reference class” but not true of me. Going with the standard rationalist advice, I encourage everyone to taboo “reference class” and replace it with a specific heuristic. It seems to me that “reference class” is pretending that these groupings are more well-defined than they are.

Ben Pace May 18, 2025, 10:28 PM
7 points
3
in reply to: leogao’s comment on: leogao’s Shortform
Your points about Occam’s razor have got nothing to do with this subject^[1]. The heuristic “be more skeptical of claims that would have big implications if true” makes sense only when you suspect a claim may have been adversarially optimized for memetic fitness; it is not otherwise true that “a claim that something really bad is going to happen is fundamentally less likely to be true than other claims”.
I’m having a little trouble connecting your various points back to your opening paragraph, which is the primary thing that I am trying to push back on.^[2]
I claim it is a lot more reasonable to use the reference class of “people claiming the end of the world” than “more powerful intelligences emerging and competing with less intelligent beings” when thinking about AI x-risk. further, we should not try to convince people to adopt the latter reference class—this sets off alarm bells, and rightly so (as I will argue in short order) - but rather to bite the bullet, start from the former reference class, and provide arguments and evidence for why this case is different from all the other cases.
To restate the message I’m reading here: “Give up on having a conversation where you evaluate the evidence alongside your interlocutors. Instead frame yourself as trying to convince them of something, and assume that they are correct to treat your communications as though you are adversarially optimizing for them believing whatever you want them to believe.” This assumption seems to give up a lot of my ability to communicate with people (almost ~all of it), and I refuse to simply do it because some amount of communication in the world is adversarially optimized, and I’m definitely not going to do it because of a spurious argument that Occam’s razor implies that “claims about things being really bad or claims that imply you need to take action are fundamentally less likely to be true”.
You are often in an environment where people are trying to use language to describe reality, and in that situation the primary thing to evaluate is not the “bigness” of a claim, but the evidence for and against it. I recommend instead to act in such a way as to increase the size and occurrence of that environment more-so than “act as though it’s correct to expect maximum adversarial optimization in communications”.
(Meta: The only literal quotes of Leo’s in this comment are the big one in the quote block, my use of “” is to hold a sentence as object, they are not things Leo wrote.)
1. ^
  I agree that the more strongly a claim implies that you should take action, then the more you should consider that it is being optimized adversarially for you to take action. For what it’s worth, I think that heuristic applies more so to claims that you should personally take action. Most people have little action they can take to directly prevent the end of the world from AI; this is a heuristic more naturally applied to claims that you need to pay fines (which are often scams/spam). But mostly, when people give me claims that imply action, they are honestly meant claims and I do the action. This is the vast majority of my experience.
2. ^
  Aside to Leo: Rather than reply point-by-point to each of the paragraphs in the second comment, I will try restating and responding to the core message I got in the opening paragraph of the first comment. I’m doing this because the paragraphs in the second comment seemed somewhat distantly related / I couldn’t tell whether the points were actually cruxy. They were responding to many different things, and I hope restating the core thing will better respond to your core point. However I don’t mean to avoid key arguments, if you think I have done so feel free to tell me one or two paragraphs you would especially like me to engage with and I will do so in any future reply.

Ben Pace May 18, 2025, 1:47 AM
37 points
1
in reply to: leogao’s comment on: leogao’s Shortform
This all seems wrongheaded to me.
I endeavor to look at how things work and describe them accurately. Similarly to how I try to describe how a piece of code works, or how to build a shed, I will try to accurately describe the consequences of large machine learning runs, which can include human extinction.
I personally think AGI will probably kill everyone. but this is a big claim and should be treated as such.
This isn’t how I think about things. Reality is what exists, and if a claim accurately describes reality, then I should not want to hold it to higher standards than claims that do not describe reality. I don’t think it’s a good epistemology to rank claims by “bigness” and then say that the big ones are less likely and need more evidence. On the contrary, I think it’s worth investing more in finding out if they’re right, and generally worth bringing them up to consideration with less evidence than for “small” claims.
on the other hand, everyone has personally experienced a dozen different doomsday predictions. whether that’s your local church or faraway cult warning about Armageddon, or Y2K, or global financial collapse in 2008, or the maximally alarmist climate people, or nuclear winter, or peak oil. for basically all of them, the right action empirically in retrospect was to not think too much about it.
I don’t have the experiences you’re describing. I don’t go to churches, I don’t visit cults, I was 3yrs old in the year 2000, I was 11 for the ’08 financial crash and having read about it as an adult I don’t recall extinction being a topic of discussion, I think I have heard of climate people saying that via alarmist news headlines but I have not had anyone personally try to convince me of this or even say that they believe it. I have heard it discussed for nuclear winter, yes, and I think nukes are quite scary and it was reasonable to consider, I did not dismiss it out of hand and wouldn’t use that heuristic. I don’t know what the oil thing is.
In other words, I don’t recall anyone seriously trying to convince me that the world was ending except in cases where they had good reason to believe it. In my life, when people try to warn me about big things, especially if they’ve given it serious thought, usually I’ve found it’s been worthwhile for me to consider it. (I like to think I am good at steering clear of scammers and cranks, so that I can trust the people in my life when they tell me things.)
The sense I get from this post is that, in it, you’re assuming everyone else in the world is constantly being assaulted with claims meant to scare and control them rather than people attempting to describe the world accurately. I agree there are forces doing that, but I think this post gives up all too quickly on there being other forces in the world that aren’t doing that that people can recognize and trust.

Ben Pace May 15, 2025, 5:10 PM
2 points
0
on: From Comments on Accountability Sinks
Oops, I didn’t send my reply comment. I’ve just posted it; yes, that information did change my mind about this case.

Ben Pace May 15, 2025, 5:09 PM
4 points
0
in reply to: Martin Sustrik’s comment on: Accountability Sinks
Thank you for the details! I change my mind about the locus of responsibility, and don’t think Wascher seems as directly culpable as before. I don’t update my heuristic, I still think there should be legal consequences for decisions that cause human deaths.
My new guess is that something more like “the airport” should be held accountable and fined some substantial amount of money for the deaths, to go to the victims’ families.

Having looked into it a little more I see they were sued substantially for these, so it sounds like that broadly happened.

Lighthaven Sequences Reading Group #34 (Tuesday 5/13)

Garrett Baker, Aella, Ronny Fernandez, Ben Pace and Jozdien

May 10, 2025, 7:42 AM

8 points

0 comments · 1 min read · LW link

Ben Pace May 5, 2025, 7:26 AM
6 points
0
on: Accountability Sinks
I liked reading these examples; I wanted to say, it initially seemed to me a mistake not to punish Wascher, whose mistake led to the death of 35 people.
I have a weak heuristic that, when you want to enforce rules, costs and benefits aren’t fungible. You do want to reward Wascher’s honesty, but I still think that if you accidentally cause 35 people to die this is evidence that you are bad at your job, and separately it is very important to disincentivize that behavior for others who might be more likely to make that mistake recklessly. There must be a reliable punishment for that kind of terrible mistake.
So you must fire her and bar her from this profession, or fine her half a year’s wages, or something. If you also wish to help her, you should invest in helping her get into a new line of work with which she can support her family, or something. You can even make her net better off for having helped uncover a critical mistake and saving future lives. But people should know that there was a cost and there will be if they do so in future.
Or at least this is what my weak heuristic says.
What links here?
- From Comments on Accountability Sinks by Martin Sustrik (May 15, 2025, 10:20 AM; 15 points)

Ben Pace May 1, 2025, 9:42 PM
6 points
9
in reply to: Casey_’s comment on: “The Urgency of Interpretability” (Dario Amodei)
I don’t think that propaganda must necessarily involve lying. By “propaganda,” I mean aggressively spreading information or communication because it is politically convenient / useful for you, regardless of its truth (though propaganda is sometimes untrue, of course).
When a government puts up posters saying “Your country needs YOU” this is intended to evoke a sense of duty and a sense of glory to be had; sometimes this sense of duty is appropriate, but sometimes your country wants you to participate in terrible wars for bad reasons. The government is saying it loudly because for them it’s convenient for you to think that way, and that’s not particularly correlated with the war being righteous or with the people who decided to make such posters even having thought much about that question. They’re saying it to win a war, not to inform their populace, and that’s why it’s propaganda.
Returning to the Amodei blogpost: I’ll happily concede that you don’t always need to give reasons for your beliefs when expressing them—context matters. But in every context—tweets, podcasts, ads, or official blogposts—there’s a difference between sharing something to inform and sharing it to push a party line.
I claim that many people have asked why Anthropic believes it’s ethical for them to speed up AI progress (by contributing to the competitive race), and Anthropic have rarely-if-ever given a justification of it. Senior staff keep indicating that not building AGI is not on the table, yet they rarely-if-ever show up to engage with criticism or to give justifications for this in public discourse. This is a key reason why it reads to me as propaganda, because it’s an incredibly convenient belief for them and they state it as though any other position is untenable, without argument and without acknowledging or engaging with the position that it is ethically wrong to speed up the development of a technology they believe has a 10-20% chance of causing human extinction (or a similarly bad outcome).
I wish that they would just come out, lay out the considerations for and against building a frontier lab that is competing to reach the finish line first, acknowledge other perspectives and counterarguments, and explain why they made the decision they have made. This would do wonders for the ability to trust them.
(Relatedly, I don’t believe the Machines of Loving Grace essay is defending the position that speeding up AI is good; the piece in fact explicitly says it will not assess the risks of AI. Here are my comments at the time on that essay also being propaganda.)
