Researching donation opportunities. Previously: ailabwatch.org.
Zach Stein-Perlman
I wrote this in a conflict-theory way but the mistake-theory version might be equally important. Ruthlessness/malice aren’t necessary. Sometimes an agent doesn’t appreciate the costs here; improper credit assignment often naively makes sense. Sometimes a bureaucracy misallocates credit by default despite not even being agent-y enough to be ruthlessly goal-directed. And sometimes people just incorrectly opt not to take credit.
To be clear, the epistemic status of all of this is: a priori musings inspired by some real events, which might be helpful but which you shouldn’t defer to. It is not observed fact.
Something I was wrong about: credit assignment.
I used to think: I’m an altruist; it doesn’t matter whether I get credit for my contributions. Now I think getting credit is often important. In some contexts, when you do a good thing, a lot of the value comes indirectly via you getting empowered. And if others systematically steal your credit or block you from taking credit, you should be scared of the prospect that (1) you’re giving them power which they will use poorly or (2) you become dependent on them — if you’d gotten credit, then you’d be empowered and people would listen to you, but since they took credit, you’re stuck only able to exert influence by advising them, and they can stop listening to you.
There are often reasonable-sounding arguments that it’s better if someone else takes credit, but you should (1) be suspicious and (2) just pay some costs to preserve proper credit assignment.
Great post. It’s too bad it didn’t get frontpage visibility.
Proposition: some kinds of agents don’t need to make object-level deals in advance. Especially for evidential/acausal reasons, or possibly because they made a meta-deal in the past. In the future, they do a bunch of thinking and then an insurance company pays everyone whose houses burned down (and everyone else pays the insurance company a little).
Proposition (perhaps based on controversial intuitions that various attitudes are objective/convergent): the system above is preferable to locking in low-level deals on e.g. power-sharing. But it’s fine to make such deals as long as we clarify that if the galaxy-brained stuff works out, we’ll follow that instead. (And if you feel confident that the galaxy-brained stuff will work out without needing to coordinate in advance, you should still endorse deals to e.g. reduce conflict between agents who don’t share your view.)
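As a toy illustration of the ex-post insurance logic above (not a claim about how real acausal coordination would work; the wealth levels, loss size, and burn probability are made up): with concave utility, everyone prefers the scheme in expectation, so sufficiently similar agents can each endorse paying in without an advance object-level deal.

```python
import math

# Toy version of the ex-post insurance story above. With concave utility,
# redistributing from agents whose houses survived to those whose houses
# burned raises each agent's expected utility, so sufficiently similar
# agents can endorse the scheme without an advance object-level deal.
# All numbers are hypothetical placeholders.

wealth = 100.0
loss = 90.0               # wealth lost if your house burns
p_burn = 0.1
u = math.log              # a concave utility function

premium = p_burn * loss   # actuarially fair premium each agent pays ex post

eu_without = (1 - p_burn) * u(wealth) + p_burn * u(wealth - loss)
eu_with = u(wealth - premium)   # full insurance: everyone ends at the mean

print(f"expected utility, no insurance:      {eu_without:.3f}")
print(f"expected utility, ex-post insurance: {eu_with:.3f}")  # higher, by Jensen's inequality
```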
some deals require uncertainty
Skill issue? (But no guarantee that all agents will be high-skill by the first crucial time.)
Surprised you didn’t bring this up. Curious what you think.
My reaction to the announcement is:
I really like the “Updating our Responsible Scaling Policy” section. Focusing on unilateral if-then commitments turned out to be a bad approach, but this new one is great. (I haven’t read the new RSP, safety roadmap, or risk report; I don’t have takes on object-level goodness.)
Apparently there’s now a sixth person on Anthropic’s board. Previously their certificate of incorporation said the board was Dario’s seat, Yasmin’s seat, and 3 LTBT-controlled seats. I assume they’ve updated the COI to add more seats. You can pay a Delaware registered agent to get you the latest copy of the COI; I don’t really have capacity to engage in this discourse now.
Regardless, my impression is that the LTBT isn’t providing a check on Anthropic; the number of board seats isn’t a crux.
I mostly agree. And you can do better with better investments. I observe that this does not imply that scope-sensitive altruists should accumulate capital in order to buy galaxies: buying one millionth of the lightcone costs around $20M invested well now (my independent number; Thomas says $5-500M iiuc). And I think there are donation opportunities 100x that good for scope-sensitive altruists on the margin.
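For concreteness, here is the shape of the arithmetic behind a number like that, as a minimal sketch; the return rate, horizon, and total lightcone price below are hypothetical placeholders, not my actual inputs.

```python
# Back-of-envelope shape of the "buy a millionth of the lightcone" estimate.
# Every parameter here is an illustrative placeholder.

initial_capital = 20e6         # dollars invested today
annual_real_return = 0.15      # hypothetical "invested well" return
years = 30                     # hypothetical horizon until "purchase"
lightcone_price = 1.3e15       # hypothetical dollar price of the whole lightcone then

future_capital = initial_capital * (1 + annual_real_return) ** years
fraction = future_capital / lightcone_price
print(f"future capital: ${future_capital:.3g}")   # ~$1.3e9 with these placeholders
print(f"fraction of lightcone: {fraction:.2g}")   # ~1e-6
```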
I would bet that lawyers, scholars, and chatbots would basically disagree with you. Your literal reading of the Constitution is less important than norms/practice.
Importantly, as far as I can tell, from a purely constitutional perspective the supreme court has no more authority to direct any members of the executive branch to do anything than I do. Their only constitutional power is to call for federal officers to refuse to do something. Asking them to do anything proactive would go relatively clearly against their mandate.
In case nobody else has mentioned it: this is false, courts order actions all the time, e.g. ordering agencies to do stuff that they’re legally required to do. Ask your favorite chatbot.
This ~assumes that your primary projects have diminishing returns; the third copy of Tsvi is less valuable than the first. But note that increasing returns to effort are common.
No. (But people should still feel free to reach out if they want.)
Anthropic: Sharing our compliance framework for California’s Transparency in Frontier AI Act.
2.5 weeks ago Anthropic published a framework for SB 53. I haven’t read it; it might have interesting details. And it might be noteworthy that compliance is somewhat separate from RSP, possibly due to CA law. I’m pretty ignorant on state law stuff like SB 53 but I wasn’t expecting a new framework.
Framework here (you can’t easily download it or copy from it, which is annoying). Anthropic’s full Trust Portal here. Anthropic’s blogpost below.
On January 1, California’s Transparency in Frontier AI Act (SB 53) will go into effect. It establishes the nation’s first frontier AI safety and transparency requirements for catastrophic risks.
While we have long advocated for a federal framework, Anthropic endorsed SB 53 because we believe frontier AI developers like ourselves should be transparent about how they assess and manage these risks. Importantly, the law balances the need for strong safety practices, incident reporting, and whistleblower protections with flexibility in how developers implement their safety measures, while exempting smaller companies from unnecessary regulatory burdens.
One of the law’s key requirements is that frontier AI developers publish a framework describing how they assess and manage catastrophic risks. Our Frontier Compliance Framework (FCF) is now available to the public, here. Below, we discuss what’s included within it, and highlight what we think should come next for frontier AI transparency.
What’s in our Frontier Compliance Framework
Our FCF describes how we assess and mitigate cyber offense, chemical, biological, radiological, and nuclear threats, as well as the risks of AI sabotage and loss of control, for our frontier models. The framework also lays out our tiered system for evaluating model capabilities against these risk categories and explains our approach to mitigations. It also covers how we protect model weights and respond to safety incidents.
Much of what’s in the FCF reflects an evolution of practices we’ve followed for years. Since 2023, our Responsible Scaling Policy (RSP) has outlined our approach to managing extreme risks from advanced AI systems and informed our decisions about AI development and deployment. We also release detailed system cards when we launch new models, which describe capabilities, safety evaluations, and risk assessments. Other labs have voluntarily adopted similar approaches. Under the new law going into effect on January 1, those types of transparency practices are mandatory for those building the most powerful AI systems in California.
Moving forward, the FCF will serve as our compliance framework for SB 53 and other regulatory requirements. The RSP will remain our voluntary safety policy, reflecting what we believe best practices should be as the AI landscape evolves, even when that goes beyond or otherwise differs from current regulatory requirements.
The need for a federal standard
The implementation of SB 53 is an important moment. By formalizing achievable transparency practices that responsible labs already voluntarily follow, the law ensures these commitments can’t be abandoned quietly later once models get more capable, or as competition intensifies. Now, a federal AI transparency framework enshrining these practices is needed to ensure consistency across the country.
Earlier this year, we proposed a framework for federal legislation. It emphasizes public visibility into safety practices, without trying to lock in specific technical approaches that may not make sense over time. The core tenets of our framework include:
Requiring a public secure development framework: Covered developers should publish a framework laying out how they assess and mitigate serious risks, including chemical, biological, radiological, and nuclear harms, as well as harms from misaligned model autonomy.
Publishing system cards at deployment: Documentation summarizing testing, evaluation procedures, results, and mitigations should be publicly disclosed when models are deployed and updated if models are substantially modified.
Protecting whistleblowers: It should be an explicit violation of law for a lab to lie about compliance with its framework or punish employees who raise concerns about violations.
Flexible transparency standards: A workable AI transparency framework should have a minimum set of standards so that it can enhance security and public safety while accommodating the evolving nature of AI development. Standards should be flexible, lightweight requirements that can adapt as consensus best practices emerge.
Limiting application to the largest model developers: To avoid burdening the startup ecosystem and smaller developers with models at low risk for causing catastrophic harm, requirements should apply only to established frontier developers building the most capable models.
As AI systems grow more powerful, the public deserves visibility into how they’re being developed and what safeguards are in place. We look forward to working with Congress and the administration to develop a national transparency framework that ensures safety while preserving America’s AI leadership.
The move I was gesturing at—which I haven’t really seen other people do—is saying
This sequence of reasoning [is/isn’t] cruxy for me
as opposed to the usual use of “cruxiness” which is like
I think P and you think ¬P. We agree that Q would imply P and ¬Q would imply ¬P; the crux is Q.
or
I believe P. This is based on a bunch of stuff but Q is the step/assumption/parameter/whatever that I expect is most interesting to interlocutors or you’re most likely to change my mind on or something.
If [your job is automated] . . . the S&P 500 will probably at least double (assuming no AI takeover)
Is this true?
Prior discussion: Tail SP 500 Call Options.
I currently have 7% of my portfolio in such calls.
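To illustrate why such calls are attractive under the view above, here is a toy payoff calculation; the strike, premium, and index levels are hypothetical placeholders, not the actual position.

```python
# Toy payoff arithmetic for a far-out-of-the-money S&P 500 call under the
# "index at least doubles" scenario above. Strike, premium, and index
# levels are hypothetical placeholders, not an actual position.

index_now = 5000
strike = 8000                 # hypothetical far-out-of-the-money strike
premium = 50                  # hypothetical cost per unit of index exposure

index_doubled = 2 * index_now
payoff = max(index_doubled - strike, 0)    # call payoff at expiry
print(f"payoff per unit: {payoff}, multiple on premium: {payoff / premium:.0f}x")
# With these placeholders, a doubling turns each premium dollar into ~40x,
# which is how even a small allocation can dominate the portfolio in that world.
```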
Hmm, yeah, oops. I forgot about “cruxiness.”
Concept: epistemic load-bearingness.
I write an explanation of what I believe on a topic. If the explanation is very load-bearing for my belief, that means that if someone convinces me that a parameter is wrong or points out a methodology/math error, I’ll change my mind to whatever the corrected result is. In other words, my belief is very sensitive to the reasoning in the explanation. If the explanation is not load-bearing, my belief is really determined by a bunch of other stuff; the explanation might be helpful for showing one way I think about the question, but quibbling with it wouldn’t change my mind.
This is related to Is That Your True Rejection?; I find my version of the concept more helpful for working with collaborators who share almost all background assumptions with me.
Yeah, I think the big contribution of the bioanchors work was the basic framework: roughly, have a probability distribution over effective compute for TAI, and have a distribution for effective compute over time, and combine them. Not the biological anchors.
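A minimal sketch of that framework with made-up distributions (the means, spreads, and growth rates below are illustrative placeholders, not the report’s estimates): sample the effective compute required for TAI, sample a compute-over-time trajectory, and read off a TAI-year distribution.

```python
import random
import statistics

# Minimal sketch of the framework described above: combine (a) a distribution
# over effective compute required for TAI with (b) a distribution over
# effective compute available over time. All numbers are made-up placeholders.

random.seed(0)

def sample_tai_year() -> float:
    required = random.gauss(32, 3)               # (a) log10 FLOP required for TAI
    level_now = 26                               # (b) log10 effective FLOP today
    growth = max(random.gauss(0.7, 0.2), 0.05)   # orders of magnitude per year
    return 2025 + max(required - level_now, 0) / growth

samples = [sample_tai_year() for _ in range(100_000)]
print("median TAI year:", round(statistics.median(samples)))
print("P(TAI by 2040):", sum(y <= 2040 for y in samples) / len(samples))
```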
(Indeed, I am not following up with users. Nor maintaining the sheet. And I don’t remember where I stored the message I hashed four years ago; uh oh...)
Context: Bay Area Secular Solstice 2025
Relatedly, I fear that outside of Redwood and Forethought, the “AI Consciousness and Welfare” field is focused on the stuff in this post plus advocacy rather than stuff I like: (1) making deals with early schemers to reduce P(AI takeover) and (2) prioritizing by backchaining from considerations about computronium and von Neumann probes.
Edit: here’s a central example of a proposition in an important class of propositions/considerations that I expect the field basically just isn’t thinking about and lacks generators to notice:
In the long run, when we’re colonizing the galaxies, the crucial thing is that we fill the universe with axiologically-good minds. In the short run, what matters is more about being cooperative with the AIs (and maybe the small scale means deontology is more relevant); the AIs’ preferences, not scope-sensitive direct axiological considerations, are what matters.
I agree in part. But redistributing money/power from funders to workers is good for the world only insofar as the workers are better at turning money/power into goodness-for-the-world than the funders. Is that true for the AI safety ecosystem? There are certainly some cases where it’s true (e.g. your principle would have produced good results in the case of Lightcone), but I think it’s mostly false; I think the funders are better at turning money into goodness than almost all of the workers. (Plus, insofar as the workers aren’t altruists, they’ll waste money on consumption.)