Researching donation opportunities. Previously: ailabwatch.org.
Zach Stein-Perlman
Yayyy!
(based on the summary — haven’t read the paper)
Adverse selection might be pretty brutal. And you can’t get much signal quickly; there would be tons of noise. (But it could still be worthwhile if the upside is high, idk.)
Good post.
the data on ads seems pretty clear
Citation needed. I’m not aware of good data.
To a first approximation, I’m into maximizing EV; I’m only into maxipok insofar as it’s a good heuristic for maximizing EV. Portfolio allocation depends on tractability, but yeah I’m excited about work like https://www.forethought.org/research/viatopia.
the ~zero mode is distributed over +/-10^-10% optimal futures or even +/-10^-20% optimal futures, and the space inbetween from 10^-10% to 0.1% optimal futures has lower cumulative probability mass than either mode. So the EV of the future is basically equal to the probability of getting a future in the high value mode times the average value of a future in that mode.
I agree with this. I meant to say: a strong version of bimodality would imply that the absolute variance within high-value futures is much smaller than the variance between high-value and zero-value, and so nobody should work on moving us between different high-value futures. But that seems false.
Lol, I no longer endorse it! My current view is:
I still feel like almost all goods are ~0 terminal value because you get OOMs more goodness by optimizing hard, and we’re unlikely to produce terminal goods at large scale that are 5-95% of maximum value. (I think some experts agree and some disagree, but not sure.)
That’s before accounting for acausal considerations. I think people getting acausal stuff right but having wrong values is a very plausible way to get ~50% of max value.
And I now think compromise is very plausible, where the lightcone is filled with several different goods (or, if acausal stuff works out, our lightcone is filled with goods-that-people-in-other-universes-like but we’re trading for several different goods in their universes).
The binary view has been argued against by MacAskill and Assadi; I haven’t read that.
No, OpenAI and Anthropic staff currently generally cannot sell their equity. This will change after they go public. Some staff would donate more—not to mention diversify—if they could.
I disagree with “end of year-ish.” I expect we’ll really feel the effects of Anthropic founder/staff/investor philanthropy (but less in some ways than FTX) when those people are allowed to sell their shares.[1] But my median for that is mid-2027, and there might be delays to donating.
[1] I think about 8% of current Anthropic shares will ultimately be available for AI safety philanthropy. Anthropic is currently valued at $1T and will probably be valued at substantially more in a year. So that’s present-value $80B and more in the future, whereas total AI safety philanthropy is <$2B/year.
Observation: Opus 4.7 (via chat) is near-useless for helping me iterate on Google Docs. For one, its reading comprehension seems very poor.
This is surprising because others say that the models give them tons of uplift. (And the main METR graph is just coding but still. And Opus 4.6 beat all humans on the GovAI work test. And the vibes!)
Hypotheses (not exclusive or exhaustive):
Claude Code and Cowork are much better, even for tasks like “iterate on a Google Doc”
Other models are much better for work like mine
I’m prompting the model terribly (or it’s better at reading different file formats or something)
Other people are wrong, at least for work like mine; the models fool them with stuff that’s subtly slop
(I am unusually good at noticing when suggestions are garbage)
Other people do legitimately get uplift, but I’m different in ways that make me not
Actually I incorrectly perceived the vibes; most people don’t think they can get substantial uplift on tasks like mine
Anyway, the obvious things to do are:
Find someone who does work like mine and claims to get lots of uplift, and watch them use the AIs. Or get them to walk me through how to use the models to iterate on a Google Doc.
Check for blogposts about getting use out of AIs on tasks like mine.
I feel scared about this phenomenon. More generally I feel scared because in recent months the vibe from several Anthropic safety people has shifted toward “alignment is easy” or at least “several other problems are similarly important for making the transition go well.” I don’t know why, and I’m worried it’s unjustified/random. This is (1) directly bad and (2) a bad sign about intra-Anthropic epistemics.
You misunderstand. It would be bad to only make the max-alignment model or to use that model in internal deployments. This shortform is about experiments.
Three mechanisms by which prosocial actions could be selfishly rewarded:
Correlation. You acting prosocially is correlated with aliens in other universes (and other humans) acting prosocially, which is good for you (especially if your preferences are scope-sensitive or not-super-indexical).
Some future humans may want to reward people who prosocially made the AI transition go well.
BOTEC: 2% of the lightcone will be gifted to people who made the AI transition go well. Donating $1 well now gives you 1/60B of the credit for making the AI transition go well, if it does. Therefore donating $1 well now gives you 1/3T of the lightcone, if the AI transition goes well, in expectation. But assume a 2.5x haircut for funging, so about 1/8T (1/7.5T unrounded).
Some aliens may want to (acausally) reward people who act prosocially. (Aliens could reward people directly, based on actions of correlated simulated people, or they could (more simply) acausally cooperate with future humans to get future humans to reward prior human prosocial actions.)
Or, generally: maybe part of the multiverse-wide acausal cooperation scheme will be rewarding prosocial actions, alongside other incentive-compatibility measures like punishing threats/conflict.
Or, imprecisely: maybe you’re in a simulation such that if you act prosocially, your preferences will be rewarded.
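The BOTEC in the second mechanism can be re-run as a quick check. This is only a sketch of the arithmetic: every input is the guess stated above, and the variable names are made up for illustration.

```python
from fractions import Fraction

# All inputs are the BOTEC's own guesses, not measured quantities.
lightcone_share_gifted = Fraction(2, 100)     # 2% of the lightcone goes to contributors
credit_per_dollar = Fraction(1, 60 * 10**9)   # $1 donated well buys 1/60B of the credit
funging_haircut = Fraction(5, 2)              # 2.5x discount for funging

# Expected share of the lightcone per $1, conditional on the transition going well.
share_before_haircut = lightcone_share_gifted * credit_per_dollar
share_after_haircut = share_before_haircut / funging_haircut

print(1 / share_before_haircut)  # denominator of the "1/3T" figure
print(1 / share_after_haircut)   # denominator of the "~1/8T" figure
```

Exact fractions are used so the rounding is visible: the post-haircut figure is 1/7.5T, which the BOTEC rounds to 1/8T.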
Also as my decision-theory intuitions have improved I’ve come to appreciate this heuristic: when you can benefit others’ preferences super-efficiently and so there would be massive gains from trade but you can’t coordinate with your counterparty, just do your end of the trade anyway. (Even if you don’t directly care about their preferences at all.) (Depending on context.)
(This is all imprecise/slippery; there are a few different meanings of “prosocial” and this depends on what kind of selfish rewards you care about.)
(h/t @Oscar for discussion.)
(I’m writing this to facilitate asking experts for takes.)
Here I’m exploring separating two parameters: (1) effect of growing the field holding quality-distribution constant and (2) marginal quality vs average quality. (And there’s also value of the field and cost to grow the field by 1%.)
Yeah, separating “quality-adjusted field size --> output” and “output --> impact” might be helpful. But I think “parallelizing” is the wrong frame — it doesn’t account for how larger fields have more shared infrastructure, more synergy, etc. And you have to be careful to account for the “low-hanging fruit plucked first” or “infinite work produces bounded value” phenomenon exactly once.
I think a good way to elicit people’s views on this is asking: if we doubled/halved the ecosystem, keeping the same distribution of quality, what multiplier on impact would that be? And if someone says “doubling the field multiplies impact by 𝛾” then you infer that increasing the field by ε is slightly better[1] than multiplying impact by
But I expect people (including me) don’t have great intuitions, especially on #2.
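The expression that should close the sentence about doubling is missing from the text here. One natural way to turn a doubling multiplier 𝛾 into a multiplier for a small increase ε is to assume impact is a power law in quality-adjusted field size. That functional form, and the symbols I, N, c, and α, are my assumptions for illustration and may not be the formula originally intended:

```latex
\begin{align*}
I(N) &= c\,N^{\alpha} && \text{(assumed constant-elasticity form)} \\
\frac{I(2N)}{I(N)} &= 2^{\alpha} = \gamma \;\Longrightarrow\; \alpha = \log_2 \gamma \\
\frac{I\big((1+\varepsilon)N\big)}{I(N)} &= (1+\varepsilon)^{\log_2 \gamma}
  = \gamma^{\log_2(1+\varepsilon)}
  \approx 1 + \varepsilon \log_2 \gamma && (\varepsilon \ll 1)
\end{align*}
```

For example, if doubling the field multiplies impact by 𝛾 = 1.5, then under this assumption growing the field by 1% multiplies impact by roughly 1.006. If returns diminish faster as the field grows, the true multiplier for a small increase would be slightly above this interpolation.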
[1] Because as the field grows, returns [diminish faster / eventually diminish].
Decreasing or increasing returns in the size of the field?
Suppose a random 1% (or 50%) of the AI safety ecosystem quits tomorrow (not including you). This is bad news, but does it make your personal impact higher or lower?[1]
Higher: there’s diminishing returns in the size of the field. With a smaller field, you pluck lower-hanging fruit. If there are N people, all equally skilled, each person produces less than 1/N counterfactual value, since N-1 people would still pluck the low-hanging fruit. (If there’s an upper bound on the value of the field, no matter how large the field is—e.g., solving alignment and making the transition to superintelligence go well—then clearly at some point the field must become sublinear.) Also maybe large fields have issues.
Lower: with a larger field, your work improves more people’s work and more other people improve your work. (You discovering that a particular line of work is promising or not is more effective; more other people discover that a particular line of work is promising or not and thus help you; there’s more infrastructure and each piece of infrastructure is more valuable; probably more.) Maybe there are increasing returns to effort within projects. Maybe there are synergies between projects, or many projects act as multipliers on other projects: maybe there are positive feedback loops between the field doing lots of good research and new people wanting to join the field, maybe government-policy work depends on technical-safety work, etc.
More potential considerations: Claude (mediocre).
I think it’s generally assumed that there’s diminishing returns because “higher” is a bigger effect. I want a quantitative estimate in the case of AI safety — this is a parameter for quantifying the value of growing the field, as part of my prioritization work.
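To make the “higher” vs “lower” cases concrete, here is a toy sketch. It assumes the total value of the field is a power law in the number of equally skilled people, V(N) = N**alpha. That functional form, the parameter alpha, and the function names are purely illustrative assumptions, not an estimate for AI safety.

```python
def field_value(n_people: float, alpha: float) -> float:
    """Total value of a field of n_people, under the assumed form V(N) = N**alpha."""
    return n_people ** alpha


def counterfactual_value(n_people: int, alpha: float) -> float:
    """Value lost if one of n_people equally skilled people leaves."""
    return field_value(n_people, alpha) - field_value(n_people - 1, alpha)


def personal_impact_ratio(n_before: int, n_after: int, alpha: float) -> float:
    """Your counterfactual value after the field changes size, relative to before."""
    return counterfactual_value(n_after, alpha) / counterfactual_value(n_before, alpha)


# alpha < 1: diminishing returns; alpha = 1: linear; alpha > 1: increasing returns.
for alpha in (0.5, 1.0, 1.5):
    ratio = personal_impact_ratio(1000, 500, alpha)
    print(f"alpha={alpha}: impact after a random 50% quits / before = {ratio:.2f}")
```

With alpha below 1 the ratio comes out above 1 (the “higher” case, and each person’s counterfactual value is below the 1/N average share); with alpha above 1 it comes out below 1 (the “lower” case). The quantitative question is then which regime, and what effective alpha, describes the field.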
There’s probably a bunch of economics work on what affects R&D/innovation; see Claude.
(A separate phenomenon is that if you grow the field, on average the new people will be lower-quality than the old people.)
[1] Or: is it less bad or more bad than magically decreasing the impact of the field by 1%/50%?
And you can consider the opposite question: increase the field by 1%/50%/100%, maintaining the current distribution of people-quality.
You missed “trying.” Succeeding requires certain capabilities, but trying to do it does not. I believe there’s much more risk from AIs not trying to do what the operator wants than trying and failing.
Joyce lost the primary 16-84. It’s hard for a Dem challenger to beat a Dem incumbent, especially without a stellar resume or establishment support.
“36cts per vote” is not accurate; Joyce would have gotten almost as many votes if she’d spent $0. “$350K probably does moves her chances of winning by 10%” is big if true but I (tentatively, with little info) think it’s <<1%. I wish her well but I don’t think this is a promising donation opportunity on the basis of flipping the election.