Researching donation opportunities. Previously: ailabwatch.org.
Zach Stein-Perlman
Here I’m exploring separating two parameters: the effect of growing the field while holding the quality distribution constant, and marginal quality vs. average quality. (And there’s also the value of the field and the cost to grow the field by 1%.)
Yeah, separating “quality-adjusted field size --> output” and “output --> impact” might be helpful. But I think “parallelizing” is the wrong frame — it doesn’t account for how larger fields have more shared infrastructure, more synergy, etc. And you have to be careful to account for the “low-hanging fruit plucked first” or “infinite work produces bounded value” phenomenon exactly once.
I think a good way to elicit people’s views on this is asking: if we doubled/halved the ecosystem, keeping the same distribution of quality, what multiplier on impact would that be? And if someone says “doubling the field multiplies impact by 𝛾” then you infer that increasing the field by ε is slightly better[1] than multiplying impact by 𝛾^ε.
But I expect people (including me) don’t have great intuitions, especially on #2.
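To make this concrete, here is a minimal sketch (mine, not from the post) of turning an answer to the doubling question into an estimate for smaller changes; the function names and the example value of 𝛾 are illustrative, and per the footnote the true effect of a small increase should be slightly better than this constant-elasticity extrapolation.

```python
# Minimal sketch (illustrative, not from the post): if doubling the field
# multiplies impact by gamma, a constant-elasticity fit gives
# impact proportional to size**log2(gamma). Extrapolating to other size multipliers:

import math

def implied_elasticity(gamma: float) -> float:
    """Elasticity of impact with respect to field size implied by 'doubling -> x gamma'."""
    return math.log2(gamma)

def impact_multiplier(gamma: float, size_multiplier: float) -> float:
    """Constant-elasticity estimate of the impact multiplier from scaling the field."""
    return size_multiplier ** implied_elasticity(gamma)

# Example (gamma = 1.6 is a made-up answer to the doubling question):
print(impact_multiplier(1.6, 2.0))   # 1.6 by construction
print(impact_multiplier(1.6, 1.01))  # ~1.007: growing the field 1% buys ~0.7% more impact
```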
- ^
Because as the field grows, returns [diminish faster / eventually diminish].
Decreasing or increasing returns in the size of the field?
Suppose a random 1% (or 50%) of the AI safety ecosystem quits tomorrow (not including you). This is bad news, but does it make your personal impact higher or lower?[1]
Higher: there’s diminishing returns in the size of the field. With a smaller field, you pluck lower-hanging fruit. If there are N people, all equally skilled, each person produces less than 1/N counterfactual value, since N-1 people would still pluck the low-hanging fruit. (If there’s an upper bound on the value of the field, no matter how large the field is—e.g., solving alignment and making the transition to superintelligence go well—then clearly at some point the field must become sublinear.) Also maybe large fields have issues.
Lower: with a larger field, your work improves more people’s work and more other people improve your work. (Your discovering that a particular line of work is promising or not helps more people; more other people discover whether particular lines of work are promising and thus help you; there’s more infrastructure and each piece of infrastructure is more valuable; probably more.) Maybe there’s increasing returns to effort within projects. Maybe there’s synergies between projects or many projects act as multipliers on other projects — maybe there’s positive feedback loops between the field doing lots of good research and new people wanting to join the field, maybe government-policy work depends on technical-safety work, etc.
More potential considerations: Claude (mediocre).
I think it’s generally assumed that there’s diminishing returns because “higher” is a bigger effect. I want a quantitative estimate in the case of AI safety — this is a parameter for quantifying the value of growing the field, as part of my prioritization work.
There’s probably a bunch of economics work on what affects R&D/innovation; see Claude.
(A separate phenomenon is that if you grow the field, on average the new people will be lower-quality than the old people.)
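As a toy quantification of the “higher” branch, here is a sketch (mine; the power-law form and the value of alpha are assumptions, not the post’s) showing how a concave impact curve makes a shrunken field raise per-person counterfactual value:

```python
# Toy model (illustrative assumption): field impact is a concave power law in
# quality-adjusted size, impact(N) = N**alpha with alpha < 1 (diminishing returns).

def impact(n: float, alpha: float = 0.7) -> float:
    """Illustrative concave impact as a function of field size."""
    return n ** alpha

def marginal_value(n: float, alpha: float = 0.7) -> float:
    """Counterfactual value of one marginal person at field size n."""
    return impact(n, alpha) - impact(n - 1, alpha)

N = 1000
for remaining in (N, 0.99 * N, 0.5 * N):
    print(remaining, impact(remaining) / impact(N), marginal_value(remaining))

# With alpha = 0.7: a random 1% quitting costs ~0.7% of impact, halving the field
# keeps ~62% of impact, and the marginal person's counterfactual value is ~1.2x
# higher in the halved field. With increasing returns (alpha > 1) this flips.
```

The empirical question is which regime (alpha above or below 1, or something not well described by a power law) the AI safety field is actually in.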
- ^
Or: is it less bad or more bad than magically decreasing the impact of the field by 1%/50%?
And you can consider the opposite question: increase the field by 1%/50%/100%, maintaining the current distribution of people-quality.
You missed “trying.” Succeeding requires certain capabilities, but trying to do it does not. I believe there’s much more risk from AIs not trying to do what the operator wants than trying and failing.
Disagree for the meaning of “alignment” I most care about, where alignment is about trying to do what the operator wants.
Yes. If I had to read everything carefully, I would mostly agree with the post’s proposal, but it is crucial that I don’t have to read everything.
Yes. Sorry. Upvoted. Instead of “consensus bad” I should have said “consensus often implemented bad” or “think about your process for resolving disagreement, and in most contexts be very wary of (1) getting hung up on minor disagreements because you don’t have a salient process for moving on, (2) needing inside-view buy-in from everyone, [more]” or something. I should have given central examples of failures, like “you spend half of a meeting talking about something that clearly isn’t worth that, because there’s no clear good way to move on.”
Christiano, Greenblatt, Carlsmith, Kokotajlo. (Not sure about all of these people, but the vibe among my friends is definitely that acausal cooperation will be a big deal.)
(I’m not claiming consensus or trying to persuade.)
Sorry, I’m busy plus I don’t have amazing things to say here. You could ask a chatbot about the space of possible universes and whether other universes are real.
Of course, the appeal to authority is just because I’m not going to get into the object level here (and it’s pretty load-bearing for me personally).
And I’m not saying you should bet the farm on acausal cooperation. You can read this whole post as prefaced by “IF acausal cooperation works out.”
Thanks! Yeah, I didn’t want to get into why I expect acausal cooperation will be feasible. fwiw my impression is that people who’ve thought about it and community elites tend to believe in it.
Sure, then in practice that’s just democracy-with-vetoes. I’m criticizing actually trying to reach consensus, or situations where there’s no normal/salient way to move forward given disagreement besides talking until there’s (nominally) consensus.
Oh, I should have included examples. Like, if I’m cowriting a doc, maybe the default is we don’t have a process for moving past disagreements; if neither of us changes their mind, eventually someone acquiesces, but many processes for moving past low-stakes disagreements would be better.
I believe this post has received 3-4 strong-downvotes. Downvoters, I’d appreciate if you DMed me why. I know some people are sensitive about infohazards on related topics but I think this post is fine and I’m interested in hearing if not.
Edit: now 4-5, even though it no longer appeared on the homepage; I feel confused about what’s going on.
I think you can simulate a bunch of other universes, determine what values are held by people in other universes who are into acausal cooperation (roughly speaking), then cooperate with those values and expect that some people in other universes will cooperate with you, in exact correspondence with your propensity to cooperate. Like in Newcomb’s problem, you get to choose the output of your decision procedure and that determines both what you do and what good predictors will predict you’ll do. Maybe I misunderstand your point; maybe we don’t disagree.
Consensus is the worst way to make decisions.
Trying to reach consensus is slow and costly
It favors the status quo and unobjectionable actions
It makes people more reluctant to voice disagreement, because voicing disagreement is coupled with a costly process of reaching consensus on that topic (whereas if decisions are made by an executive or a vote or something, voicing opinions doesn’t have to slow things down)
It means when there’s disagreement, decisions are made via one side politely acquiescing or getting exhausted; that’s cursed
Inspired by @habryka, but he may have different reasons.
LessWrong posts have staying power; google docs do not.
Great LessWrong posts often stick in people’s minds and continue to be reread and shared for years after they are published. It’s very rare for great google docs to do so, even if they’re initially shared with everyone you care about. And even right after they’re written they seem worse at winning hearts and minds. I could speculate about why but here I just want to observe that this seems true. One upshot is that you should often aim for the final output of a project to be a LessWrong post rather than a google doc.
(Also, even if you think you’ve shared your google doc with everyone who should read it, you probably haven’t, in part because some other people should read it in the future.)
(Obviously google docs are better for eliciting input, if you have a good group that will read your docs.)
(Presumably there are some things you can do to make your google docs stick more, including indexing them.)
(h/t @Linch, who said something like this to me.)
I agree that this depends on people caring about goods in different universes. I care a bunch about goods in different universes and I expect many others will too.
(Actually there might be galaxy-brained decision-theoretic arguments nevertheless, including based on the prospect that we’re in a simulation, but the basic case depends on caring about different universes.)
Buying AI labor might be a big deal for philanthropists.
I think the total available for AI safety philanthropy is almost $100B (at current valuations), mostly from Anthropic.[1] The AI safety nonprofit ecosystem currently consumes about $1B per year. There are still good opportunities available, but they’re several times worse than the average spending (because the low-hanging fruit has been plucked[2]). Marginal effectiveness would likely decline by ~half again if you doubled the rate of AI safety philanthropy.
So there’s likely ~30x more money than can be spent on funding AI safety orgs.[3] A priori I expected there would be various decent ways to spend large amounts of money, but I’m aware of few promising proposals.
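For concreteness, here is a rough reconstruction (mine, not the post’s) of how the stated numbers fit together; the ~100%/year investment return and the 8-year horizon are assumptions chosen to reproduce the footnote’s claim that $1B/year for 8 years only costs about $2B now.

```python
# Rough BOTEC reconstruction (all numbers are the post's estimates or my guesses).

available = 100e9        # ~$100B available for AI safety philanthropy (footnote 1)
annual_spend = 1e9       # current AI safety nonprofit spending, ~$1B/year
growth_rate = 1.0        # assumed ~100%/year investment returns on unspent funds
horizon_years = 8        # assumed horizon, matching footnote 3

# Present cost of funding annual_spend each year, if unspent money compounds.
present_cost = sum(annual_spend / (1 + growth_rate) ** t for t in range(horizon_years))
print(present_cost / 1e9)               # ~2.0, i.e. "$1B/year for 8 years costs ~$2B now"

# Even if the flow doubles to ~$2B/year (footnote 4), the present cost stays small
# relative to what's available:
print(available / (2 * present_cost))   # ~25x: same ballpark as the post's ~30x
```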
There are two obvious buckets that might be able to absorb ~unlimited amounts of money well:
1. During an intelligence explosion, buying AI inference to do AI safety work
2. After an intelligence explosion, if (a) pre-superintelligence property rights persist and (b) you can turn wealth into control of distant galaxies, buying control of distant galaxies
2 looks worse than AI safety philanthropy on current margins (even if you can invest so well that your investments grow by 100x as a fraction of global wealth by superintelligence-time in expectation) — you shouldn’t save for it on current margins, but it could become competitive if AI safety philanthropy increases, and after an intelligence explosion there may be nothing else for altruists to spend money on. A collaborator and I hope to publish our analysis on this topic in May.
1 is very uncertain. Even if buying AI labor is important, that doesn’t necessarily mean we can spend present-value $10B+ effectively. Some people are trying to investigate this topic.
Regardless, one upshot for philanthropists is to spend more now — assuming funding will increase in the future and you’ll receive some of that funding or be highly correlated with it, spending now is better than spending later or never.[4]
(Reminder: American donors can do better by donating to politics. This is just about nonprofits.)
(Claim: you can get expected returns of >100%/year by investing well. This isn’t load-bearing for any of the above.)
- ^
I think about $100B. Another reasonable person thinks about $40B. We haven’t argued about it.
- ^
There’s two phenomena here: (a) there’s diminishing returns in people/org quality and (b) there’s diminishing returns in projects — even if everyone is equally skilled, going from 0 to 100 people is better than going from 900 to 1000 because the first people cause more important problems to be worked on.
- ^
Community funds will be invested decently, in aggregate, until they are spent, so $1B/year for 8 years only costs like $2B now.
- ^
Anthropic investors, staff, and founders aren’t able to sell their equity at will, and they likely won’t be able to until 6 months after IPO. I expect the AI safety philanthropy flow will increase from $1B/year to more like $2B/year by the time Anthropic equity becomes liquid — maybe before then as other funders plan for Anthropic money. And even without Anthropic money, in Good Ventures’s shoes I would want to spend faster.
Three mechanisms by which prosocial actions could be selfishly rewarded:
1. Correlation. You acting prosocially is correlated with aliens in other universes (and other humans) acting prosocially, which is good for you (especially if your preferences are scope-sensitive or not-super-indexical).
2. Some future humans may want to reward people who prosocially made the AI transition go well.
BOTEC (arithmetic sketched after this list): 2% of lightcone will be gifted to people who made the AI transition go well. Donating $1 well now gives you 1/60B of the credit for making the AI transition go well, if it does. Therefore donating $1 well now gives you 1/3T of the lightcone, if the AI transition goes well, in expectation. But assume a 2.5x haircut for funging, so 1/8T.
3. Some aliens may want to (acausally) reward people who act prosocially. (Aliens could reward people directly, based on actions of correlated simulated people, or they could (more simply) acausally cooperate with future humans to get future humans to reward prior human prosocial actions.)
Or, generally: maybe part of the multiverse-wide acausal cooperation scheme will be rewarding prosocial actions, alongside other incentive-compatibility measures like punishing threats/conflict.
Or, imprecisely: maybe you’re in a simulation such that if you act prosocially, your preferences will be rewarded.
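A minimal arithmetic check (mine) of the BOTEC under mechanism 2 above, with all inputs taken from the post’s stated assumptions:

```python
# Arithmetic for the lightcone-credit BOTEC; inputs are the post's assumptions.

gifted_share = 0.02            # 2% of the lightcone gifted to those who helped
credit_per_dollar = 1 / 60e9   # $1 donated well now ~ 1/60B of the credit
funging_haircut = 2.5          # discount because your dollar partly displaces others'

share = gifted_share * credit_per_dollar        # = 1 / 3e12 of the lightcone
share_after_haircut = share / funging_haircut   # ~ 1 / 7.5e12, i.e. roughly 1/8T
print(f"1/{1 / share:.1e}", f"1/{1 / share_after_haircut:.1e}")
```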
Also, as my decision-theory intuitions have improved, I’ve come to appreciate this heuristic: when you can benefit others’ preferences super-efficiently and so there would be massive gains from trade but you can’t coordinate with your counterparty, just do your end of the trade anyway. (Even if you don’t directly care about their preferences at all.) (Depending on context.)
(This is all imprecise/slippery; there’s a few different meanings of “prosocial” and this depends on what kind of selfish rewards you care about.)
(h/t @Oscar for discussion.)
(I’m writing this to facilitate asking experts for takes.)