Alignment Fellowship
Unconditional Grants to Worthy Individuals Are Great

The process of applying for grants, raising money, and justifying your existence sucks.
A lot.
It especially sucks for many of the creatives and nerds that do a lot of the best work.
If you have to periodically go through this process, and are forced to continuously worry about making your work legible and how others will judge it, that will substantially hurt your true productivity. At best it is a constant distraction. By default, it is a severe warping effect. A version of this phenomenon is doing huge damage to academic science.
Motivated by this, I’m considering funding ~three people for two years each to work on whatever they see fit, much like the Thiel Fellowship, but with an AI Alignment angle.
I want to find people who are excited to work on existential risk, but are currently spending much of their time working on something else due to financial reasons.
Instead of delegating the choice to some set of grant makers, I think that aggregating the opinion of the crowd could work better: at least as good at finding talent, but with less overall time spent.
The best system I can think of at the moment would be to give every member of the alignment forum one vote with the ability to delegate it. Let everybody nominate any person in the world, including themselves, and award grants to the top 3.
I’m asking for feedback and advice:
Is this a good idea in general, or maybe it’s obviously an order of magnitude worse than existing opportunities?
Maybe the list of perfect candidates already exists, waiting to get funded?
What should be the amount? Thiel gave 200k. Is it too much for 2 years? Too little?
How could a nomination and voting system be improved? And especially, who should get a vote? Should it be Alignment Forum members, or users registered before a certain date, or Lesswrong users?
Could there be an entirely different approach to finding fellows? How would you do it?
Having done a bunch of this, yes, great idea. You can have pretty spectacular impact, because the motivation boost and arc of “someone believes in me” is much more powerful than the one you get from funding stress.
My read is that good-taste grants of this type are dramatically, dramatically more impactful than those by larger grantmakers, e.g. I proactively found and funded the upskilling grant of a math PhD who found glitch tokens, which was for a while the third most upvoted research on the alignment forum. This cost $12k for I think one year of upskilling, as frugal geniuses are not that rare if you hang out in the right places.
However! I don’t think that your proposed selection mechanism is much good. It replaces applications with promotion, it will cause lots of researchers who don’t get funded to spend cycles or be tugged around by campaigns, and your final winners will be hit by Goodhart’s curse. Also, this depends on the average AF participant being good not just at research, but at judging who will do good research.
I do think it’d be net positive, but I think you can do a lot better.
If you’re doing a mechanism rather than concentrated agency, @the gears to ascension’s proposal seems much more promising to me as it relies much more on high-trust researchers rather than lots of distributed less informed votes.
The other angles I see are:
Make another funder like AISTOF. This is imo the best funder in the space: far better grantee experience, and up with the best in terms of taste. It works by a donor selecting one high-agency person they trust (JueYan, a VC) and giving them a remit to find grantees fitting a profile, then mostly not intervening, just getting regular reports on how funds are spent to help them judge how much to add. I imagine there’s someone in your network who you’d trust to track down and assess people much better than a popularity contest would (though they might still contact top researchers for takes on technical details).
Make a somewhat more organized fellowship, like the one @Mateusz Bagiński has a sketch for around understanding, explaining, and solving the hard problems in alignment, with many of the people being directly invited and some extra infrastructure being provided.
Select people directly, based on your own reading and observations.
I have a list of people I’m excited about! And proactively gardened projects with founders lined up too.[1] Happy to talk if you’re interested in double-clicking on any of these, booking link DMed.
I recommend less, spread over more people, though case-by-case is OK. Probably something like $75k a year gets the vast majority of the benefit, but you can have a step where you ask their current salary and use that as an anchor. Alternatively, I think there’s strong benefit to giving many people a minimal safety net. Being able to call on even $20-25k/year for 3 years would be a vast weight off many people’s shoulders, if you’re somewhat careful and live outside a hub it’s entirely possible to do great work on a shoestring, and this actually provides some useful filters.
I have spent down the vast majority of my funds over the last 5 years, so I can’t actually support anyone beyond the smallest grants without risking running out of money before the world ends and needing to do something other than trying full-time to save the world.
My takes on the questions, in order:
Good idea: Probably yes if done well
Existing list: I suspect none really good now
How much: No strong opinion; perhaps ask each, “How much of the total shared budget do you need to work unimpeded? Prefer less but pick enough to work well. We hope to fund 3 to 6 people for two years, and have selected the following list of budget-gated grantees: [list goes here]”—this could have weird social effects but the positive weird effects seem maybe worth it to me.
Voting/other approach: I’d worry about plain popularity contest on alignmentforum due to not giving tools to explicitly hedge over opinion clusters.
I’d suggest: take the fixed point of asking a naturally-transitive person-taste-rating question and a non-transitive research-taste-rating question:
> Q1 (transitive): “Hello A. Please rate B on peer taste by giving upper and lower bounds on P(C has [good taste in doom-reducing][1] people, strategies, and research | B rates C as a 1 on this question), as well as any freeform commentary you have on the bounds you gave”.
> Q2 (terminal): “Hello A. Please rate B on agenda taste by giving upper and lower bounds on P(doom by 2200|B’s research agenda continues)/P(doom by 2200|B’s research agenda stops), as well as any freeform commentary you have on the bounds you gave.”
Propagating ratings like Q1 is also known as Eigentrust[2] (or perhaps here we could call it Eigentaste); doing this allows focusing on all-factors doom, makes cliques visible but doesn’t favor them, and easily puts significant weight on newbies who someone well-known is impressed by.
I’d make it invite-based: starting ratings highly influence the transitive ratings, and I want to trace the trust graph from well-known researchers out to lesser-known ones. Start from initial highly-skilled people across a range of alignment-difficulty and intervention-approach opinions; ask them to invite the people they consider high-competence to come rate and be rated in this poll.
I’d also want to make it public, so that others can use the results—eg, ask participants to select from dropdown “are you ok with this review being made available to {the internet | other participants in this review process | just the person I’m reviewing and the review admins | just the review admins | nobody please delete my review}”.
I’ve prototyped this with Airtable + Fillout forms + Google Cloud Run. That stack doesn’t work great: Fillout forms is clunky UI for many-to-many, and Airtable forms couldn’t do many-to-many at all, but I’d be open to building a better UI as a volunteer and then either running this or handing over the code for you to run it. Claude is now able to spit out a working Eigentrust implementation zero-shot, so doing anything more is mostly a setup question, and should be a few hours of Claude Code. I’ll see what happens this weekend.
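For concreteness, here’s a minimal sketch of the Eigentrust-style propagation described above. It assumes (illustratively) that each rater’s Q1 answers have already been collapsed to single non-negative scores; the names, the seed set, and the damping value are my own placeholders, not a spec for the real site.

```python
# Minimal Eigentrust-style power iteration (a sketch, not the production
# implementation discussed above). Assumes ratings are already collapsed
# to single non-negative scores; all names here are illustrative.
import numpy as np

def eigentrust(local_trust: np.ndarray, pre_trusted: np.ndarray,
               alpha: float = 0.15, tol: float = 1e-10) -> np.ndarray:
    """local_trust[i, j] = how much rater i trusts ratee j (>= 0).
    pre_trusted = starting distribution over the invited seed raters."""
    # Row-normalize so each rater distributes one unit of trust.
    C = local_trust / local_trust.sum(axis=1, keepdims=True)
    p = pre_trusted / pre_trusted.sum()
    t = p.copy()
    while True:
        # Propagate trust along endorsements, damped toward the seeds.
        t_next = (1 - alpha) * C.T @ t + alpha * p
        if np.abs(t_next - t).max() < tol:
            return t_next
        t = t_next

# Three raters: 0 and 1 endorse each other and rater 2; 2 endorses only 0.
local = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]], dtype=float)
seed = np.array([1.0, 1.0, 0.0])  # raters 0 and 1 are the invited seeds
scores = eigentrust(local, seed)
print(scores)  # global trust distribution; sums to 1
```

The damping term is what lets the starting (invited) ratings keep influence over the transitive result, as mentioned above; in this toy graph rater 0 ends up with the most trust because both other raters endorse them.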
Original motivation: A few months ago, I was drawn to this for making common knowledge of which groups of people cross-rate highly; I suspect most people give pretty low scores to most people, and there will be a few peaks of actual relevance. I’d guess a force-directed layout of the rating review graph would show those clusters, for example. Ideally it would also allow people to select their own taste ratings as a starting set and see what the resulting transitive distribution is. That said, maybe making it public is less important now: with high-quality disagreement-bridging posts like Steven Byrnes’ 6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa, the urgency of achieving common knowledge of who’s competent according to what camp seems slightly lower. Still significant, though, and I think the value of making it public is in general pretty high.
Possible drawbacks: It would be some effort for voters to fill this out even if the UI was nice, but my hunch is that the output will be enough better than single-round unnormalized voting to be worth it. And it might not be much better to do this than just ask alignmentforum, in which case it’d be effort for not much value.
The bracketed part needs refinement—I’d want this to be precise enough that god could resolve a prediction market about it. Needs to connect to both transitive Q1 and also to Q2, and be a clear enough question to be answerable by a skilled alignment researcher.
Claude intro and critique of my proposal with your post and my comment as context; Eigentrust (wikipedia); Eigentrust paper
Eigentrust sounds awesome. I was really excited about it when I first discovered it, but thought that people wouldn’t have a good reason to fill out their trust weights. Grant money allocation could be a perfect motivator.
I wonder if it’s good to pre-fill the trust weights (e.g. based on AF upvote history), to make it easier for users (and motivate those who strongly disagree with their defaults).
Thank you for offering to volunteer with this, I’ll definitely reach out if I decide to run the fellowship with Eigentrust.
I’ve made solid progress on putting together a site I’d be happy with, not quite there yet, eta another 24 to 48h. Let me know if you end up not wanting to couple your donations and eigentaste; there’s still clear value in running it, so I’m going to get it done regardless.
Sent you a demo. Reasonably close to ready for real use by motivated users, but the human-facing prompting still needs refining in order to be properly meaningful.
Yeah, I notice that using a transitive quality as the endorsement criterion, and making votes public, produces an incentive for a person to give useful endorsements: Failing to issue informative endorsements would indicate them as not having this transitive quality and so not being worthy of endorsement themselves.
We can also make it prominent in a person’s profile if, for instance, they’ve strongly endorsed themselves, or if they’ve only endorsed a few people without also doing any abstention endorsements (which redistribute trust back to the current distribution). Some will have an excuse for doing this, most will be able to do better.
True. Doing that by default, and also doing some of the aforementioned abstention endorsements by default, would address accidental overconfident votes pretty well.
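A rough sketch of the “abstention endorsement” idea, assuming (my assumption, not a stated design) that abstained trust mass simply falls back to the current global distribution:

```python
# Sketch of an abstention endorsement: a rater allocates part of their
# trust explicitly and lets the remainder follow the current global
# distribution, so sparse voters don't accidentally over-concentrate
# trust. The fallback rule and names here are illustrative assumptions.
import numpy as np

def effective_row(explicit: np.ndarray, abstain: float,
                  global_dist: np.ndarray) -> np.ndarray:
    """explicit: this rater's normalized endorsements (sums to 1).
    abstain: fraction of trust redistributed to the current distribution."""
    return (1 - abstain) * explicit + abstain * global_dist

explicit = np.array([1.0, 0.0, 0.0])         # endorses only person 0
current = np.array([0.4, 0.35, 0.25])        # current global trust
row = effective_row(explicit, 0.5, current)  # half of the trust abstained
print(row)  # 0.7, 0.175, 0.125
```

Mixing in the current distribution by default, per the comment above, would soften accidental overconfident votes while still letting explicit endorsements dominate.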
(Also, howdy, I should probably help with this, I was R&Ding web of trust systems for a while before realising there didn’t seem to be healthy enough hosts for them (they can misbehave if placed in the wrong situations), so I switched to working on extensible social software/forums, to build better hosts. It wasn’t clear to me that the alignment community needed this kind of thing, but I guess it probably does at this point.)
One obvious problem is that this turns getting funded into a popularity contest, which makes Goodhart kick in. It might work fine as a one-off thing, but in the long run, it will predictably get gamed, and will likely have negative effects on the whole LW discussion ecosystem by setting up perverse incentives for engaging with it (and, unless the list of eligible people is frozen forever, attracting new people who are only interested in promoting themselves to get money).
You should almost certainly have some mechanism for deciding the amount to pay on a case-by-case basis, rather than having it be flat.
What I would want to experiment with is using prediction markets to “amplify” the judgement of well-known people with unusually good AGI Ruin models who are otherwise too busy to review thousands of mostly-terrible-by-their-lights proposals (e. g., Eliezer or John Wentworth). Fund the top N proposals the market expects the “amplified individual” to consider most promising, subject to their veto.
This would be notably harder to game than a straightforward popularity contest, especially if the amplifee is high-percentile disagreeable (as my suggested picks are).
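The selection rule sketched above is simple enough to state in a few lines; everything here (proposal names, probabilities, the veto set) is illustrative, not part of the proposal:

```python
# Sketch of market-amplified selection: fund the top-N proposals by the
# market's estimate of the amplified reviewer's approval, subject to
# their veto. All names and numbers are illustrative.
def select_grants(market_estimates: dict[str, float], n: int,
                  vetoed: set[str]) -> list[str]:
    """market_estimates: proposal -> market P(reviewer rates it promising)."""
    ranked = sorted(market_estimates, key=market_estimates.get, reverse=True)
    return [p for p in ranked if p not in vetoed][:n]

estimates = {"A": 0.8, "B": 0.6, "C": 0.9, "D": 0.3}
print(select_grants(estimates, n=2, vetoed={"C"}))  # ['A', 'B']
```

The veto step is what keeps the amplified individual in the loop even when the market misjudges them.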
This would solve the bandwidth problem but double down on the correlation problem. If you peg the market to the approval of a few “amplified individuals”, you aren’t actually funding “alignment”; you are funding “simulations” of Eliezer/John. If their models have blind spots, the market will efficiently punish anyone trying to explore those blind spots.
200k is pretty high. A higher salary can increase the number of applicants, but it also increases the number of applications you’d need to filter through.
Maybe go more meta, and instead pay someone whose full-time job will be to find and interview people who want to work on AI Alignment, and do the paperwork (applying for other grants) for them.
I think this kind of funding has outsized impact.
Voting or otherwise delegating selection to the hive mind seems like a good way to minimize any potential impact.
Delegating to an already successful, widely respected authority in the domain is better, but those people are probably already steering most of the effort and funding in the field, either directly or by swaying general opinion.
Nomination-based seems like a good way to tap the wisdom of many different highly qualified people without filtering through a consensus process. It also reduces opportunity for gaming by insincere candidates, reduces the number of people who will spend time applying, and limits the number of applications you have to read.
For example: pick a nominating committee of maybe 10 ppl you think are wise and smart and knowledgeable and independent-thinking and different from each other, who would not be candidates now, but would be in a position to be aware of potential candidates. Ask each of them to nominate one candidate per available slot with a brief statement about why. The nominators’ identities should be secret, even to each other. Your goals and criteria should be articulated to them.
You could either screen those based on the nominator’s statements or invite short preliminary applications from all the nominees, but decide which of them you are personally most excited about and invite only 2-3 full applications or interviews per available slot. In the end pick recipients based on your own judgement.
I appreciate your thoughts.
I buy the reasoning that “delegating selection to the hive” could be suboptimal.
Also, as you have pointed out, the very best of the hive already have budgets to distribute or don’t have spare time for this for other reasons.
Your exact proposal, though, implies that I can pick 10 wise and smart people (which is somewhat manageable, but I’d still be mostly deferring to a consensus opinion), and that I can make a final pick (which I most certainly can’t, beyond doing a “vibe-check”).
I like the idea to make nominators secret to each other, to minimize the influence of social dynamics.
Good idea—I advise a higher amount, spread over more people. Up to 8.
Did you mean lower amount?