One obvious problem is that this turns getting funded into a popularity contest, which makes Goodhart's law kick in. It might work fine as a one-off, but in the long run it will predictably get gamed, and it will likely damage the whole LW discussion ecosystem by setting up perverse incentives for engaging with it (and, unless the list of eligible people is frozen forever, by attracting new people who are only interested in promoting themselves to get money).
What should the amount be? Thiel gave $200k. Is that too much for two years? Too little?
You should almost certainly have some mechanism for deciding the amount to pay on a case-by-case basis, rather than having it be flat.
Could there be an entirely different approach to finding fellows? How would you do it?
What I would want to experiment with is using prediction markets to “amplify” the judgement of well-known people with unusually good AGI Ruin models who are otherwise too busy to review thousands of mostly-terrible-by-their-lights proposals (e.g., Eliezer or John Wentworth). Fund the top N proposals the market expects the “amplified individual” to consider most promising, subject to their veto.
This would be notably harder to game than a straightforward popularity contest, especially if the amplifee is high-percentile disagreeable (as my suggested picks are).
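To make the selection step concrete, here's a minimal Python sketch of what “fund the top N, subject to veto” could look like. Everything in it is an illustrative assumption, not any real market's API: `market_price` stands in for a prediction market's estimate of P(the amplified reviewer rates this proposal as promising), and `Proposal`/`select_fundees` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Proposal:
    title: str
    market_price: float  # market's estimate in [0, 1] that the reviewer approves

def select_fundees(proposals: List[Proposal], n: int,
                   veto: Callable[[Proposal], bool]) -> List[Proposal]:
    """Fund the top-n proposals by market-estimated approval, minus vetoes."""
    ranked = sorted(proposals, key=lambda p: p.market_price, reverse=True)
    # The amplified reviewer only inspects the market's top picks,
    # which is what keeps their time cost bounded.
    return [p for p in ranked[:n] if not veto(p)]

# Illustrative usage with made-up proposals and a stub veto.
pool = [
    Proposal("Agent foundations follow-up", 0.71),
    Proposal("Interpretability tooling", 0.64),
    Proposal("Capabilities-adjacent benchmark", 0.58),
]
print([p.title for p in select_fundees(pool, n=2, veto=lambda p: "benchmark" in p.title)])
# -> ['Agent foundations follow-up', 'Interpretability tooling']
```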
This would solve the bandwidth problem, but it doubles down on the correlation problem. If you peg the market to the approval of a few “amplified individuals”, you aren't actually funding “alignment”; you're funding “simulations” of Eliezer/John. If their models have blind spots, the market will efficiently punish anyone trying to explore those blind spots.