Hm. Eliezer has frequently complained that the field has no recognition function for good research that he's satisfied with, besides "he personally looks at the research and passes judgement on it", and that this obviously doesn't scale.
Stupid idea: Set up a grantmaker that funds proposals based on a prediction market tasked with evaluating how likely Eliezer/Nate/John is to approve of a given research project. Each round, after the funding is assigned to the highest-credence projects, Eliezer/Nate/John evaluate a random subset of proposals to provide a ground-truth signal; the corresponding prediction markets pay out, the others resolve N/A.
This should effectively train a reward function that emulates the judges’ judgements in a scalable way.
Is there an obvious reason this doesn't work? (One possible issue is the amount of capital that market participants would need to keep frozen in those markets, but we could e.g. scale up each participant's effective buying power to some multiple of the actual dollars they've invested, based on how many of their bets are likely to actually pay out.)
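To make the round mechanics concrete, here is a minimal sketch under the assumptions above; the names (Proposal, run_round, audit_fraction, judge_approves, effective_capital) are illustrative, not a real implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class Proposal:
    name: str
    cost: float
    p_approve: float  # market credence that a judge would approve this project


def run_round(proposals, budget, audit_fraction, judge_approves):
    """One funding round: fund by credence, audit a random subset, resolve markets."""
    # 1. Assign funding to the highest-credence proposals until the budget runs out.
    funded, remaining = [], budget
    for prop in sorted(proposals, key=lambda pr: pr.p_approve, reverse=True):
        if prop.cost <= remaining:
            funded.append(prop)
            remaining -= prop.cost

    # 2. The judges evaluate only a random subset, providing the ground-truth signal.
    n_audit = max(1, int(audit_fraction * len(proposals)))
    audited = set(random.sample(range(len(proposals)), n_audit))

    # 3. Audited markets pay out against the judge's verdict; all others resolve N/A
    #    (bets are simply refunded).
    resolutions = {}
    for i, prop in enumerate(proposals):
        if i in audited:
            resolutions[prop.name] = "YES" if judge_approves(prop) else "NO"
        else:
            resolutions[prop.name] = "N/A"
    return funded, resolutions


def effective_capital(dollars_invested, expected_payout_fraction):
    # The parenthetical's capital-efficiency tweak: since most markets resolve N/A
    # and refund bets, a participant's effective buying power can be some multiple
    # of their real dollars, here (illustratively) the inverse of the fraction of
    # their bets expected to actually require settlement.
    return dollars_invested / max(expected_payout_fraction, 1e-6)
```

The point of the random audit is that judge_approves only ever gets called on the audited subset, so the judges' time cost stays roughly constant as the number of proposals grows, while the market still has to price every proposal as if it might be the one that gets judged.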
Some notes:
I don't think this is the actual bottleneck here. Notably, Eliezer, Nate, and John don't spend much if any of their time assessing research at all (at least recently), as far as I can tell.
I don't think a public market would add much information. It's probably better to just have grantmakers with more context make forecasts and see how well they do. You need feedback loops faster than one year to get anywhere, but you can get those by practicing on a bunch of already-done research (see the scoring sketch after these notes).
My current view is that the bigger bottleneck in grantmaking is a lack of good stuff to fund rather than grantmakers not funding stuff, though I do still think Open Phil should fund notably more aggressively than they currently do, that marginal LTFF dollars look great, and that it's bad that Open Phil was substantially restricted in what they can fund recently (which I expect to have substantial chilling effects beyond the restricted areas).
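On the "practice on already-done research" point above, a hedged sketch of the scoring loop it implies, using a Brier score; the brier_score helper and the example numbers are made up for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error of probability forecasts; lower is better, always guessing 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# A grantmaker's retrodicted approval probabilities for five finished projects,
# scored against whether the judges actually approved them (1) or not (0).
print(brier_score([0.9, 0.2, 0.6, 0.1, 0.7], [1, 0, 1, 0, 0]))  # ≈ 0.142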
Notably, Eliezer, Nate, and John don't spend much if any of their time assessing research at all (at least recently), as far as I can tell.
Perhaps not specific research projects, but they've communicated a lot regarding their models of what types of research are good/bad. (See e.g. Eliezer's list of lethalities, John's Why Not Just… sequence, this post of Nate's.)
I would assume this is because doing so doesn't scale and their reviews are not, in any given instance, the ultimate deciding factor regarding what people do or what gets funded. Spending time evaluating specific research proposals is therefore cost-inefficient compared to reviewing general research trends/themes.
My current view is that the bigger bottleneck in grantmaking is a lack of good stuff to fund
Because no entity that I know of is currently explicitly asking for proposals that Eliezer/Nate/John would fund. Why would people bother coming up with such proposals in these circumstances? The system explicitly doesn’t select for it.
I expect that if there were actual explicit financial pressure to Goodhart to their preferences, many more research proposals that successfully do so would be around.