Has anyone thought of putting a prize on the alignment problem? I imagine the application form would be swamped by cranks and LLM slop, but that feels solvable (e.g. requiring a small fee per submission to pay the evaluator, or requiring >1k LW karma).
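For concreteness, here is a minimal sketch of the gating rule I have in mind. All names, thresholds, and the fee amount are illustrative assumptions, not a real system:

```python
# Hypothetical sketch of a prize-submission filter: a submission reaches a
# human evaluator only if the author clears a karma bar OR pays a small fee
# that funds the evaluation. Thresholds are made up for illustration.

from dataclasses import dataclass

KARMA_THRESHOLD = 1000  # e.g. ">1k LW karma"
EVALUATION_FEE = 50     # small fee (in dollars) that pays the evaluator

@dataclass
class Submission:
    author_karma: int
    fee_paid: float

def accept_for_review(sub: Submission) -> bool:
    """Filter out low-effort / LLM-slop submissions before human review."""
    return sub.author_karma > KARMA_THRESHOLD or sub.fee_paid >= EVALUATION_FEE
```

The OR reflects one reading of the proposal: established accounts skip the fee, while outsiders can still buy a review, so the filter costs effort or money but never blocks anyone outright.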
Very true.
In my own experience, the feeling of urgency actively detracts from useful AI alignment work. Instead of forward-chaining towards a better understanding of intelligence/corrigibility/how-minds-work/etc, you end up back-chaining from whatever results you imagine might maybe do something.
Which isn’t at all the right frame for approaching the problem, since the effectiveness of what you produce is limited by the quality of that first thought about it.
On a related note, I think the overwhelming majority of alignment work isn’t helping to address the core problems. Even great researchers like John Wentworth can somewhat miss the point.
I don’t think money is the biggest problem here. MIRI, ControlAI, etc. are working with budgets of millions of dollars a year (a lot, if used well; Lightcone is one of the most influential orgs around on a budget of $2-3M), but it doesn’t feel like an effective pause treaty is on the horizon (maybe a 25% chance it happens within a year).
I’ve never worked in or thoroughly researched policy, but it feels like something else is the main bottleneck. Maybe important people not viscerally understanding the situation? I don’t feel like ad campaigns would help much with that.
ControlAI seems to be doing good work on the margin, but it doesn’t seem worth allocating on the order of $50M to it.
What does it mean for an agent to have “only instrumental goals”? What is it optimizing for? This doesn’t seem like a proposal specific enough to consider whether an ASI optimizing for it would lead to good outcomes, let alone to build an ASI with that goal (which seems to be the harder part of the problem). If you have a more specific version of this idea, though, I’d like to see it.
This works quite well, and I will continue using it for the time being. However, I strongly suggest disabling “Fallacy Check”, since it very often fires on non-fallacious content.
I meant that there are diminishing returns to more funding; the best opportunities for turning money into impact can be far more effective than the median one.
Also, as you said in the post, there’s much less of a tug-of-war dynamic going on than with a politically polarized issue, so money goes somewhat farther.