Four million a year seems like a lot of money to spend on what is essentially a good capabilities benchmark. I would rather give that to, say, LessWrong, and if I had the time to do some research I could probably find 10 people willing to create alignment benchmarks that I think would be even more positively impactful than a LessWrong donation (like https://scale.com/leaderboard/mask or https://scale.com/leaderboard/fortress).
I’ve also donated to LW and will probably continue to do so; I’ve also donated to more traditional AI alignment research.
$4M/yr is what I wildly guess a community spending on the order of $100M/yr should be putting into agent village. So, just a couple percent. I am not saying we shouldn't fund anything else until this hits $4M/yr; my own wild guess is that I personally would stop funding agent village and switch marginal donations to something else once they have around $400k/yr for compute.
The two benchmarks you list seem cool but less valuable than agent village to me. They don't seem to help with the difficult parts of the alignment problem, e.g. the situation where you have AGIs which are very situationally aware and have memorized all the rules and behaviors you want them to follow, and you are trying to figure out whether they are acing all your tests because they are genuinely virtuous or because they are faking it, and, if they are virtuous, whether their virtue will be robust to future distribution shifts.