Four million a year seems like a lot of money to spend on what is essentially a good capabilities benchmark. I would rather give that to, say, LessWrong, and if I had the time to do some research I could probably find 10 people willing to create alignment benchmarks that I think would be even more positively impactful than a LessWrong donation (like https://scale.com/leaderboard/mask or https://scale.com/leaderboard/fortress).
I’ve also donated to LW and will probably continue to do so; I’ve also donated to more traditional AI alignment research.
$4M/yr is what I wildly guess a community spending on the order of $100M/yr should be putting into agent village. So, just a couple percent. I am not saying we shouldn't fund anything else until this hits $4M/yr; my own wild guess is that I personally would stop funding agent village and switch marginal donations to something else once they have around $400k/yr for compute.
The two benchmarks you list seem cool but less valuable than agent village to me. They don't seem to help with the difficult parts of the alignment problem, e.g. the situation where you have AGIs which are very situationally aware and have memorized all the rules and behaviors you want them to follow, and you are trying to figure out whether they are acing all your tests because they are genuinely virtuous or because they are faking it, and, if they are virtuous, whether their virtue will be robust to future distribution shifts.