Daniel Kokotajlo comments on My pitch for the AI Village

Daniel Kokotajlo 25 Jun 2025 16:46 UTC
- I’ve also donated to LW and will probably continue to do so, and I’ve donated to more traditional AI alignment research as well.
- $4M/yr is my wild guess at what a community spending on the order of $100M/yr should be putting into agent village. So, just a couple percent. I am not saying we shouldn’t fund anything else until this hits $4M/yr; my own wild guess is that I personally would stop funding agent village and switch marginal donations to something else once they have something more like $400k/yr for compute.
- The two benchmarks you list seem cool, but less valuable to me than agent village. They don’t seem to help with the difficult parts of the alignment problem. For example, once you have AGIs that are very situationally aware and have memorized all the rules and behaviors you want them to follow, you are trying to figure out whether they are acing all your tests because they are virtuous or because they are faking it, and, if they are virtuous, whether their virtue will be robust to future distribution shifts.