I’m fanatically in favor of creating new ways to test (& thereby develop) rationality in general and scientific capabilities in particular. However, any such resources would necessarily be dual-use, providing AI developers with evals (or eval paradigms) that could help accelerate AI development along exactly the axes where it’s currently most lacking. Worrying about this seems obviously insane, but I can’t pin down why it’s insane; soliciting other opinions.