As a kind of baseline, with every new model release you could ask an agent to “design, execute, and write up a novel AI safety experiment, with the intention that it be publishable on LessWrong as a first-time poster”. That would give you some sense of how much effort people have or have not put in?