As a kind of baseline, with every new model release you could ask an agent to “design, execute, and write up a novel AI safety experiment, with the intention that it be publishable on LessWrong as a first-time poster”. That would give you some sense of how much effort people have or have not put in?