there’s an analogy between the zurich r/changemyview curse of evals and the METR/Epoch curse of evals. You do this dubiously ethical (according to more US-pilled IRBs, or to more paranoid/pure AI safety advocates) measurement/elicitation project because you think the world deserves to know. But to get there you had to run dubiously ethical experiments on unconsenting redditors / help labs improve capabilities. And the catch is, you only come out net positive if the world actually chooses to act on the information.