constructing good auditing test beds has been a huge pain
Yeah, I confirm this is the case. I estimate that the model organism from Auditing Language Models for Hidden Objectives took 8-12 FTE-months, which I think most people find surprisingly expensive. That said, 30% of this time probably went into implementing synthetic documenting finetuning and testing that it worked. I’d guess I could now make another model organism of similar quality in 1.5-3 FTE-months.
Yeah, I confirm this is the case. I estimate that the model organism from Auditing Language Models for Hidden Objectives took 8-12 FTE-months, which I think most people find surprisingly expensive. That said, 30% of this time probably went into implementing synthetic documenting finetuning and testing that it worked. I’d guess I could now make another model organism of similar quality in 1.5-3 FTE-months.