In light of the Mythos release: if you are considering taking a job at a lab, joining an org, or taking a fellowship role for the purpose of building evals, safety tools, mechinterp projects, control monitors, or similar, please consider what happens if you succeed. People are naturally sensitive to the consequences of failure, less so to the effects of success.
“Mr. Amodei/Hassabis/Altman, the results are in. The model is showing scheming propensities/backdooring behaviours/serious sandbagging/eval awareness!” (e.g. see section 6.2.1.2)
Will this actually stop a deployment in the end, or cause a pivot in strategy? Or will the alignment failures need to be so egregious that they can be spotted even without subtle mechinterp probes or activation oracles? Facebook had trust and safety teams, and they spotted the facets of the recommendation systems that caused severe harm in the world. Yet the proposed mitigations were watered down and, most importantly, the system that was the profit centre of the billion-dollar public corporation was never turned off.
There are things people can do with their time besides “work at a lab”, “protest outside the lab”, and “bake cookies”. I think the AI world has not seriously tried to consider anything other than a mad race or a shutdown, or any way to use AI besides immediate attempts to build ASI. Cf. also my previous thoughts on trying to overcome Molochian dynamics.