I continue to think it would be valuable for organizations pursuing aligned AI to have a written halt procedure with rules for how to invoke it—including provisions to address the fact that people at the organization are likely to be selected for optimism, and so forth.
Perhaps external activists could push for such measures: “If you’re not gonna pause now, can you at least give us a detailed specification of what circumstances would justify a pause?”
I suppose “Responsible Scaling Policies” are something like what I’m describing… how are those working out?
It seems there is some diffusion of responsibility going on here. It could help to concentrate responsibility in a particular procedure, or a particular group of individuals (e.g. have an internal red team whose members are paid to interview people anonymously and construct the best possible case for a halt).
Based on what I’ve read about safety culture in high-performing organizations, the safety culture of top AI organizations, as described in this post, seems fairly terrible. Perhaps even scarier than the lack of safety culture is the apparent lack of 101-level reading about, or implementation of, what effective safety culture looks like. My impression is that many of the topics discussed here map well to standard safety culture concepts like “normalization of deviance”. In general, the post reads like a fairly typical sociological characterization of an organization that is on the verge of causing a catastrophic failure, although admittedly I did my research into safety culture quite a while ago. Maybe it would be possible to get some reliability engineers to testify in front of Congress.
EDIT: On the other hand, the good news is that I expect there may be low-hanging fruit for improvement in simply hiring safety consultants who work with other high-stakes industries (aviation, nuclear, etc.) and taking their recommendations seriously. A more pessimistic organization like MIRI could even offer to pay the fees for a mutually agreeable consultant. If AI companies are not willing to work with such a consultant, even when an outsider such as MIRI is paying the fees, that seems like an incredibly bad sign.