I just saw (but didn’t read) the post Anthropic is Quietly Backpedalling on its Safety Commitments. I’ve seen similar posts before.
I wonder: maybe it’d make sense to have some sort of watchdog organization officially tracking this sort of stuff. And maintaining a wall of shame-ish website. Maybe such a thing would make backpedalling on safety more costly for organizations, thus changing their calculus and causing them to spend incrementally more effort on safety.
Insofar as he’s not already doing it (I don’t know), it seems like a natural job for @Zach Stein-Perlman.
I am tracking such things (although I don’t really care about Garrison’s critique in that particular post). I don’t have an up-to-date webpage on this at the moment but this is relevant. I am not doing the job of loudly proclaiming things and thus incentivizing compliance. I would be interested in advising someone-who-wants-to-do-that on what to loudly proclaim, or maybe hiring someone.
A “wall of shame” is somewhat rough because it disincentivizes making (nontrivial, falsifiable) commitments, and because a company that makes various real commitments and breaks some of them tends to be better than a company that never makes commitments. So communicating this requires some nuance (but you could do it well).
Also, I’m more concerned that companies’ safety policies are so weak/vague/opaque that they’ll be able to say they’re complying while being unsafe than that companies will clearly violate their commitments. So I think tracking compliance is overrated relative to tracking how on track the company is for safety.
I wonder if this could be fixed e.g. by making a table where columns are companies, rows are things that should be done, and the cells are e.g. black for “didn’t even say they would do that”, orange for “they said they would do that”, and green for “they actually do that”? Or something similar that would by design show “made a promise” as an improvement over “didn’t even make a promise”.
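To make the table idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the names `Status`, `build_table`, and `render` are hypothetical, and the companies and practices are placeholders, not real assessments. The one design point it tries to capture is that the statuses are ordered, so “made a promise” always ranks above “didn’t even make a promise”.

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered on purpose: a promise ranks above no promise at all.
    NO_COMMITMENT = 0  # black: "didn't even say they would do that"
    COMMITTED = 1      # orange: "they said they would do that"
    DOING = 2          # green: "they actually do that"


COLOR = {
    Status.NO_COMMITMENT: "black",
    Status.COMMITTED: "orange",
    Status.DOING: "green",
}


def build_table(practices, companies, records):
    """Return {practice: {company: Status}}.

    Any (company, practice) pair missing from `records` defaults to
    NO_COMMITMENT, i.e. a black cell.
    """
    return {
        p: {c: records.get((c, p), Status.NO_COMMITMENT) for c in companies}
        for p in practices
    }


def render(table, companies):
    """Render rows = practices, columns = companies, cells = color names."""
    lines = ["practice".ljust(16) + "".join(c.ljust(12) for c in companies)]
    for practice, row in table.items():
        cells = "".join(COLOR[row[c]].ljust(12) for c in companies)
        lines.append(practice.ljust(16) + cells)
    return "\n".join(lines)


# Hypothetical placeholder data, not real assessments of any company.
companies = ["Company A", "Company B"]
practices = ["Practice 1", "Practice 2"]
records = {
    ("Company A", "Practice 1"): Status.DOING,
    ("Company A", "Practice 2"): Status.COMMITTED,
    ("Company B", "Practice 1"): Status.COMMITTED,
}

table = build_table(practices, companies, records)
print(render(table, companies))
```

Because `Status` is an `IntEnum`, cells can be compared or sorted directly, so a company that promised and has not yet delivered still shows up ahead of one that never promised.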
Yeah that’s progress. But then you realize that ~all commitments are too weak/vague (not to mention non-standardized) and you notice that words-about-the-future are cheap and you realize you should focus on stuff other than commitments.
Ah, it looks like Zach is working on https://ailabwatch.org/, which is pretty much what I was envisioning. Very cool.