I really appreciate this write-up. I felt sad while reading it that I have a very hard time imagining an AI lab yielding to another leader it considers to be irresponsible—or maybe not even yielding to one it considers to be responsible. (I am not that familiar with the inner workings at Anthropic though, and they are probably top of my list of labs that might yield in those scenarios, or might not race desperately if in a close one.)
One reason for not yielding is that it’s probably hard for one lab to definitively tell that another lab is very far ahead of them, since we should expect some important capability info to remain private.
It seems to me, then, that ways for labs to credibly demonstrate leads, without leaking info that allows others to catch up, would be a useful thing to exist—perhaps paired with enforceable conditional commitments to yield if certain conditions are demonstrated.