My foremost thought as I was finishing this essay was "Applying this concept to AI risk is left as an exercise for the reader."
Then I thought that AI risk, if anything, is characterized by kinda the opposite dynamic: lots of groups with different risk models, not infrequently criticizing each other's strategies/approaches explicitly as net-negative or implicitly as complicit with the baddies, and finding it hard to cooperate despite what locally look like convergent subgoals. (To be clear: I'm not claiming that everybody's take/perspective on this is valid, or that everybody in this field should cooperate with everybody else, or whatever.)
Then I thought that, actually, even within what seems like somewhat coherent factions, we would probably see some tails coming apart once their goal (AI moratorium, PoC aligned AGI, existential security, exiting the acute risk period) is achieved.
And then I thought, well...
- GDM, OpenAI, Anthropic, …
- Epoch and Mechanize
- … there are probably more examples in the past
And then there were conversations where people I viewed as ~allied turned out to bite bullets that I considered (and still consider) equivalent to moral atrocities.
I may want to think more about this, but ATM it seems to me like AI risk as a field (or loose cluster of groups) is failing both at cooperating to achieve locally cooperation-worthy convergent subgoals and at seeing past the moral homophones.
(When I say "failing", I'm inclined to ask myself what standard I should apply, but reality doesn't grade on a curve and the stakes are huge.)
---
Anyway, thanks for the post and the concept!