How I think about METR’s theory of change:
General principles:
- avoid the world being taken by surprise by an AI catastrophe; improve the knowledge, understanding, and science of assessing risk from AI systems
- it is good for an independent, trustworthy, truth-seeking, minimally conflicted expert org to exist; it can advise the world and act as a counterbalance to AI companies, and a nonprofit has slightly different affordances than a government here
Strategy:
- try to continually answer the question of “how dangerous are current and near-future AI systems?”, and do research to be able to keep answering that question as well as possible
- be boring, neutral, and straightforward; aim to explain, not persuade
Some specific impact stories:
- at some point in the future, political willingness to act may be much higher; help channel that into a more informed and helpful response
- provide independent technical review and red-teaming of alignment and other mitigations, finding issues companies have missed
- increase the likelihood that misalignment incidents or other ‘warning shots’ are shared, publicized, and analyzed well
I think that broad theory of change has been pretty constant throughout METR’s existence, but my memory is not great, so I wouldn’t be that surprised if I had framed it quite differently in the past, e.g. emphasized conditional commitments more heavily.