I endorse the spirit of this distillation a lot more than the original post, though I note that Mikhail doesn’t seem to agree.
I don’t think those two worlds are the most helpful ones to consider, though. I think it’s extremely implausible[1] that Anthropic leadership are acting in some coordinated fashion to deceive employees about their pursuit of the mission while actually profit-maxxing or something.
I think the much more plausible world to watch out for is something like:
Anthropic leadership is reliably trying to pursue the mission and is broadly acting with good intentions, but some of Anthropic’s actions are bad for that mission for reasons like:
incorrect or biased beliefs by Anthropic leadership about what would be best for that mission
selective or biased reporting by leadership in self-serving but ordinary human ways, of the sort that don’t feel internally like deception but can be easy to slip into under lots of social pressure
actions on the part of less-mission-aligned employees without sufficient oversight at higher levels of the org
decisionmakers who just haven’t really stopped to think about the consequences of their actions on some aspect of the mission, even though in theory they might realize this was bad
failures of competence in pursuing a good goal
random balls getting dropped for complicated big-organization reasons that aren’t any one person’s fault in a crisp way
Of course this is a spectrum, and this kind of thing will obviously be the case to some nonzero degree; the relevant questions are things like:
Which actors can I trust such that, if they own a project, it will be executed competently and with attention paid to the mission-relevant components that I care about?
What persistent biases do I think are present in this part of the org, and how could I improve that state of affairs?
Is the degree of failure in this regard large enough that my contributions to Anthropic-as-a-whole are net negative for the world?
What balls appear to be getting dropped that I might be able to pick up?
What internal cultural changes would move decisionmaking in ways that would more reliably pursue the good?
I’d be excited for more external Anthropic criticism to pitch answers to questions like these.
[1] I won’t go into all the reasons I think this, but just to name one: the whole org is peppered with the kinds of people who have quit OpenAI in protest over such actions. That’s such a rough environment to maintain this conspiracy in!
I agree that these are not the two worlds which would be helpful to consider, and your list of reasons is closer to my model than Lucie’s representation of my model.
(I do hope that my post somewhat decreases trust in Jack Clark and Dario Amodei and somewhat increases the incentives for the kind of governance that would not be dependent on trustworthy leadership to work.)