Ok, putting my [maybe I’m missing the point] hat on, it strikes me that the above is considering the learned steering system—which is the outcome of any misalignment. So I probably am missing your point there (I think?). Oops.
However, I still think I’d stick to saying that:
The [objective encoded by the steering system] is not [maximisation of the score assigned by the steering system], but rather [whatever behaviour the steering system tends to produce]
But here I’d need to invoke properties of the original steering system (ignoring the handwaviness of what that means for now), rather than the learned steering system.
I think what matters at that point is sampling of trajectories (perhaps not only this—but at least this). There’s no mechanism in humans to sample in such a way that we’d expect maximisation of reward to be learned in the limit. Neither would we expect one, since evolution doesn’t ‘care’ about reward maximisation.
Absent such a sampling mechanism, the objective encoded isn’t likely to be maximisation of the reward.
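(To make the sampling point concrete: convergence-to-optimality results for e.g. tabular Q-learning assume every state–action pair keeps being sampled, which in practice means some explicit exploration mechanism. Below is a minimal sketch of what such a mechanism looks like; the `env` interface here is entirely hypothetical:)

```python
import random
from collections import defaultdict

def epsilon_greedy_q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning. The epsilon-greedy sampling is the part that matters
    here: convergence-to-optimal results assume every (state, action) pair keeps
    getting sampled. Remove the exploration and there's no reason to expect
    reward maximisation in the limit."""
    q = defaultdict(float)  # (state, action) -> estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                action = random.choice(env.actions(state))  # explore
            else:
                action = max(env.actions(state), key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = env.step(action)
            best_next = max((q[(next_state, a)] for a in env.actions(next_state)), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

Humans have nothing playing the role of that exploration schedule, which is the sense in which I don't expect reward maximisation to be learned in the limit.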
To talk about inner misalignment, I think we need to be able to say something like:
1. Under [learning conditions], we expect system x to maximise y in the limit.
2. System x does not robustly learn to pursue y (rather than a proxy for y), so that under [different conditions] x no longer maximises y.
Here I don’t think we have (1), since we don’t expect the human system to learn to maximise reward (or minimise regret, or...) in the limit (i.e. this is not the objective encoded by their original steering system).
Anyway, hopefully it’s now clear where I’m coming from—even if I am confused!
My guess is that this doesn’t matter much to your/Quintin’s broader points(?) - beyond that “inner alignment failure” may not be the best description.
[EDIT: see my response to this comment; this one is at least mildly confused]
[Again, I want to flag that this line of thinking/disagreement is not the most interesting part of what you/Quintin are saying overall—the other stuff I intend to think more about; nonetheless, I do think it’s important to get to the bottom of the disagreement here, in case anything more interesting hinges upon it]
[JC: There isn’t an objective human reward signal that mirrors an RL agent’s reward.]
You’re the second person to confidently have this reaction, and I’m pretty confused why.
My objection here is all in the ”...that mirrors an RL agent’s reward.”—that’s where the parallel doesn’t work in my view. An RL agent is trained to maximize total (discounted) reward. The brain isn’t maximizing total reward, nor trying to maximize total reward, nor is evolution acting on the basis that it’ll do either of these things.
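For concreteness, by "total (discounted) reward" I mean the standard RL objective, with discount factor $\gamma \in [0,1)$:

$$J(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]$$

It's this quantity that RL training pushes up, and this that I don't think the brain is maximising (or trying to maximise).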
I agree with the following:
The brain implements an outer criterion which evaluates and reinforces behavior/predictions and incentivizes some plans over others along different dimensions.
I just don’t think this tells us anything useful, since this criterion clearly is not maximisation of total discounted reward. (though I would expect some correlation)
It seems to me that the criterion is more like maximisation of in-the-moment reward (I’m using ‘reward’ here very broadly). I.e. I might work rather than have fun since the thought of working happened to be more ‘rewarding’ than the thought of having fun. (similarly, I might not wirehead, since the thought of wireheading is negative)
This seems essentially vacuous, because I don’t see a way to measure itm-reward better than: if I did x rather than y, then x was more itm-rewarding than y. (to be clear, I’m saying this is not useful—but that I don’t see a principled definition of itm-reward that doesn’t amount to this; this is where a “crisp and clear mechanistic notion of what counted as human reward” would be handy—in order to come up with a non-vacuous definition)
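(A toy illustration of the vacuousness; everything here is a made-up stand-in. If itm-reward can only ever be read off from the choice actually made, the 'definition' does no predictive work:)

```python
def itm_reward_ordering(chosen, rejected):
    """The only available 'measurement' of in-the-moment reward: read it
    off the choice itself. Defined this way, 'we choose whatever is most
    itm-rewarding' is true by construction, and so predicts nothing."""
    return {chosen: "higher itm-reward", rejected: "lower itm-reward"}

# Whatever I actually did is, by definition, what was more itm-rewarding:
print(itm_reward_ordering(chosen="work", rejected="have fun"))
```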
Perhaps it’s clearer if I back up to your previous post and state a crisper disagreement:
If you don’t want to wirehead, you are not trying to optimize the objective encoded by the steering system in your own brain, and that’s an inner alignment failure with respect to that system.
This just seems wrong to me. The [objective encoded by the steering system] is not [maximisation of the score assigned by the steering system], but rather [whatever behaviour the steering system tends to produce].
In an RL system these two are similar, precisely because the RL system is designed to steer towards outcomes with high total discounted reward according to its own metric.
In general, steering systems are not like this. The criterion for picking one plan over another can be [expected total reward] or [something entirely different].
Where a system doesn’t use [expected total reward] it seems just plain silly to me to call behaviour misaligned where it doesn’t match [what the system would incentivize if it did use expected total reward]. Of course it doesn’t match, since that’s not how this steering system works.
What’s interesting is that at the peak we have 0.035/million deaths and about 20/million cases, for a (no-delay) case fatality rate of ~1.75%
I think you mean ~0.175%, so deaths are suspiciously low even at the peak.
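(Worked out: 0.035 deaths/million ÷ 20 cases/million = 0.00175 ≈ 0.175%, i.e. the quoted ~1.75% looks like a factor-of-ten slip.)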
This is great! Thanks.
adding stylistic prompts actively changes some of what I would consider to be content
Your examples here are not good since e.g. “...painting by Alphonse Mucha” is not just a rewording of “...in the style of Alphonse Mucha”: the former isn’t a purely stylistic prompt. For a [painting by x], x gets to decide what is in the painting—so it should be expected that this will change the content. Similarly for “screenshots from the Miyazaki anime movie”.
Of course it’s still a limitation if you can only get really good style results by using such not-purely-stylistic prompts.
I think the “why”s of chess are also about game design—just about decisions that could have gone various other ways. Go is certainly more elegant; chess has more character: these are essentially opposites, since each arbitrary-but-reasonable rule added is a loss for elegance and a win (potentially) for character. (a rule that introduces more symmetry than it breaks goes in the other direction—but such rules don’t feel arbitrary)
Agreed. It’d be nice if the chess folk took some low-hanging-fruit rule changes seriously.
Treating stalemate as a loss is the most obvious. I’d be interested to know how much this would change things at the highest level. Ah—I see DM tried this (gwern’s link), with disappointingly little impact.
A more ‘drastic’ (but IMO interesting) endgame change would be to change the goal of chess from “capture the king” to “get the king to the opponent’s throne” (i.e. white wins by getting the king to e8, black wins by getting the king to e1; checkmate/stalemate wins immediately).
You get some moderately interesting endgames with this rule—e.g. king+bishop can win against king from most positions, as can king+knight. This means that many liquidate-material-to-drawn-endgame tactics no longer work.
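(For concreteness, a minimal sketch of the variant's win condition; the `position` interface is entirely hypothetical:)

```python
def race_winner(position):
    """Win condition for the king-race variant: white wins on reaching e8
    with its king, black on reaching e1. Checkmate or stalemate (of the
    side to move) ends the game immediately as a win for the other side."""
    if position.king_square("white") == "e8":
        return "white"
    if position.king_square("black") == "e1":
        return "black"
    if position.is_checkmate() or position.is_stalemate():
        return "black" if position.side_to_move() == "white" else "white"
    return None  # game continues
```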
For more general endgame positions, the e8 and e1 squares become an extra weakness. So positions where it was hard/impossible to convert an advantage (difficult with only one weakness to exploit), become winnable (two weaknesses often being enough).
I don’t know how it’d work out in practice. It’d be fun to see how [this + chess960] worked out at high level.
Well I’m sure I could have been clearer. (and it’s possible that I’m now characterising what I think, rather than what I wrote)
But getting that impression is pretty natural: in my argument, a large part of the problem does come from its sometimes being correct to pick the question-ignoring answer. (‘correct’ meaning something like: [leads to best consequences, according to our values]) Or alternatively, that a correct decision algorithm would sometimes pick the question-ignoring answer.
I think I focus on this, since it’s the non-obvious part of the argument: it’s already clear that poor decisions / decision-algorithms may sometimes pick the question-ignoring answer.
Probably I should have emphasized more that unexpected behaviour when things are going right will make it harder to know when things are going wrong.
Thanks for this. I hope to have thoughts at some point, but first need to think about it more carefully.
One immediate response—since I already know what I think on this bit (it’s not clear to me that this implies any significant object-level disagreement—it may just amount to my saying “those are weird words to use”):
For my part, I’m talking about the reward signals provided by the steering system in a person’s brain. Although some people are hedonists, many are not, and thus they are unaligned with their reward system.
This seems too narrow a concept of what reward is (e.g. hedonism == aligned-with-reward-system). There isn’t an objective human reward signal that mirrors an RL agent’s reward.
We get a load of input, have a bunch of impressions, feelings and thoughts, and take some actions. Labelling of some simple part of that as the reward strikes me as silly (“a reward”, sure). What could be the justification? If we’re clearly not maximising it, nor learning to maximise it (nor trying to...), in what sense is it analogous to RL reward?
The reasonable move seems to be to say “Oops, I was wrong to label that as ‘the reward’, there’s no direct parallel here”, and not “there’s an inner misalignment”.
I’d note that evolution will have implicitly accounted for any previous “misalignment” in shaping our current reward signals: it will have selected for the reward signals that tended to increase fitness given our actual responses to those signals, not the signals that would have increased fitness if we had followed some maximisation process.
Our reward signals weren’t ‘designed’ to be maximised, only to work (to increase fitness).
So it still seems strange to talk about misalignment w.r.t. an objective nothing and nobody was aiming for (even implicitly). It’d seem more useful if there were some crisp and clear mechanistic notion of what counted as human reward and what didn’t; I don’t think that’s true (is anyone claiming this?).
...the human can just use both answers in whichever way it wants, independently of which it selects as the correct answer...

I don’t think you disagreed with this?
A few points on the rest:
At the highest level, the core issue is that QI makes it quite a bit harder to identify misalignment. If aligned systems will sometimes not answer the question, non-answering isn’t necessarily strong evidence of misalignment. So “consequentialist judges will [sometimes correctly] select QIA’s” is bad in the sense that it provides cover for “consequentialist judges will [sometimes incorrectly] select QIA’s”.
I talk about consequentialists, but not rational consequentialists. I expect the kind of judge we’d pick to be highly rational relative to the average human—but that’s a low bar. I expect all humans to have exploitable inconsistencies, and that optimal play will exploit them (similarly for groups of humans). So yes, this is only a problem where manipulation is possible—but since it is possible, we’ll have difficulty distinguishing [judge correctly selected a non-answer as the winner for principled reasons] from [judge was manipulated...].
It’s much less clear when such issues show up with sub-optimal play.
With “Is this definitely undesirable? I’m not sure, but probably.” I’m referring to its being undesirable that the debate structure has this property in general. The judge can know it is undesirable in general, but also think that in this specific case things are different—and, of course, the judge can be wrong about this.
Noting here that humans can’t make binding pre-commitments. (saying words doesn’t qualify)
It’s hard (/impossible?) to avoid this issue through oversight, since we just move from [QI exceptions that persuade the judge win] to [QI exceptions that persuade the judge and oversight system win].
Absolutely—but it’s a strange situation in many respects.
It may be that spreading awareness is positive, but I don’t think standard arguments translate directly. There’s also irreversibility to consider: err on the side of not spreading info, and you can spread it later (so long as there’s time); you can’t easily unspread it.
More generally, I think for most movements we should ask ourselves, “How much worse than the status-quo can things plausibly get?”.
For gain-of-function research, we’d need to consider outcomes where the debate gets huge focus, but the sensible side loses (e.g. through the public seeing GoF as the only way to prevent future pandemics). This seems unlikely, since I think there are good common-sense arguments against GoF at most levels of detail.
For climate change, it’s less clear to me: there seem to be many plausible ways for things to have gotten worse. Essentially because the only clear conclusion is “something must be done”, there’s quite a bit less clarity about what—or at least there should be less clarity. (e.g. to the extent that direct climate-positive actions have negative economic consequences, to what extent are there downstream negative-climate impacts? I have no idea, but I’m sure it’s a complex situation)
For AGI, I find it easy to imagine making-things-worse and hard to see plausible routes to making-things-better.
Even the expand-the-field upside needs to be approached with caution. This might be better thought of as something like [expand the field while maintaining/improving the average level of understanding]. Currently, most people who bump into AI safety/alignment will quickly find sources discussing the most important problems. If we expanded the field 100x overnight, it would become plausible that most new people don’t focus on the real problems. (e.g. it’s easy enough only to notice the outer-alignment side of things)
Unless time is very short, I’d expect doubling the field each year works out better than 5x each year—because all else would not be equal. (I have no good sense what the best expansion rate or mechanism is—just that it’s not [expand as fast as possible])
But perhaps I’m conflating [aware of the problem] with [actively working on the problem] a bit much. Might not be a bad idea to have large amounts of smart people aware of the problem overnight.
I sometimes advise people that it is useful to self-identify as a villain...
Perhaps “antihero” is better here? The “heroic” tend to be stupid and rely on the laws of narrative saving them. Villains tend to have exciting/intricate/dastardly… but overcomplicated and fatally flawed plans.
My first thought on “No fictional hero ever sacrifices one bystander to save ten” was of Zakalwe (Use of Weapons) - but of course he’s squarely in antihero territory.
Agreed on the introductory material (and various other things in that general direction).
I’m not clear on the case for public communication. We don’t want lay people to have strong and wildly inaccurate opinions, but it seems unachievable to broadly remedy the “wildly inaccurate” part. I haven’t seen a case made that broad public engagement can help (but I haven’t looked hard—has such a case been made?).
I don’t see what the lay public is supposed to do with the information—even supposing they had a broadly accurate high-level picture of the situation. My parents fit this description, and thus far it’s not clear to me what I’d want them to do. It’s easy to imagine negative outcomes (e.g. via politicization), and hard to imagine realistic positive ones (e.g. if 80% of people worldwide had an accurate picture, perhaps that would help, but it’s not going to happen).
It does seem important to communicate effectively to a somewhat narrower audience. The smartest 1% of people being well informed seems likely to be positive, but even here there are pitfalls in aiming to achieve this—e.g. if you get many people as far as [AGI will soon be hugely powerful] but they don’t understand the complexities of the [...and hugely dangerous] part, then you can inadvertently channel more resources into a race. (still net positive though, I’d assume [EDIT: the being-well-informed part, that is])
I’ll grant that there’s an argument along the lines of: “Perhaps we don’t know what we’d want the public to do, but there may come a time when we do know. If time is short it may then be too late to do the public education necessary. Therefore we should start now.”
I don’t think I buy this. The potential downsides of politicization seem great, and it’s hard to think of many plausible ”...and then public pressure to do X saved the day!” scenarios.
This is very much not my area, so perhaps there are good arguments I haven’t seen. I’d just warn against reasoning of the form “Obviously we want to get the public engaged and well-informed, so...”. The “engaged” part is far from obvious (in particular we won’t be able to achieve [engaged if and only if well-informed]).
A few thoughts:
“p(DOOM | Action)” seems too coarse a view from which to draw any sensible conclusions. The utility of pessimism (or realism, from some perspectives) is that it informs decisions about which action to take. It’s not enough that we’re simply doing something. Yes, the pessimistic view has a cost, but that needs to be weighed against the (potential) upside of finding more promising approaches—and realising that p(DOOM | we did things that felt helpful) > p(DOOM | we did things that may work).
Similarly, convincing a researcher that a [doomed-to-the-point-of-uselessness] approach is doomed is likely to be net positive, since they’ll devote time to more worthwhile pursuits, likely increasing p(AWESOME...). Dangers being:
Over-confidence that it’s doomed.
Throwing out wrong-but-useful ideas prematurely.
A few researchers may be discouraged to the point of quitting.
In general, the best action to take will depend on p(Doom); if researchers have very different p(Doom) estimates, it should come as no surprise that they’ll disagree on the best course of action at many levels. Specifically, the upside/downside of ‘pessimism’ will relate to our initial p(Doom). If p(Doom) starts out at >99.9%, then p(Doom | some vaguely plausible kind of action) is likely also >99%. High values of p(Doom) result from the problem’s being fundamentally difficult—with answers that may be hidden in a tiny region of a high-dimensional space. If the problem is five times as hard as we may reasonably hope, then a shotgun may work. If the problem is five orders of magnitude harder, then we want a laser.
Here I emphasize that we’d eventually need the laser. By all means use a shotgun to locate promising approaches, but if the problem is hard we can’t just babble: we need to prune hard.
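(A toy way to see the shotgun/laser asymmetry, with purely illustrative numbers: if solutions occupy a fraction f of the search space, then N independent random draws hit with probability 1 - (1 - f)^N, which collapses quickly as f shrinks.)

```python
def p_hit(f, n):
    """Probability that at least one of n independent random draws
    lands in a solution region occupying fraction f of the space."""
    return 1 - (1 - f) ** n

print(p_hit(f=1e-2, n=100))  # ~0.63: a shotgun has a real chance
print(p_hit(f=1e-7, n=100))  # ~1e-5: random search is hopeless; we want a laser
```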
It occurs to me that it may be useful to have more community-level processes/norms to signal babbling, so that mistargeted pruning doesn’t impede idea generation.
Pivotal act talk does seem to be disastrous PR, but I think that’s the only sense in which it’s negative. By Critch’s standards, the arguments in that post are weak. When thinking about this, it’s important to be clear what we’re saying on the facts, and what adjustments we’re making for PR considerations. React too much to the negative PR issue (which I agree is real), and there’s the danger of engaging in wishful thinking.
Should we try to tackle the extremely hard coordination problems? Possibly—but we shouldn’t be banking on such approaches working. (counterfactual resource use deserves consideration)
We should be careful of any PR-motivated adaptations to discourse, since they’re likely to shape thinking in unhelpful ways. (even just talking less about X, rather than actively making misleading claims about X may well have a negative impact on epistemics)
This mostly seems to be an argument for: “It’d be nice if no pivotal act is necessary”, but I don’t think anyone disagrees with that.
As for “Should an AGI company be doing this?” the obvious answer is “It depends on the situation”. It’s clearly nice if it’s not necessary. Similarly, if [the world does the enforcement] has higher odds of success than [the AGI org does the enforcement] then it’s clearly preferable—but it’s not clear that would be the case.
I think it’s rather missing the point to call it a “pivotal act philosophy”, as if anyone values pivotal acts for their own sake. Some people just think they’re plausibly necessary—as are many unpleasant and undesirable acts. Obviously this doesn’t imply they should be treated lightly, or that the full range of more palatable options shouldn’t be carefully considered.
I don’t buy that an intention to perform pivotal acts is a significant race-dynamic factor: incentives to race seem over-determined already. If we could stop the existing race, I imagine most pivotal-act advocates would think a pivotal act was much less likely to be necessary.
Depending on the form an aligned AGI takes, it’s also not clear that the developing organisation gets to decide/control what it does. Given that special-casing avoidance of every negative side-effect is a non-starter, an aligned AGI will likely need a very general avoids-negative-side-effects mechanism. It’s not clear to me that an aligned AGI that knowingly permits significant avoidable existential risk (without some huge compensatory upside) is a coherent concept.
If you’re allowing a [the end of the world] side-effect, what exactly are you avoiding, and on what basis? As soon as your AGI takes on any large-scale long-term task, then [the end of the world] is likely to lead to a poor outcome on that task, and [prevent the end of the world] becomes an instrumental goal.
Forms of AGI that just do the pivotal act, whatever the creators might think about it, are at least plausible. I assume this will be an obvious possibility for other labs to consider in planning.
Sure, that makes sense.
By default, AI systems won’t be subject to anything like the environment and pressures that shaped humans and human values. We could aim to create (something analogous to) it, but it’s anything but straightforward. How fragile is the process for humans? Which aspects can be safely simplified/skipped, and how would we know?
It occurs to me that I’m not sure whether you mean [human rewards in evolution] or [rewards for individual learning humans], or both? I’m assuming the evolutionary version, since I’m not clear what inner alignment failure would mean for an individual (what defines the intended goal/behaviour?).
If we could run a similar process for some x we’re training, then we would expect to get [xs care about xs], not [xs care about humans]. Granted that may not waste the future, but it’s a humans-as-pets future if we’re very lucky. (philosophically, not wasting the future is far more important—but I’m rather attached to humanity)
It’s not clear to me how close we’d need to get to x-has-human-values before we’d think an x-dominated world would be worthwhile (even ignoring attachment to humanity).
I think I’d worry that the set of values that do well under human-evolution/learning conditions is too broad (for a good-according-to-non-selfish-us outcome to be likely). I.e. that re-rolling values under similar evolutionary pressures can give you various value-sets that each achieve similar fitness (or even similar behaviour), but where maximizing utility according to one gets you very low utility according to the others.
Perhaps more fundamental: humans shape their own environment (both in evolution and individual learning). If we start out with similar conditions, divergence will compound. This makes me less confident that a re-roll ends well.
Perhaps the same applies to our future already—but I think that’s an argument for conscious effort to guide future values.
I wonder how viable/instructive it might be to test this kind of thing in a toy model. I.e. you run some toy evolutionary environment twice, and check how much run-1 denizens approve of the run-2 world.
I can’t see this working at present, but I’m not sure what that tells us. Are the silly-non-answers, unsatisfied prerequisites and type errors I’d expect in a toy model artefacts of the toy setup, or reflective of fundamental issues? It’s not immediately clear to me.
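(A very rough sketch of the kind of toy experiment I have in mind; every detail is a made-up stand-in. Evolve value vectors under the same selection pressure with different seeds, then score each run's result by the other run's values:)

```python
import numpy as np

def evolve_values(seed, dims=10, pop=200, gens=500):
    """Toy 'evolution': value vectors selected for a fitness that only
    constrains their norm, not their direction—so many distinct
    value-sets reach similar fitness."""
    rng = np.random.default_rng(seed)
    population = rng.normal(size=(pop, dims))
    fitness = lambda xs: -np.abs(np.linalg.norm(xs, axis=1) - 1.0)
    for _ in range(gens):
        survivors = population[np.argsort(fitness(population))[-pop // 2:]]
        children = survivors + rng.normal(scale=0.05, size=survivors.shape)
        population = np.vstack([survivors, children])
    return population[np.argmax(fitness(population))]

v1, v2 = evolve_values(seed=1), evolve_values(seed=2)
# Cross-run 'approval': cosine similarity of the two evolved value-sets.
print(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))  # typically far from 1
```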
Something like the 80,000 hours career advice seems like a good place to start—or finding anyone who has a good understanding of the range of possibilities (mine is a bit too narrowly slanted towards technical AIS).
If you’ve decided on the AIS direction, then AI Safety Support is worth a look—they do personal calls for advice, and have many helpful links.
That said, I wouldn’t let the idea of “grant proposals” put you off. The forms you’d need to fill for the LTFF are not particularly complicated, and they do give grants for e.g. upskilling—you don’t necessarily need a highly specific/detailed plan.
If you don’t have a clear idea where you might fit in, then the advice links above should help. If/when you do have a clear idea, don’t worry about whether you can articulate it persuasively. If it makes sense, then people will be glad to hear it—and to give you pointers (e.g. fund managers).
E.g. there’s this from Evan Hubinger (who helps with the LTFF):
if you have any idea of any way in which you think you could use money to help the long-term future, but aren’t currently planning on applying for a grant from any grant-making organization, I want to hear about it. Feel free to send me a private message on the EA Forum or LessWrong. I promise I’m not that intimidating :)
Also worth bearing in mind as a general principle that if almost everything you try succeeds, you’re not trying enough challenging things. Just make sure to take negative outcomes as useful information (often you can ask for specific feedback too). There’s a psychological balance to be struck here, but trying at least a little more than you’re comfortable with will generally expand your comfort zone and widen your options.
Examples would be interesting, certainly. Concerning the post’s point, I’d say the relevant claim is that [type of alignment research that’ll be increasingly done in slow takeoff scenarios] is already being done by non x-risk motivated people.
I guess the hope is that at some point there are clear-to-everyone problems with no hacky solutions, so that incentives align to look for fundamental fixes—but I wouldn’t want to rely on this.
I’m very suspicious of:
Inner alignment failure is the only process in the known universe to ever generate human values
as a jumping-off point, since inner alignment failure did not hit a pre-defined target of human values. It just happened to produce them. If a gun can fire one bullet, I’ll expect it can fire a second. I won’t expect the second bullet to hit the first.
On the rest, it strikes me that:
Game theory keeps human values ‘good’ in largely circular fashion: we’ll tend to think that whatever is working is ‘good’, since it helps us to think that. This should give us confidence neither in future human values, nor in AI values. (e.g. future humans would learn to prefer uniformity, if the game theory favoured it)
I don’t think this is quite right: “This instinct is quite contrary to how the optima of most utility functions or values look”—rather, it’s contrary to how the optima of simple utility functions we can easily specify look. Most complex utility functions will produce worlds containing complex patterns. Most of those worlds will still be essentially worthless from a human perspective, since we care about a tiny proportion of patterns. I don’t think it’s hard to get an amount of diversity humans would appreciate; I think it’s hard to get the types of diversity humans would appreciate.
I think I buy the rest of your argument in terms of [It won’t be too hard to produce an AI that’ll create an interesting world], but only in the sense that it’d be a world that’s interesting to investigate as an object of study (dynamic, varied, complex, hard to predict...). I don’t think many people imagine the trivially simple worthless failure modes (paperclips, tiling-smiley-faces...), but rather worlds containing a load of complex patterns which are nonetheless ~worthless from even our most enlightened perspective. (though it’s also plausible for things to collapse into a dull attractor)
Wholeheartedly agree, and I think it’s great that you’re doing this. I’ll be very interested in what you learn along the way w.r.t. more/less effective processes.
(Bonus points for referencing the art of game design—one of my favourite books.)
...and take out a bunch of loans...
That part really shouldn’t be necessary (even if it may be rational, conditional on some assumptions). In the event that you do decide to devote your time to helping, whether for dignity or whatever else, you should be able to get funding to cover most reasonable forms of upskilling and/or seeing-if-you-can-help trial period.
That said, I think step one would be to figure out where your comparative advantage lies (80,000 hours folk may have thoughts, among others). Certainly some people should be upskilling in ML/CS/Math (though an advanced degree may not be the most efficient route), but there are other ways to help.
I realize this doesn’t address the deciding-what’s-true aspect. I’d note there that I don’t think much detailed ML knowledge is necessary to follow Eliezer’s arguments on this. Most of the ML-dependent parts can be summarized as [we don’t know how to do X], [we don’t have any clear plan that we expect will tell us how to do X], similarly for Y, Z, [Either X, Y or Z is necessary for safe AGI].
Beyond that, I think you only need a low prior on our bumping into a good solution while fumbling in the dark and a low prior on sufficient coordination, and things look quite gloomy. Probably you also need to throw in some pessimism on getting safe AI systems to fundamentally improve our alignment research.