johnswentworth comments on The Plan − 2022 Update

johnswentworth 2 Dec 2022 18:32 UTC
LW: 38 AF: 13
Still on the “figure out agency and train up an aligned AGI unilaterally” path?
“Train up an AGI unilaterally” doesn’t quite carve my plans at the joints.
One of the most common ways I see people fail to have any effect at all is to think in terms of “we”. They come up with plans which “we” could follow, for some “we” which is not in fact going to follow that plan. And then they take political-flavored actions which symbolically promote the plan, but are not in fact going to result in “we” implementing the plan. (And also, usually, the “we” in question is too dysfunctional as a group to implement the plan even if all the individuals wanted to, because that is how approximately 100% of organizations of more than 10 people operate.) In cognitive terms, the plan is pretending that lots of other people’s actions are choosable/controllable, when in fact those other people’s actions are not choosable/controllable, at least relative to the planner’s actual capabilities.
The simplest and most robust counter to this failure mode is to always make unilateral plans.
But to counter the failure mode, plans don’t need to be completely unilateral. They can involve other people doing things which those other people will actually predictably do. So, for instance, maybe I’ll write a paper about natural abstractions in hopes of nerd-sniping some complex systems theorists to further develop the theory. That’s fine; the actions which I need to counterfact over in order for that plan to work are actions which I can in fact take unilaterally (i.e. write a paper). Other than that, I’m just relying on other people acting in ways in which they’ll predictably act anyway.
Point is: in order for a plan to be a “real plan” (as opposed to e.g. a fabricated option, or a de facto applause light), all of the actions which the plan treats as “under the planner’s control” must be actions which can be taken unilaterally. Any non-unilateral actions need to be things which we actually expect people to do by default, not things we wish they would do.
Coming back to the question: my plans certainly do not live in some children’s fantasy world where one or more major AI labs magically become the least-dysfunctional multiple-hundred-person organizations on the planet, and then we all build an aligned AGI via the magic of Friendship and Cooperation. The realistic assumption is that large organizations are mostly carried wherever the memetic waves drift. Now, the memetic waves may drift in a good direction: if e.g. the field of alignment does indeed converge to a paradigm around decoding the internal language of nets and expressing our targets in that language, then there’s a strong chance the major labs follow that tide, and do a lot of useful work. And I do unilaterally have nonzero ability to steer that memetic drift, for instance by creating public knowledge of various useful lines of alignment research converging, or by training lots of competent people.
That’s the sort of non-unilaterality which I’m fine having in my plans: relying on other people to behave in realistic ways, conditional on me doing things which I can actually unilaterally do.