That is, I can easily see how factored cognition lets you stick to cognitive strategies that are known to solve a problem safely. What I don't see is how it can do that while also letting you develop new cognitive strategies, without opening the door to inner optimizers: not within individual units, but within assemblages of units.
Do you have some intuition for how inner optimizers would arise within assemblages of units, without being initiated by some unit higher in the hierarchy? Or is that what you are pointing at?
When I imagine them they are being initiated by some unit higher in the hierarchy. Basically, you could imagine having a tree of humans that is implementing a particular search process, or a different tree of humans implementing a search over search processes, with the second perhaps being more capable (because it can improve itself) but also perhaps leading to inner alignment problems.
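The distinction between a tree implementing a particular search process and a tree implementing a search over search processes can be illustrated with a toy sketch. Everything here is hypothetical and invented for illustration (the function names, the objective, and the "strategies" are not from the discussion above); the meta-search selects among candidate search heuristics it generates itself, which is a loose analogue of how a higher-level unit could end up running an optimizer it did not explicitly vet:

```python
import random

def objective(x):
    # Toy objective with a single peak at x = 3.
    return -(x - 3) ** 2

def fixed_search(candidates):
    # A "particular search process": one fixed, fully auditable
    # strategy (exhaustive scan of the candidates).
    return max(candidates, key=objective)

def meta_search(candidates, n_strategies=5, seed=0):
    # A "search over search processes": generate several candidate
    # search heuristics (here, random subsampling scans) and keep
    # the best result any of them finds. The winning heuristic is
    # itself an optimizer that the outer layer did not hand-pick,
    # which is the loose analogy to an inner optimizer arising
    # inside the assemblage.
    rng = random.Random(seed)
    best_result, best_score = None, float("-inf")
    for _ in range(n_strategies):
        sample = rng.sample(candidates, k=len(candidates) // 2)
        result = max(sample, key=objective)
        if objective(result) > best_score:
            best_result, best_score = result, objective(result)
    return best_result

xs = list(range(-10, 11))
print(fixed_search(xs))  # always finds the peak at 3
print(meta_search(xs))   # may or may not, depending on the sampled strategies
```

The point of the contrast is that `fixed_search` is predictable and easy to verify, while `meta_search` can in principle be more capable (it could be extended to improve its own heuristics), at the cost of running sub-processes whose behavior the top level never directly inspected.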