I expect that advanced AI systems will do in-context optimization, and this optimization may well be gradient descent or a gradient-descent-derived method. Applied recursively, this seems worrying.
Let the outer objective be the loss function implemented by the ML practitioner, and the outer optimizer be the gradient descent process the practitioner implements. Then let the inner₁-objective be the objective the trained model uses for its in-context gradient descent process, and the inner₁-optimizer be that in-context gradient descent process itself. It seems plausible that the inner₁-optimizer will itself instantiate an inner objective and optimizer; call these the inner₂-objective and inner₂-optimizer. And in turn an inner₃-objective and -optimizer may arise, and so on.
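To make the regress concrete, here is a toy sketch in Python. Nothing here models how a real network would implement in-context gradient descent; the objective functions, targets, and the drift between levels are invented for illustration, under the assumption that each innerₙ-optimizer runs gradient descent on a slightly mis-specified proxy of the objective one level up.

```python
def gd(objective, x, lr=0.1, steps=20, h=1e-5):
    """Plain gradient descent on a scalar objective, using a
    central finite difference in place of an analytic gradient."""
    for _ in range(steps):
        grad = (objective(x + h) - objective(x - h)) / (2 * h)
        x -= lr * grad
    return x

# Outer objective: the loss the practitioner actually writes down.
def outer_objective(theta):
    return (theta - 3.0) ** 2

# Hypothetical inner1-objective: the proxy the trained model optimizes
# in-context, assumed here to be slightly mis-specified.
def inner1_objective(phi):
    return (phi - 2.9) ** 2

# Hypothetical inner2-objective: a proxy instantiated by the
# inner1-optimizer, assumed to drift a little further.
def inner2_objective(psi):
    return (psi - 2.7) ** 2

# Each level runs gradient descent on the next level's proxy,
# starting from the point the level above converged to.
theta = gd(outer_objective, 0.0)    # outer optimizer
phi = gd(inner1_objective, theta)   # inner1-optimizer
psi = gd(inner2_objective, phi)     # inner2-optimizer

# Each level ends up further from the outer target of 3.0.
print(theta, phi, psi)
```

The point of the sketch is only structural: even if every level faithfully optimizes its own objective, small mis-specifications compound across levels, which is the failure mode the recursive framing is meant to highlight.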
Thus, another risk model for value instability: recursive inner alignment. Though we may solve inner₁-alignment, inner₂-alignment may not be solved, nor, in general, innerₙ-alignment for any n > 1.