Consider the computational difficulty of intrinsic vs. extrinsic alignment for a chess-playing AI.
Suppose you want the AI to walk its king to the center of the board before winning. With intrinsic alignment, this is a little tricky to encode but not too hard. With extrinsic alignment, this requires vastly outsmarting the chess-playing AI so that you can make it dance to your tune. Maybe humans could do it to a 500-Elo chess bot, but past 800 Elo I think I’d only be able to solve the problem by building a second chess engine that was intrinsically aligned, and then using it to extrinsically align the first one.
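
To make the "tricky to encode but not too hard" claim concrete, here is a rough sketch of what the intrinsic version might look like: the king-walk condition goes straight into the objective the engine optimizes. Everything in it is an assumption made for illustration (the names `CENTER_SQUARES`, `king_visited_center`, and `terminal_utility`, the choice of d4/e4/d5/e5 as "the center", and the idea that the engine's search would consume this utility); it is not how any particular engine works.

```python
# Hypothetical sketch: all names and the definition of "center" are assumptions.
CENTER_SQUARES = {"d4", "e4", "d5", "e5"}


def king_visited_center(king_path):
    """Return True if the king stood on any central square during the game."""
    return any(square in CENTER_SQUARES for square in king_path)


def terminal_utility(result, king_path):
    """Score a finished game from the engine's point of view.

    result: "win", "draw", or "loss".
    king_path: squares the engine's own king occupied, in order.
    """
    if result == "win":
        # A win only counts if the king walked to the center first;
        # otherwise it is worth no more than a draw.
        return 1.0 if king_visited_center(king_path) else 0.0
    if result == "draw":
        return 0.0
    return -1.0
```

An engine searching against something like `terminal_utility` would have to plan the king walk itself. The extrinsic version has no such hook: you only get to choose your own moves and hope the opponent's king ends up where you want it.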