Paul creates a subproblem of alignment, which is "alignment with low stakes." Basically, this problem makes one relaxation relative to the full problem: we never have to care about any single decision, or more formally, traps cannot happen within a small set of actions.
Another way to say it is we temporarily limit distributional shift to safe bounds.
I like this relaxation of the problem because it gets at a realistic outcome we may be able to reach, and in particular it lets people work on it without much context.
However, the fact that inner alignment doesn't need to be solved here may be a problem, depending on your beliefs about outer vs. inner alignment.
I'd give it a +3.