Curated. Tackles thorny conceptual issues at the foundation of AI alignment while also revealing the weak spots of the abstractions used to tackle them.
I like the general strategy of trying to make progress on understanding the problem while relying only on the concept of “basic agency”, rather than first tackling the much harder problem of usefully formalizing a more full-throated conception of agency, whether or not that turns out to be enough in the end.
The core point of the post is plausible and worthy of discussion: certain kinds of goals only make sense at all given that certain kinds of patterns are present in the environment, and much of the difficulty of making sense of the alignment problem lies in identifying what those patterns are for the goal of “make aligned AGIs”. I also appreciate that this post explicitly elucidates the (according to me) canon-around-these-parts general patterns that render the specific goal of aligning AGIs sensible (e.g., compression-based analyses of optimization) and presents them as such.
The introductory examples of patterns that must be present in the general environment for certain simpler goals to make sense—especially how the absence of the pattern makes the goal not make sense—are clear and evocative. I would not be surprised if they helped someone notice ways in which the canon-around-these-parts hypothesized patterns that render “align AGIs” a sensible goal are importantly flawed.