Charlie Steiner answers Simple question about corrigibility and values in AI.

Charlie Steiner 22 Oct 2022 4:23 UTC
I mostly don’t think about corrigibility. But when I do, it’s generally to label departures from an AI having agenty structure. Some people like to think about corrigibility in terms of stimulus-response patterns, or behavioral guarantees, or as a grab-bag of doomed attempts to give orders to something smarter than you without understanding it. These are all fine too.
I definitely think about values more in terms of abstract states. By “abstract” I mean that states don’t have to be specific states of the universe’s quantum wavefunction; they can be anything that fills the role of “state” in a hierarchical set of models of the world.
It’s not that I’m hardcore committed to an AI never learning values that are about process. But I tend to think of even those in terms of state: the AI has a model of itself and its own decision-making, and that model has variables the AI can control, like “how do I make decisions?” (or something as vague as “Am I being good and just?”).
Basically this is because I think that some state-based preferences are really important, and once you have those, deontological rules that have no grounding in state whatsoever are unnatural.