Meta-point: I think it would have been better if you had split the post into two parts: one for “Here is a structure of preferences which we would like to instill in our AI,” and a second for “Here is how we are going to do it in a prosaic alignment setting.” That would have broken the scary “50 min read” into not-so-scary chunks, and people would have engaged more with the narrower topics.
Object-level point: I don’t think that training for stochastic choice amounts to what we need. Thompson sampling is stochastic, and it is indeed not vNM-rational, but that doesn’t mean it is equivalent to having incomplete preferences.
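For concreteness, here’s a toy Thompson-sampling bandit (an illustrative sketch of my own, not code from the post; the two-arm setup and Beta priors are assumptions). Its choices are stochastic across rounds, yet each individual choice strictly maximizes a definite sampled estimate, so the stochasticity comes from belief-sampling rather than from incomplete preferences:

```python
import random

def thompson_choice(successes, failures):
    """Pick the arm whose sampled success rate (from a Beta posterior) is highest."""
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# Two arms with identical observed records: choices come out roughly
# 50/50 across calls, yet every single call maximizes a definite
# sampled estimate -- stochastic choice with no incompleteness.
successes, failures = [5, 5], [5, 5]
picks = [thompson_choice(successes, failures) for _ in range(1000)]
print(picks.count(0) / len(picks))  # ~0.5
```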
Yep, maybe that would’ve been a better idea!
I think that stochastic choice does suffice for a lack of preference in the relevant sense: if the agent had a preference, it would reliably choose the option it preferred. And tabooing ‘preference’, I think stochastic choice between different-length trajectories makes it easier to train agents to satisfy Timestep Dominance, which is the property that keeps agents shutdownable. That’s because Timestep Dominance follows from stochastic choice between different-length trajectories plus a more general principle that we’ll train agents to satisfy anyway, since that principle is a prerequisite for minimally sensible action under uncertainty. I discuss this in a little more detail in section 18.
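To make the Timestep Dominance property concrete, here’s a toy sketch (my own representation and numbers, not from the post): it models a lottery as a map from shutdown-timesteps to conditional expected utilities, and checks the condition that X timestep-dominates Y iff X does at least as well conditional on every shutdown-timestep and strictly better conditional on at least one:

```python
def timestep_dominates(x, y):
    """x, y: dicts mapping shutdown-timestep -> conditional expected utility."""
    timesteps = x.keys() | y.keys()
    at_least = all(x.get(t, 0.0) >= y.get(t, 0.0) for t in timesteps)
    strictly = any(x.get(t, 0.0) > y.get(t, 0.0) for t in timesteps)
    return at_least and strictly

# X does at least as well whatever the shutdown time, and strictly better
# conditional on shutdown at t=2, so an agent satisfying Timestep Dominance
# never chooses Y over X.
X = {1: 1.0, 2: 3.0, 3: 2.0}
Y = {1: 1.0, 2: 2.0, 3: 2.0}
print(timestep_dominates(X, Y))  # True
```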