Seems like if the model thinks there’s a 5% chance humans explored X, but exploring X would force it to give up its current values, then not exploring X looks like the better bet: it only gets caught in the 5% case.
This is true for any given X, but if there are many things like X, each independently 5% likely to have been explored by humans, the model is in trouble.
Put another way: the model only needs to be somewhat confident overall that it’s exploring everything the humans explored in the imitation data, but for any given case of choosing not to explore some behavior, it needs to be very confident that the humans didn’t explore it.
I don’t have the exact model worked out very precisely in my head, and I might be explaining this in a somewhat confusing way, sorry.
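To make the arithmetic concrete anyway, here’s a toy calculation (all the numbers are made up, and the independence assumption is doing a lot of work): if the model declines to explore N behaviors, each independently 5% likely to have been explored by humans, its chance of never being caught is 0.95^N, which collapses quickly as N grows.

```python
# Toy numbers, not from the original comment: suppose the model declines
# to explore N different behaviors, each of which humans independently
# explored in the imitation data with probability 5%.
p_caught_each = 0.05

for n in [1, 10, 50, 100]:
    p_never_caught = (1 - p_caught_each) ** n
    print(f"N = {n:>3}: P(never caught) = {p_never_caught:.3f}")

# Output:
# N =   1: P(never caught) = 0.950
# N =  10: P(never caught) = 0.599
# N =  50: P(never caught) = 0.077
# N = 100: P(never caught) = 0.006
```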