I’m worried that whitelisting presents generalization challenges
I think you might have to bite the paintbucket in this case. Note that in a latent space formulation, having one tile-painting-transition whitelisted might suggest that other painting applications would have lower (but not 0) cost. In general, I agree—I don’t see how we could expect reasonable generalization along those lines because of the traffic light issue. It isn’t clear how big of a problem this is, though.
The whitelist is only closed under transitivity if you assume that the agent is capable of taking all transitions, and you aren’t worried about cost. If you have a → b and b → c whitelisted, then the agent can only get from a to c if it can change a to c going through intermediate state b, which may be much harder than going directly from a to c.
That’s correct—if a→c isn’t on the whitelist, it might de facto incur additional costs (whether in time or resources). I suppose I was pointing to the idea that our theoretical preferences should be closed under transitivity—if we accept a→b,b→c, we should not reject a→c happening over time.
You could just define the whitelist to be transitively closed, since it’s not hard to compute the transitive closure of a directed graph.
Good point! Does get trickier in latent space, though.
I think you might have to bite the paintbucket in this case. Note that in a latent space formulation, having one tile-painting-transition whitelisted might suggest that other painting applications would have lower (but not 0) cost. In general, I agree—I don’t see how we could expect reasonable generalization along those lines because of the traffic light issue. It isn’t clear how big of a problem this is, though.
That’s correct—if a→c isn’t on the whitelist, it might de facto incur additional costs (whether in time or resources). I suppose I was pointing to the idea that our theoretical preferences should be closed under transitivity—if we accept a→b,b→c, we should not reject a→c happening over time.
Good point! Does get trickier in latent space, though.