Looking for Recommendations RE UDT vs. bounded computation / meta-reasoning / opportunity cost?

IAFF-User-1118 Nov 2017 22:58 UTC

0 points

The main issue I have with UDT is that it neglects the meta-reasoning problem of: “how much should I think before I act?”

Is there anything I should read / know about WRT this?
What are people’s opinions on whether this is a serious issue, and how it could be resolved? What is the relation to logical updatelessness?

There’s generally an opportunity cost to deliberating.
Solving the UDT planning problem would take infinite compute, so it seems like we should be considering agents that can start acting without having solved this planning problem.

Maybe they should converge to doing what UDT would do. Alternatively, maybe it’s better to do empirical updates in this situation.

IAFF-User-1118 Nov 2017 22:58 UTC

0 points

1 comment1 min readLW link

abramdemski 11 Nov 2017 0:34 UTC
0 points
0
AF
At present, I think the main problem of logical updatelessness is something like: how can we make a principled trade-off between thinking longer to make a better decision, vs thinking less long so that we exert more logical control on the environment?

For example, in Agent Simulates Predictor, an agent who thinks for a short amount of time and then decides on a policy for how to respond to any conclusions which it comes to after thinking longer can decide “If I think longer, and see a proof that the predictor thinks I two-box, I can invalidate that proof by one-boxing. Adopting this policy makes the predictor less likely to find such a proof.” (I’m speculating; I haven’t actually written up a thing which does this, yet, but I think it would work.) An agent who thinks longer before making a decision can’t see this possibility because it has already proved that the predictor predicts two-boxing, so from the perspective of having thought longer, there doesn’t appear to be a way to invalidate the prediction—being predicted to two-box is just a fact, not a thing the agent has control over.

Similarly, in Prisoner’s Dilemma, an agent who hasn’t thought too long can adopt a strategy of first thinking longer and then doing whatever it predicts the other agent to do. This is a pretty good strategy, because it makes it so that the other agent’s best strategy is to cooperate. However, you have to think for long enough to find this particular strategy, but short enough that the hypotheticals which show that the strategy is a good idea aren’t closed off yet.

So, I think there is less conflict between UDT and bounded reasoning than you are implying. However, it’s far from clear how to negotiate the trade-offs sanely.

(However, in both cases, you still want to spend as long a time thinking as you can afford; it’s just that you want to make the policy decision, about how to use the conclusions of that thinking, as early as they can be made while remaining sensible.)

Looking for Recommendations RE UDT vs. bounded computation /​ meta-reasoning /​ opportunity cost?

Looking for Recommendations RE UDT vs. bounded computation / meta-reasoning / opportunity cost?