It seems like there’s room for a theory of logical-inductor-like agents with limited computational resources, and I’m not sure if this has already been figured out. The core trick seems to be that when you try to build a logical-inductor agent, it has some estimation process for math problems like “what does my model predict will happen?” and some search process for finding good actions, and you don’t want the search process to be more powerful than the estimator, because then it will find edge cases in the estimates. In fact, you want the two to be linked somehow, so that the search process is never in the position of taking advantage of the estimator’s mistakes: if you, a human, are making a plan and notice a blind spot in your predictions, you don’t “take advantage” of yourself; you do further estimating as part of the search process.
The hard part is formalizing this handwavy argument, and figuring out what other strong conditions need to be met to get nice guarantees like bounded regret.
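To make the estimator/search coupling a bit more concrete, here is a toy sketch in Python. Everything in it (the `Estimator` class, the `confidence_floor` parameter, the `refine` step) is hypothetical and invented for illustration, not a proposal; the only point is that when the search’s current best-looking action rests on a low-confidence estimate, the next unit of compute goes into further estimation rather than into acting on a possibly flattering number.

```python
import random

# Toy sketch, not a proposal: an agent whose action search is coupled to its
# estimator, so that low-confidence estimates trigger more estimation instead
# of being exploited. All names and interfaces here are hypothetical.

def true_value(action):
    # Hypothetical ground truth the estimator is approximating.
    return -abs(action - 3)

class Estimator:
    """Crude stand-in for a logical-inductor-style value estimator."""
    def __init__(self):
        self.compute_spent = {}  # action -> estimation effort spent so far

    def estimate(self, action):
        effort = self.compute_spent.get(action, 0)
        # Noise shrinks as more compute is spent on this particular question.
        value = true_value(action) + random.gauss(0, 1.0 / (1 + effort))
        confidence = effort / (effort + 1)
        return value, confidence

    def refine(self, action):
        """Spend one more unit of compute estimating this action
        (the 'further estimating as part of the search process')."""
        self.compute_spent[action] = self.compute_spent.get(action, 0) + 1

def search(actions, estimator, confidence_floor=0.8):
    """A search that never acts on an estimate it hasn't vetted: whenever the
    best-looking action is low-confidence, the search step *is* an
    estimation step."""
    while True:
        scored = [(estimator.estimate(a), a) for a in actions]
        (best_value, best_conf), best_action = max(scored)
        if best_conf >= confidence_floor:
            return best_action, best_value
        # Instead of exploiting a possibly-flattering noisy estimate,
        # spend the next unit of compute sharpening it.
        estimator.refine(best_action)

if __name__ == "__main__":
    est = Estimator()
    action, value = search(range(10), est)
    print(f"chose action {action} with estimated value {value:.2f}")
```

This obviously doesn’t capture anything like a regret bound; it’s just the shape of the claim that the search process should be spending its optimization pressure on improving estimates rather than on finding their weak points.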