1. I don’t think that every heuristic A2 could use to solve problems of depth d0 needs to be applicable to performing the search of depth d1 - we only need enough heuristics to be useable to be able to keep increasing the search depth at each amplification round in an efficient manner. It’s possible that some of the value of heuristics like “solution is likely to be an output of algorithm G” could be (imperfectly) captured through some small universal set of heuristics that we can specify how to learn and exploit. (I think that variations on “How likely is the search on [partial solution] to produce an answer?” might get us pretty far).
The AlphaGo analogy is that the original supervised move prediction algorithm didn’t necessarily learn every heuristic that the experts used, but just learned enough to be able to efficiently guide the MCTS to better performance.
(Though I do think that imperfectly learning heuristics might cause alignment problems without a solution to the aligned search problem).
2. This isn’t a problem if once the agent A2 can run algorithm G on t for problems of depth d0, it can directly generalize to applying G to problems of depth d1. Simple Deep RL methods aren’t good at this kind of tasks, but things like the Neural Turing Machine are trying to do better on this sort of tasks. So the ability to learn efficient exponential search could be limited by the underlying agent capability; for some capability range, a problem could be directly solved by an unaligned agent, but couldn’t be solved for an aligned agent. This isn’t a problem if we can surpass that level of capability.
I’m not sure that these considerations fix the problem entirely, or whether Paul would take a different approach.
It also might be worth coming up with a concrete example where some heuristics are not straightforward to generalize from smaller to larger problems, and it seems like this will prevent efficiently learning to solve large problems. The problem, however, would need to be something that humans can solve (ie. finding a string that hashed to a particular value using a cryptographic hash function would be hard to generalize any heuristics from, but I don’t think humans could do it either so it’s outside of scope).
1. I don’t think that every heuristic A2 could use to solve problems of depth d0 needs to be applicable to performing the search of depth d1 - we only need enough heuristics to be useable to be able to keep increasing the search depth at each amplification round in an efficient manner. It’s possible that some of the value of heuristics like “solution is likely to be an output of algorithm G” could be (imperfectly) captured through some small universal set of heuristics that we can specify how to learn and exploit. (I think that variations on “How likely is the search on [partial solution] to produce an answer?” might get us pretty far).
The AlphaGo analogy is that the original supervised move prediction algorithm didn’t necessarily learn every heuristic that the experts used, but just learned enough to be able to efficiently guide the MCTS to better performance.
(Though I do think that imperfectly learning heuristics might cause alignment problems without a solution to the aligned search problem).
2. This isn’t a problem if once the agent A2 can run algorithm G on t for problems of depth d0, it can directly generalize to applying G to problems of depth d1. Simple Deep RL methods aren’t good at this kind of tasks, but things like the Neural Turing Machine are trying to do better on this sort of tasks. So the ability to learn efficient exponential search could be limited by the underlying agent capability; for some capability range, a problem could be directly solved by an unaligned agent, but couldn’t be solved for an aligned agent. This isn’t a problem if we can surpass that level of capability.
I’m not sure that these considerations fix the problem entirely, or whether Paul would take a different approach.
It also might be worth coming up with a concrete example where some heuristics are not straightforward to generalize from smaller to larger problems, and it seems like this will prevent efficiently learning to solve large problems. The problem, however, would need to be something that humans can solve (ie. finding a string that hashed to a particular value using a cryptographic hash function would be hard to generalize any heuristics from, but I don’t think humans could do it either so it’s outside of scope).