This article thinks about what “general purpose search is” and why to expect it in advanced machine learning systems.
In general, we expect gradient descent to find “simple solutions” with lots of varying parameters (since they take a larger part in solution space) and “general solutions” that are helpful broadly (since we will put the system in diverse environments). Thus, we do expect search processes to emerge.
However, babble and prune will likely not be the resulting process: it’s not compute and memory efficient enough. Instead, John imagines a search process that starts with a constraint/problem and iteratively produces broad strokes of solutions that form new constraints of subproblems, until the problem is solved. If this is roughly correct, it will also mean that the search process is retargetable.
This leaves open how the broad strokes of solutions to constraints are found, which John expects requires heuristics that will often either output a solution itself, or a different problem whose solution is easier to generate, instead of babbling and then pruning. Some heuristics:
Relaxed problem: only consider time-constraint, or only consider immediately reducing Euclidean distance.
The specific relaxed problems are “heuristics”. The procedure to relax the problem is a “heuristic generator”.
One could consider this a “meta-heuristic”. However, the type-signature of “heuristic” is “problem in, solution out” or “problem in, other problem out”, and the type-signature of the meta heuristic is “problem in, heuristic out”, so these are different.
Finally, John gestures at the observation that heuristics seem to be environment-dependent but not goal-dependent, at least for similar types of goals (e.g. for the type “reach X city” or “do X thing this week”). This makes them more generalizable.
Other Thoughts
Don’t chess players sometimes do babble and prune? They might look at the board and literally “babble” possible moves, evaluate them, and search further in the best of those.
An alternative to that process would be to think “I want to capture the queen, how do I do this?” and then to explicitly think about moves that achieve that “constraint”. The original constraint/problem is just “win this game”, of which “capture the queen” is already a subconstraint/problem.
Still, I do remember Magnus Carlsen saying in an interview that he actually does do relatively exhaustive search in some chess situations. So it seems to at least be some search process of many he applies. But I also remember him saying that this is effortful.
The description of John leaves open the process with which the solutions to constraints are found. Doesn’t that process usually involve babbling?
In the case of finding stores, we may say “there is no babbling, the computer program just shows me the open stores that satisfy the constraint.”
But doesn’t the computer internally need to babble? I.e., to go through a database of all the options to find the ones satisfying the constraint!
In general, I would say babbling is required unless a solution to the constraint can already be retrieved in a somewhat cached form.
I’m not sure if “relax the problem” is a clear instruction. I feel like you already need to have something like a “natural abstraction of problems” in your head in order to be able to do that. This doesn’t really contradict what John is saying, but it highlights that there is some hidden complexity in this.
When I try to search for terms that I find on here, like “finite factored set” or “babble and prune” and many others, that I can’t find anywhere else except on here or other EA platforms. It always makes me wonder “Are we meming?” It seems like the meme culture is deep here. I think that is also what makes it hard for new users to get accustomed to. They have to read up on so much prerequisite materials in order to even participate in a conversation.
The description of John leaves open the process with which the solutions to constraints are found. Doesn’t that process usually involve babbling?
The important part here is that babble scales extremely poorly with dimensionality of the problem (or, more precisely, the fraction of problem-space which is filled with solutions). So babble is fine once we’ve reduced to a low-dimensional subproblem; most of the algorithmic work is in reducing the big problem to a bunch of low-dimensional subproblems.
Summary
This article thinks about what “general purpose search is” and why to expect it in advanced machine learning systems.
In general, we expect gradient descent to find “simple solutions” with lots of varying parameters (since they take a larger part in solution space) and “general solutions” that are helpful broadly (since we will put the system in diverse environments). Thus, we do expect search processes to emerge.
However, babble and prune will likely not be the resulting process: it’s not compute and memory efficient enough. Instead, John imagines a search process that starts with a constraint/problem and iteratively produces broad strokes of solutions that form new constraints of subproblems, until the problem is solved. If this is roughly correct, it will also mean that the search process is retargetable.
This leaves open how the broad strokes of solutions to constraints are found, which John expects requires heuristics that will often either output a solution itself, or a different problem whose solution is easier to generate, instead of babbling and then pruning. Some heuristics:
Relaxed problem: only consider time-constraint, or only consider immediately reducing Euclidean distance.
The specific relaxed problems are “heuristics”. The procedure to relax the problem is a “heuristic generator”.
One could consider this a “meta-heuristic”. However, the type-signature of “heuristic” is “problem in, solution out” or “problem in, other problem out”, and the type-signature of the meta heuristic is “problem in, heuristic out”, so these are different.
Finally, John gestures at the observation that heuristics seem to be environment-dependent but not goal-dependent, at least for similar types of goals (e.g. for the type “reach X city” or “do X thing this week”). This makes them more generalizable.
Other Thoughts
Don’t chess players sometimes do babble and prune? They might look at the board and literally “babble” possible moves, evaluate them, and search further in the best of those.
An alternative to that process would be to think “I want to capture the queen, how do I do this?” and then to explicitly think about moves that achieve that “constraint”. The original constraint/problem is just “win this game”, of which “capture the queen” is already a subconstraint/problem.
Still, I do remember Magnus Carlsen saying in an interview that he actually does do relatively exhaustive search in some chess situations. So it seems to at least be some search process of many he applies. But I also remember him saying that this is effortful.
The description of John leaves open the process with which the solutions to constraints are found. Doesn’t that process usually involve babbling?
In the case of finding stores, we may say “there is no babbling, the computer program just shows me the open stores that satisfy the constraint.”
But doesn’t the computer internally need to babble? I.e., to go through a database of all the options to find the ones satisfying the constraint!
In general, I would say babbling is required unless a solution to the constraint can already be retrieved in a somewhat cached form.
I’m not sure if “relax the problem” is a clear instruction. I feel like you already need to have something like a “natural abstraction of problems” in your head in order to be able to do that. This doesn’t really contradict what John is saying, but it highlights that there is some hidden complexity in this.
When I try to search for terms that I find on here, like “finite factored set” or “babble and prune” and many others, that I can’t find anywhere else except on here or other EA platforms. It always makes me wonder “Are we meming?” It seems like the meme culture is deep here. I think that is also what makes it hard for new users to get accustomed to. They have to read up on so much prerequisite materials in order to even participate in a conversation.
The important part here is that babble scales extremely poorly with dimensionality of the problem (or, more precisely, the fraction of problem-space which is filled with solutions). So babble is fine once we’ve reduced to a low-dimensional subproblem; most of the algorithmic work is in reducing the big problem to a bunch of low-dimensional subproblems.