Distillation: We train an ML agent to implement a function from questions to
answers based on demonstrations (or incentives) provided by a large tree of
experts […]. The trained agent […] only replicates the tree’s input-output
behavior, not individual reasoning steps.
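To make sure I'm reading this setup correctly, here is a minimal sketch of the distillation step as I understand it. All names are hypothetical and a lookup table stands in for the ML agent; this is not anyone's actual implementation:

```python
def expert_tree_answer(question: str) -> str:
    """Stand-in for the large tree of experts: a real system would
    recursively decompose the question, answer the subquestions, and
    combine the results; here it is an opaque oracle."""
    return f"tree answer to: {question}"


def distill(questions: list[str]) -> dict[str, str]:
    """'Train' on root (question, answer) pairs only. The lookup
    table plays the role of the ML agent; the reasoning tree that
    produced each answer is thrown away."""
    return {q: expert_tree_answer(q) for q in questions}


agent = distill(["Is this plan safe?"])
# The agent replicates the tree's input-output behaviour only.
print(agent["Is this plan safe?"])
```

The point this is meant to capture: only the root question/answer pairs ever reach the training set; the tree itself is discarded.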
Why do we decompose in the first place? If the training data for the next agent
consists only of root questions and root answers, it doesn’t matter whether they
represent the tree’s input-output behaviour or the input-output behaviour of a
small group of experts who reason in the normal human high-context,
high-bandwidth way. The latter is certainly more efficient.
There seems to be a circular problem, and I don't understand how it is not
circular, or where my understanding goes astray: We want to teach an ML agent
aligned reasoning. This is difficult if the training data consists of
high-level questions and answers. So instead we explicitly write down, in
small steps, how we reason.
Some tasks are hard to write down in small steps. In these cases we write down a
naive decomposition that takes exponential time. A real-world agent can’t use
this to reason, because it would be too slow. To work around this we train a
higher-level agent on just the input-output behaviour of the slower agent. Now
the training data again consists of high-level questions and answers. But that
is exactly what we wanted to avoid, and why we started writing down small
steps in the first place.
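To pin down where the circle seems to close, here is a sketch of a single amplify-then-distill round as I understand it, again with hypothetical names and a lookup table standing in for ML training:

```python
from typing import Callable

Agent = Callable[[str], str]


def decompose(question: str) -> list[str]:
    """Stand-in for a human splitting a question into subquestions."""
    return [f"{question} / part {i}" for i in (1, 2)]


def combine(sub_answers: list[str]) -> str:
    """Stand-in for a human combining sub-answers into an answer."""
    return "combined(" + "; ".join(sub_answers) + ")"


def amplify(fast_agent: Agent) -> Agent:
    """The slow agent: decompose one level, delegate the subquestions
    to the current fast agent, and combine the results."""
    return lambda q: combine([fast_agent(s) for s in decompose(q)])


def distill(slow_agent: Agent, questions: list[str]) -> Agent:
    """Train a fast agent on the slow agent's input-output behaviour
    only (a lookup table again stands in for ML training)."""
    table = {q: slow_agent(q) for q in questions}
    return lambda q: table.get(q, f"guess at: {q}")


# One round: the distilled agent imitates a tree one level deeper
# than the agent it was built from, without ever running the deep
# tree at question-answering time.
fast: Agent = lambda q: f"guess at: {q}"
fast = distill(amplify(fast), ["root question"])
print(fast("root question"))
```

Even in this sketch, the data `distill` sees is just top-level question/answer pairs, which is the situation the small steps were meant to avoid.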
Decomposition makes sense to me in the high-bandwidth setting where the task is too difficult for a human, so the human only divides it and combines the sub-results. I don’t see the point of decomposing a human-answerable question into even smaller low-bandwidth subquestions if we then throw away the tree and train an agent on the top-level question and answer.
The first section of "Ought: why it matters and ways to help" answers this question. It's also a good update on this post in general.