Machine Learning Projects on IDA
We wrote a 20-page document that explains IDA and outlines potential Machine Learning projects about IDA. This post gives an overview of the document.
What is IDA?
Iterated Distillation and Amplification (IDA) is a method for training ML systems to solve challenging tasks. It was introduced by Paul Christiano. IDA is intended for tasks where:
The goal is to outperform humans at the task or to solve instances that are too hard for humans.
It is not feasible to provide demonstrations or reward signals sufficient for super-human performance at the task.
Humans have a high-level understanding of how to approach the task and can reliably solve easy instances.
The idea behind IDA is to bootstrap using an approach similar to AlphaZero, but with a learned model of steps of human reasoning in place of AlphaZero's fixed game simulator.
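The bootstrapping loop can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the IDA proposal itself: the toy task (summing a tuple of integers), a "human" who can only add at most two numbers and knows to split a long list in half, and a lookup table standing in for the learned model. Amplification lets the human consult the current model on sub-questions; distillation replaces supervised training with simply storing the amplified answers.

```python
from typing import Dict, List, Optional, Tuple

Question = Tuple[int, ...]   # toy task: "what is the sum of these integers?"
Model = Dict[Question, int]  # stand-in for the distilled model: a lookup table


def human_solve_easy(question: Question) -> int:
    """The human reliably solves easy instances only: at most two numbers."""
    if len(question) > 2:
        raise ValueError("instance too hard for the unaided human")
    return sum(question)


def split(question: Question) -> Tuple[Question, Question]:
    """The human's high-level strategy: break a question into two halves."""
    mid = len(question) // 2
    return question[:mid], question[mid:]


def amplify(question: Question, model: Model) -> Optional[int]:
    """Human plus model: decompose, ask the model the sub-questions, combine."""
    if len(question) <= 2:
        return human_solve_easy(question)
    left, right = split(question)
    if left not in model or right not in model:
        return None  # the current model cannot yet answer the sub-questions
    # Combining two sub-answers is itself an easy instance for the human.
    return human_solve_easy((model[left], model[right]))


def distill(questions: List[Question], model: Model) -> Model:
    """Stand-in for supervised training: imitate the amplified system."""
    new_model: Model = {}
    for q in questions:
        answer = amplify(q, model)
        if answer is not None:
            new_model[q] = answer
    return new_model


def training_questions(top: Question) -> List[Question]:
    """All questions reachable from `top` by repeated splitting."""
    if len(top) <= 2:
        return [top]
    left, right = split(top)
    return [top] + training_questions(left) + training_questions(right)


def run_ida(top: Question, rounds: int) -> Model:
    """Alternate amplification and distillation for a fixed number of rounds."""
    questions = training_questions(top)
    model: Model = {}
    for _ in range(rounds):
        model = distill(questions, model)
    return model
```

In this toy setting each round doubles the length of the lists the model can answer: with the top question `tuple(range(1, 9))`, the model cannot answer it after two rounds but stores the answer 36 after three. A real system would replace the lookup table with a trained model that generalizes beyond the exact questions it has seen.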
Our document provides a self-contained technical description of IDA. For broader discussion of IDA and its relevance to value alignment, see Ought’s presentation, Christiano’s blog post, and the Debate paper. There is also a technical ML paper applying IDA to algorithmic problems (e.g. shortest path in a graph).
ML Projects on IDA
Our document outlines three Machine Learning projects on IDA. Our goal in outlining these projects is to generate discussion and encourage research on IDA. We are not (as of June 2019) working on these projects, but we are interested in collaboration. The project descriptions are “high-level” and leave many choices undetermined. If you took on a project, part of the work would be refining the project and fixing a concrete objective, dataset and model.
Project 1: Amplifying Mathematical Reasoning
This project is about applying IDA to problems in mathematics. This would involve learning to solve math problems by breaking them down into easier sub-problems. The problems could be represented in a formal language (as in this paper) or in natural language. We discuss a recent dataset of high-school problems in natural language, which was introduced in this paper. Here are some examples from the dataset:
Question: Let u(n) = −n^3 − n^2. Let e(c) = −2*c^3 + c. Let f(j) = −118*e(j) + 54*u(j). What is the derivative of f(a)?
Answer: 546*a^2 − 108*a − 118
Question: Three letters picked without replacement from qqqkkklkqkkk. Give probability of sequence qql.
Answer: 1/110
The paper showed impressive results on the dataset for a Transformer model trained by supervised learning (sequence-to-sequence). This suggests that a similar model could do well at learning to solve these problems by decomposition.
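To make "solving by decomposition" concrete, here is a minimal sketch that breaks the two example questions into easier sub-problems. It is hand-written rather than learned, and every name in it (`scale`, `add`, `differentiate`, `sequence_probability`, and the dictionary representation of polynomials) is a hypothetical choice for illustration and not part of the dataset or the paper.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict

Poly = Dict[int, int]  # maps exponent -> coefficient


def scale(p: Poly, k: int) -> Poly:
    """Sub-problem: multiply a polynomial by a constant."""
    return {exp: k * coef for exp, coef in p.items()}


def add(p: Poly, q: Poly) -> Poly:
    """Sub-problem: add two polynomials, dropping zero terms."""
    out = dict(p)
    for exp, coef in q.items():
        out[exp] = out.get(exp, 0) + coef
    return {exp: coef for exp, coef in out.items() if coef != 0}


def differentiate(p: Poly) -> Poly:
    """Sub-problem: differentiate term by term."""
    return {exp - 1: exp * coef for exp, coef in p.items() if exp != 0}


# First question: expand f, then differentiate it.
u_poly = {3: -1, 2: -1}                            # u(n) = -n^3 - n^2
e_poly = {3: -2, 1: 1}                             # e(c) = -2*c^3 + c
f_poly = add(scale(e_poly, -118), scale(u_poly, 54))
df_poly = differentiate(f_poly)                    # {2: 546, 1: -108, 0: -118}


def sequence_probability(letters: str, sequence: str) -> Fraction:
    """Second question: multiply the probability of each successive draw."""
    remaining = Counter(letters)
    total = len(letters)
    prob = Fraction(1)
    for ch in sequence:
        prob *= Fraction(remaining[ch], total)  # sub-problem: one draw
        remaining[ch] -= 1                      # drawn without replacement
        total -= 1
    return prob
```

Here `df_poly` corresponds to 546*a^2 − 108*a − 118, matching the answer above, and `sequence_probability("qqqkkklkqkkk", "qql")` evaluates to 4/12 * 3/11 * 1/10 = 1/110. In the project itself, a model would have to learn both the decomposition and the answers to the sub-problems.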
Project 2: IDA for Neural Program Interpretation
There’s a research program in Machine Learning on “Neural Program Interpretation” (NPI). Work on NPI focuses on learning to reproduce the behavior of computer programs. One possible approach is to train end-to-end on input-output behavior. In NPI, however, a model is trained to mimic the program’s internal behavior, including all the low-level operations and the high-level procedures that invoke them.
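The difference between the two kinds of supervision can be sketched with a small example. The code below is an illustrative assumption, not the setup used in NPI work: it runs bubble sort and records an execution trace, with hypothetical operation names (`BUBBLE_PASS`, `COMPARE`, `SWAP`) chosen for this sketch. End-to-end training would see only the input and the sorted output, while trace-based training would also see the recorded steps.

```python
from typing import List, Tuple

Step = Tuple[str, Tuple[int, ...]]  # (operation name, arguments)


def bubble_sort_with_trace(values: List[int]) -> Tuple[List[int], List[Step]]:
    """Sort a copy of `values` and record every internal step taken."""
    data = list(values)
    trace: List[Step] = []
    for end in range(len(data) - 1, 0, -1):
        trace.append(("BUBBLE_PASS", (end,)))      # high-level procedure
        for i in range(end):
            trace.append(("COMPARE", (i, i + 1)))  # low-level operation
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                trace.append(("SWAP", (i, i + 1)))  # low-level operation
    return data, trace


example_input = [3, 1, 2]
example_output, example_trace = bubble_sort_with_trace(example_input)

# End-to-end supervision: only the input-output pair.
end_to_end_example = (example_input, example_output)

# Trace supervision: the input plus the sequence of internal steps.
trace_example = (example_input, example_trace)
```

For the input `[3, 1, 2]` the trace contains two passes, three comparisons and two swaps, so the trace-supervised example carries far more information per input than the input-output pair does.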
NPI has some similar motivations to IDA. This project applies IDA to the kinds of tasks explored in NPI and compares IDA to existing approaches. Tasks could include standard algorithms (e.g. sorting), algorithms that operate on databases, and algorithms that operate on human-readable inputs (e.g. text, images).
Project 3: Adaptive Computation
The idea of “adaptive computation” is to vary the amount of computation you perform for different inputs. You want to apply more computation to inputs that are hard but solvable.
Adaptive computation seems important for the kinds of problems IDA is intended to solve, including some of the problems in Projects 1 and 2. This project would investigate different approaches to adaptive computation for IDA. The basic idea is to decide whether to rely only on the distilled model (which is fast but approximate) or to additionally use amplification (which is more accurate but slower). This decision could be based on a calibrated model or on a learned policy for choosing whether to use amplification.
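A minimal sketch of the first option, under stated assumptions: the distilled model returns an answer together with a confidence that is assumed to be calibrated, and amplification is invoked only when that confidence falls below a threshold. The function names, the threshold value and the two toy stand-in models are all hypothetical; a learned policy could replace the fixed threshold.

```python
from typing import Callable, Tuple

Question = Tuple[int, ...]  # toy task: sum a tuple of integers


def answer_adaptively(
    question: Question,
    distilled: Callable[[Question], Tuple[int, float]],
    amplified: Callable[[Question], int],
    threshold: float = 0.9,
) -> Tuple[int, bool]:
    """Return (answer, used_amplification)."""
    answer, confidence = distilled(question)
    if confidence >= threshold:
        return answer, False          # fast path: trust the distilled model
    return amplified(question), True  # slow path: spend extra computation


def toy_distilled(question: Question) -> Tuple[int, float]:
    """Hypothetical fast model: exact on short inputs, a crude guess otherwise."""
    if len(question) <= 2:
        return sum(question), 0.99
    return len(question) * question[0], 0.5  # rough estimate, low confidence


def toy_amplified(question: Question) -> int:
    """Hypothetical slow but accurate stand-in for amplification."""
    return sum(question)
```

With these stand-ins, `answer_adaptively((1, 2), toy_distilled, toy_amplified)` returns `(3, False)`, while the longer input `(1, 2, 3, 4)` triggers amplification and returns `(10, True)`. The sketch only works as intended if low confidence really does track inputs the distilled model gets wrong, which is why calibration matters.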