I really appreciated this comment for making the connection between this paper and IDA.
More explicitly, to the extent that you can think of the original large language model as simulating a human, there’s an analogy between:
asking the LLM to reason about its inputs and then training on the conclusion of the reasoning (what this paper does)
asking simulated humans to reason about a problem and then training a new model on their conclusions (the basic idea behind iterated distillation and amplification).
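To make the structural parallel concrete, here is a minimal sketch of the shared amplify-then-distill loop. The helper names (`generate_reasoning`, `extract_conclusion`, `finetune`) are hypothetical placeholders, not APIs from the paper or from any IDA implementation; this is only meant to show the common shape of "let the current model reason at length, then train the next model to output the conclusion directly, and iterate."

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]  # a model is just prompt -> text here


def amplify_and_distill(
    model: Model,
    problems: List[str],
    generate_reasoning: Callable[[Model, str], str],   # hypothetical: run the model with room to reason
    extract_conclusion: Callable[[str], str],           # hypothetical: strip the trace down to its answer
    finetune: Callable[[Model, List[Tuple[str, str]]], Model],  # hypothetical: train on (input, answer) pairs
    rounds: int = 3,
) -> Model:
    """One way to view both procedures: amplify, then distill, then repeat."""
    for _ in range(rounds):
        # Amplification: the current model reasons step by step about each input.
        traces = [(p, generate_reasoning(model, p)) for p in problems]
        # Keep only the conclusions, discarding the intermediate reasoning.
        targets = [(p, extract_conclusion(trace)) for p, trace in traces]
        # Distillation: train the next model to map input directly to conclusion.
        model = finetune(model, targets)
    return model
```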
This is also a great chance for IDA skeptics to try to empirically demonstrate issues, either by finding failures of capabilities to scale, or by demonstrating alignment failures in amplified systems.