I really appreciated this comment for making the connection between this paper and IDA.
More explicitly, to the extent that you can think of the original large language model as simulating a human, there’s an analogy between:
asking the LLM to reason about its inputs and then training on the conclusion of the reasoning (what this paper does)
asking simulated humans to reason about a problem and then training a new model on their conclusions (the basic idea behind iterated distillation and amplification).
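To make the structural parallel concrete, here is a minimal sketch of the shared amplify-then-distill loop. The helper names (`generate_reasoning`, `extract_conclusion`, `finetune`) are hypothetical placeholders, not APIs from the paper or from any IDA implementation; this is only meant to show the common shape of "let the current model reason at length, then train the next model to output the conclusion directly, and iterate."

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]  # a model is just prompt -> text here


def amplify_and_distill(
    model: Model,
    problems: List[str],
    generate_reasoning: Callable[[Model, str], str],   # hypothetical: run the model with room to reason
    extract_conclusion: Callable[[str], str],           # hypothetical: strip the trace down to its answer
    finetune: Callable[[Model, List[Tuple[str, str]]], Model],  # hypothetical: train on (input, answer) pairs
    rounds: int = 3,
) -> Model:
    """One way to view both procedures: amplify, then distill, then repeat."""
    for _ in range(rounds):
        # Amplification: the current model reasons step by step about each input.
        traces = [(p, generate_reasoning(model, p)) for p in problems]
        # Keep only the conclusions, discarding the intermediate reasoning.
        targets = [(p, extract_conclusion(trace)) for p, trace in traces]
        # Distillation: train the next model to map input directly to conclusion.
        model = finetune(model, targets)
    return model
```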
This is also a great chance for IDA skeptics to try to empirically demonstrate issues, either by finding failures of capabilities to scale, or by demonstrating alignment failures in amplified systems.