What they are doing here is:
Train a large language model unsupervised in the standard way
Repeat:
    Take some supervised learning problem, throw away the supervised outputs, and repeat:
        Pick a supervised input and use the language model to sample N different outputs
        Select a “consensus” output among the N sampled outputs
        Store this “consensus output” as the new “target output” for the selected input
    Fine-tune the language model in a supervised way using the input/output pairs generated above
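In code, the loop might look something like the sketch below. This is a minimal illustration under my own assumptions, not the paper's implementation: `sample_fn` and `finetune_fn` are hypothetical stand-ins for the actual sampling and fine-tuning machinery, and majority vote over exact strings is just one possible consensus rule (the paper's selection could instead vote over extracted final answers).

```python
from collections import Counter
from typing import Callable
import random

def self_improve(
    inputs: list[str],
    sample_fn: Callable[[str, int], list[str]],
    finetune_fn: Callable[[list[tuple[str, str]]], None],
    n_samples: int = 32,
    n_rounds: int = 3,
) -> None:
    """Consensus self-training: label each input with the model's own
    majority-vote output, then fine-tune on those self-generated labels."""
    for _ in range(n_rounds):
        pairs: list[tuple[str, str]] = []
        for x in inputs:
            # Sample N candidate outputs from the current model.
            candidates = sample_fn(x, n_samples)
            # "Consensus" = majority vote over exact strings (an assumption;
            # any rule that narrows the output distribution would fit here).
            consensus, _ = Counter(candidates).most_common(1)[0]
            pairs.append((x, consensus))
        # Standard supervised fine-tuning on (input, consensus output) pairs.
        finetune_fn(pairs)

# Toy run with stub functions, just to exercise the control flow:
self_improve(
    inputs=["2+2=?"],
    sample_fn=lambda x, n: random.choices(["4", "4", "5"], k=n),
    finetune_fn=lambda pairs: print("fine-tuning on", pairs),
    n_rounds=1,
)
```

What the sketch makes concrete is that nothing in the loop ever consults ground truth; the only training signal is agreement among the model's own samples.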
It’s quite reasonable for them to call this “self-improvement”, but it’s only vaguely related to the kind of self-improvement much discussed in 2010s-era AI safety circles, where the worry was an AI improving its own architecture. Nothing in this paper’s procedure touches the model’s architecture; only its weights change.
Still, it’s bizarre that the thing being done in this paper works at all. It’s as if merely narrowing the distribution of outputs on a new supervised learning task innately homes in on the truth… whatever that means. Strange.