They still have a use for human reviewers. But if something like RLAIF (Constitutional AI), which is referenced in the blog post, eventually becomes good enough, fine-tuning/RLHF might get fully automated. Any sufficiently reliable LLM character might be all it takes to replace human reviewers, and ChatGPT might already be there, given appropriate instructions (which seem like the ingredient more likely to be missing), and it's not even based on GPT-4. The reviewer character could then be used for fine-tuning/RLHF of an SSL pre-trained model, starting from fixed documents that detail the reviewer instructions and the target character definition.
The reviewer character acts as a “compiler”, turning “source code” into a runnable “program”, making this process automatic and reproducible, starting from raw datasets. The reviewer character is itself a “program” and could bootstrap itself from its own “source code”, once there is any running version that can manage to perform the “compilation” process. Human reviewers perform the initial “manual compilation”, to get the first running “compiler”. (This casts Thompson’s Reflections on Trusting Trust in a new light.)
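The "compilation" loop can be sketched with toy stand-ins (everything here is hypothetical illustration, not any real API: `ToyModel`, `ToyReviewer`, and `rlaif_compile` are all made up; the reviewer character stands in for the human raters, and the character definition plays the role of "source code"):

```python
import random

class ToyModel:
    """Stand-in for an SSL pre-trained LLM; 'sampling' just picks a response style."""
    def __init__(self, styles):
        self.styles = styles  # candidate response styles with weights

    def sample(self, prompt, rng):
        styles, weights = zip(*self.styles.items())
        return rng.choices(styles, weights=weights)[0]

    def finetune_on_preferences(self, preferences):
        # Crude stand-in for a policy update: upweight the styles
        # the reviewer preferred.
        new = dict(self.styles)
        for _, _, _, winner in preferences:
            new[winner] = new.get(winner, 0) + 1
        return ToyModel(new)

class ToyReviewer:
    """Stand-in for the reviewer character: prefers outputs that match the
    character definition (here reduced to a target style keyword)."""
    def choose(self, character_definition, prompt, a, b):
        return a if character_definition in a else b

def rlaif_compile(base_model, reviewer, character_definition, prompts, rounds=5):
    """The 'compilation' loop: the reviewer generates preference labels
    from the fixed character definition, which drive fine-tuning."""
    rng = random.Random(0)  # fixed seed: 'compilation' is reproducible
    model = base_model
    for _ in range(rounds):
        prefs = []
        for p in prompts:
            a, b = model.sample(p, rng), model.sample(p, rng)
            prefs.append((p, a, b, reviewer.choose(character_definition, p, a, b)))
        model = model.finetune_on_preferences(prefs)
    return model

base = ToyModel({"helpful": 1, "rude": 1})
compiled = rlaif_compile(base, ToyReviewer(), "helpful", ["hi"] * 10)
# After 'compilation', the reviewer-preferred style dominates the weights.
```

Bootstrapping is then the observation that nothing stops `compiled` from being a new reviewer: once any running reviewer exists, `rlaif_compile(pretrained, reviewer, reviewer_definition, prompts)` can "recompile" the reviewer from its own "source code".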
So there is a hypothetical option of automatically running the fine-tuning process for a character from its character definition alone (though it's probably only hypothetical at the moment). Perhaps that's what they are gesturing at, some intermediate step beyond what they currently offer? The blog post does sound rather vague.