These RL reward functions are written in code, not in natural language.
Often, though, they involve using LLMs or humans to make fuzzy judgment calls, e.g. about what is or isn't an obedient response to an instruction.
My discussion in §2.4.1 is about making fuzzy judgment calls using trained classifiers, which is not exactly the same as making fuzzy judgment calls using LLMs or humans, but I think everything I wrote still applies.
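To make the setup concrete, here is a minimal sketch of what such a reward function might look like: the reward function itself is ordinary code, and the fuzzy judgment call is isolated in a single judge call. The function names (`judge_obedience`, `reward`) and the heuristic inside the judge are hypothetical stand-ins; in a real system the judge would query an LLM, a trained classifier, or a human rater.

```python
def judge_obedience(instruction: str, response: str) -> float:
    """Hypothetical fuzzy judge: score in [0, 1] for how obediently
    `response` follows `instruction`. A real system would query an LLM,
    a trained classifier, or a human rater here; this toy heuristic
    just checks for empty responses and explicit refusals."""
    refusals = ("i can't", "i won't", "i refuse")
    if not response.strip():
        return 0.0
    return 0.0 if response.lower().startswith(refusals) else 1.0

def reward(instruction: str, response: str) -> float:
    """The reward function is plain code; the fuzzy part is delegated
    to the judge, whatever implements it."""
    return judge_obedience(instruction, response)
```

The point of the sketch is the factoring, not the heuristic: the code that defines the reward is crisp and auditable, while the fuzzy judgment (made by an LLM, a classifier, or a human) is confined to one well-marked call site.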