Big if true! Maybe one upside here is that it shows current LLMs can already help with at least some parts of AI safety research, and we should probably be using them more for generating and analyzing data.
...now I’m wondering if the generator model and the evaluator model tried to coordinate😅