Big if true! Maybe one upside here is that it shows current LLMs can already help with at least some parts of AI safety research, and we should probably be using them more for generating and analyzing data.
...now I’m wondering if the generator model and the evaluator model tried to coordinate😅