I think this doesn’t scale with the capabilities of the Eliezer-model: I have a hunch that the real Eliezer Yudkowsky would not consider this sufficient for safety and would therefore reject the task. As you improve the capabilities of your Eliezer-model, it would presumably also reject the task and thereby become useless.