Yes, I strongly suspect that “adversarial” safety approaches are quite doomed. The more one thinks about those, the worse they look.
We need to figure out how to make “cooperative” approaches work reliably. In this sense, I have a feeling that the approach being developed by OpenAI, in particular, has been gradually shifting in that direction (judging, for example, by this interview with Ilya that I transcribed: Ilya Sutskever’s thoughts on AI safety (July 2023): a transcript with my comments).