I'd suggest turning this into an exercise for the reader: first write up the core idea, why it seemed hopeful, and the formalism, then say "this is dangerously broken, please find the flaw without reading the spoilers."
More broken ideas should do this; practice at red-teaming ambitious theory work is rare and important.
Relatedly, here’s my broken ambitious outer alignment plan: Universal Alignment Test. It’s not yet written up quite right to be a good exercise for the reader, but I have mostly removed the spoilers.
If people want spoilers, I can give them, but I do not have the bandwidth to grade your assignments, and on the real test no one will be capable of doing so. Gl :)