Relatedly, here’s my broken, ambitious outer alignment plan: Universal Alignment Test. It’s not yet written up well enough to be a good exercise for the reader, but I removed most of the spoilers.
If people want spoilers, I can give them, but I do not have bandwidth to grade your assignments and on the real test no one will be capable of doing so. Gl :)