I don’t understand it. What’s the difference between this idea and introducing neuralese? In the AI2027 forecast, neuralese is the very thing that prevents researchers from realizing that Agent-3 is misaligned and opens the way to Agent-4 and takeover. Didn’t AI safetyists already call for preserving CoT transparency?