B should really just concern itself with providing a non-manipulated plan of actions
If I understand correctly, this is literally Kokotajlo’s proposal. Alas, this is hard to generalize to neuralese AIs...
If I understand correctly, this is literally Kokotajlo’s proposal. Alas, this is hard to generalize to neuralese AIs...