It is a priority to avoid implementing Neuralese CoT on frontier models because that removes essentially all of our ability to interpret their reasoning.
It seems to me that, if we could get people to do that, then we wouldn’t be in the current situation in the first place.
I mostly agree with this, but also think it’s good to just say the sane things labs should do, even if I don’t expect statements like mine to make a difference on average.
There’s some hope that, because interpretable CoT is mundanely useful, there’s incentive for even the capabilities people to keep it
It seems to me that, if we could get people to do that, then we wouldn’t be in the current situation in the first place.
I mostly agree with this, but also think it’s good to just say the sane things labs should do, even if I don’t expect statements like mine to make a difference on average.
There’s some hope that, because interpretable CoT is mundanely useful, there’s incentive for even the capabilities people to keep it