I mostly agree with this, but also think it’s good to just say the sane things labs should do, even if I don’t expect statements like mine to make a difference on average.
There’s some hope that, because interpretable CoT is mundanely useful, there’s incentive for even the capabilities people to keep it
I mostly agree with this, but also think it’s good to just say the sane things labs should do, even if I don’t expect statements like mine to make a difference on average.
There’s some hope that, because interpretable CoT is mundanely useful, there’s incentive for even the capabilities people to keep it