My current thinking is that
relying on the CoT staying legible because it's in English, and
hoping that the (racing) labs will not drop human language once it becomes economically convenient to do so,
are hopes to be destroyed as quickly as possible. (This is not a confident opinion; it comes from 15 minutes of vague thought.)
To be clear, I don't think it is right in general to say "doing the right thing is hopeless because no one else is doing it"; I typically prefer to "do the thing such that, if everyone did it, the world would be better". My intuition is that it makes sense to try to coordinate on bottlenecks, like introducing compute governance and limiting FLOPs, but not on a specific incremental improvement of AI techniques: the people thinking things like "I will restrain myself from using this specific AI sub-technique because it increases x-risk" are not coordinated enough to self-coordinate at that level of detail, and are not powerful enough to have an influence through small changes.
(Again, I am not confident; I can imagine paths where I'm wrong, but I haven't worked through them.)
(Conflict of interest disclosure: I collaborate with people who started developing this kind of stuff before Meta.)
Point 2 here is IMO more significant than point 1, in the sense that if CoT is eventually replaced, it won't be because it's no longer faithful, but rather because companies will replace it with something less interpretable.
Update: after a few months, my thinking has moved more toward "since you are uncertain, at least do not shoot yourself in the foot first", by which I mean: don't actively develop neuralese on the basis of complicated arguments about optimal decisions in collective-action problems.
I have also updated somewhat negatively on the present feasibility of neuralese, although I think it is possible in principle and may be achieved in the future.
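For concreteness, here is a minimal toy sketch of the distinction I have in mind by "neuralese": reasoning steps fed back as raw hidden vectors rather than bottlenecked through visible tokens (in the spirit of continuous-latent-reasoning work such as Meta's Coconut). This is purely illustrative, not any lab's actual architecture; the tiny GRU stands in for a transformer, and all names are made up.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy language model; the GRU is a stand-in for a transformer."""
    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def step(self, x, h=None):
        out, h = self.rnn(x, h)
        return out[:, -1], h  # last hidden state, recurrent state

model = TinyLM()
prompt = torch.randint(0, 100, (1, 1))  # a single prompt token

# Legible CoT: every reasoning step is projected down to a discrete
# token, so the intermediate "thoughts" are human-readable text.
x, h = model.embed(prompt), None
for _ in range(5):
    state, h = model.step(x, h)
    next_tok = model.head(state).argmax(-1)  # bottleneck through the vocabulary
    x = model.embed(next_tok).unsqueeze(1)   # the thought is a visible token

# Neuralese: the hidden state is fed straight back in, never decoded
# into tokens, so there is no text for anyone to read.
x, h = model.embed(prompt), None
for _ in range(5):
    state, h = model.step(x, h)
    x = state.unsqueeze(1)                   # the thought stays a raw vector
```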