My current thinking is that
relying on the CoT staying legible because it's in English, and
hoping that the (racing) labs will not drop human language once it becomes economically convenient to do so,
are hopes to be destroyed as quickly as possible. (This is not a confident opinion; it comes from 15 minutes of vague thought.)
To be clear, I don't think it is right in general to say "doing the right thing is hopeless because no one else is doing it"; I typically prefer to "do the thing such that, if everyone did it, the world would be better". My intuition is that it makes sense to try to coordinate on bottlenecks, like introducing compute governance and limiting FLOPs, but not on a specific incremental improvement of AI techniques: the people thinking things like "I will restrain myself from using this specific AI sub-technique because it increases x-risk" are not coordinated enough to self-coordinate at that level of detail, and are not powerful enough to have an influence through small changes.
(Again, I am not confident; I can imagine paths where I'm wrong, but I haven't worked through them.)
(Conflict of interest disclosure: I collaborate with people who started developing this kind of stuff before Meta.)
Point 2 here is IMO more significant than point 1, in the sense that if CoT is eventually replaced, it won't be because it's no longer faithful, but rather because companies will replace it with something less interpretable.
Update: after a few months, my thinking has moved more toward "since you are uncertain, at least do not shoot yourself in the foot first", by which I mean: don't actively develop neuralese on the basis of complicated arguments about optimal decisions in collective-action problems.
I have also updated somewhat negatively on the present feasibility of neuralese, although I think it is possible in principle and may be achieved in the future.
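For concreteness, here is a minimal toy sketch of the distinction I have in mind by "neuralese": reasoning steps fed back as raw hidden vectors rather than bottlenecked through visible tokens (in the spirit of continuous-latent-reasoning work such as Meta's Coconut). This is purely illustrative, not any lab's actual architecture; the tiny GRU stands in for a transformer, and all names are made up.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy language model; the GRU is a stand-in for a transformer."""
    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def step(self, x, h=None):
        out, h = self.rnn(x, h)
        return out[:, -1], h  # last hidden state, recurrent state

model = TinyLM()
prompt = torch.randint(0, 100, (1, 1))  # a single prompt token

# Legible CoT: every reasoning step is projected down to a discrete
# token, so the intermediate "thoughts" are human-readable text.
x, h = model.embed(prompt), None
for _ in range(5):
    state, h = model.step(x, h)
    next_tok = model.head(state).argmax(-1)  # bottleneck through the vocabulary
    x = model.embed(next_tok).unsqueeze(1)   # the thought is a visible token

# Neuralese: the hidden state is fed straight back in, never decoded
# into tokens, so there is no text for anyone to read.
x, h = model.embed(prompt), None
for _ in range(5):
    state, h = model.step(x, h)
    x = state.unsqueeze(1)                   # the thought stays a raw vector
```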