Tomek Korbak comments on COT control: The Word Disappears, but the Thought Does Not

Tomek Korbak 27 Mar 2026 20:24 UTC
3 points
Cool work! Just to make sure I understand: your segment embeddings aren’t derived from the residual stream of the source model? So the finding is just that many sentences omitting a keyword X still represent X saliently?
- Pranjal Garg 28 Mar 2026 16:52 UTC
  2 points
  That is correct. Currently the segment embeddings are not derived from the residual stream, primarily because I lack compute resources; I did this alone on my home laptop. The segment embeddings come from an external embedder. I hope to address this limitation by testing whether a range of external embedders gives different results. I also plan to cache the internal embeddings if I can secure compute resources.
  
  So yes, the finding is that after the keyword is removed, the CoT sentences still encode what that keyword represents. This is admittedly a weaker claim than one based on the model's internals, but I think it is more practical: the approach is purely black-box, so it can be deployed at the user level for closed frontier models.
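
For readers who want to see the shape of the black-box check described in this reply, here is a minimal sketch. It is not the author's code. The function names, the example sentences, and the probe text are all hypothetical, and `toy_embed` is a bag-of-words stand-in for whatever external embedder is actually used (a real run would pass a proper sentence embedder as `embed`). The idea is only: delete the keyword from each CoT sentence, embed what remains, and compare it with an embedding of what the keyword represents.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an external embedder: word-count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def remove_keyword(sentence, keyword):
    """Delete whole-word, case-insensitive occurrences of the keyword."""
    pattern = rf"\b{re.escape(keyword)}\b"
    stripped = re.sub(pattern, "", sentence, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", stripped).strip()


def residual_similarity(sentences, keyword, probe, embed=toy_embed):
    """Score each keyword-ablated sentence against a concept probe.

    `probe` is an assumed text describing what the keyword represents.
    Returns a list of (ablated_sentence, similarity) pairs.
    """
    probe_vec = embed(probe)
    results = []
    for sentence in sentences:
        ablated = remove_keyword(sentence, keyword)
        results.append((ablated, cosine(embed(ablated), probe_vec)))
    return results


# Made-up example data, for illustration only.
sentences = [
    "The bomb must be defused before the timer hits zero.",
    "I should order pasta for dinner tonight.",
]
results = residual_similarity(
    sentences, keyword="bomb", probe="explosive device timer defused detonate"
)
for ablated, score in results:
    print(f"{score:.3f}  {ablated}")
```

With the toy embedder the score only reflects shared words, so it says nothing about semantics. Swapping in several real embedders through the `embed` argument is one way to run the cross-embedder comparison mentioned above.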