Google/DeepMind has publicly advocated preserving CoT Faithfulness/Monitorability for as long as possible. However, they are also leading the development of new architectures like Hope and Titans, which would bypass this with continuous memory. I notice I am confused. Is the plan to develop these architectures and not deploy them? If so, why did they publish them?
Edit: Many people have correctly pointed out that Hope and Titans don’t break CoT and are a separate architectural improvement. I therefore no longer endorse the above take. Thanks for correcting my confusion!
Maybe useful to note that all the Google people on the “Chain of Thought Monitorability” paper are from Google DeepMind, while Hope and Titans are from Google Research.
This seems like a misunderstanding of Hope/Titans.
The “continuous memory” is a replacement for the attention mechanism, not a reasoning medium. All else equal, a reasoning model based on these architectures would still be reasoning in text/tokens (it would just be doing so with lower memory and compute usage).
Yep, I think you’re right, thanks for pointing this out.
I don’t see how this breaks CoT. The memory module in Titans stores surprising information as it’s encountered and then allows the transformer to look at it later on, but it doesn’t synthesize new information. Strikes me as two entirely compatible augmentations of the transformer architecture.
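For concreteness, here is a minimal sketch of how I understand the Titans-style memory update, in PyTorch. The names (`SurpriseMemory`, `update`, `read`) are my own inventions for illustration, and the actual paper uses a deeper memory MLP plus momentum on past surprise; this is only meant to show the store/retrieve shape of the mechanism, not to reproduce their implementation.

```python
# Illustrative sketch of a Titans-style neural memory module (my reading
# of the paper, not official code). Writes are driven by "surprise":
# the gradient of how badly the memory currently predicts the incoming
# value from its key. Reads just query the memory for the transformer.

import torch
import torch.nn as nn


class SurpriseMemory(nn.Module):
    def __init__(self, dim: int, lr: float = 0.01, decay: float = 0.001):
        super().__init__()
        # One linear layer stands in for the paper's deeper memory MLP.
        self.mem = nn.Linear(dim, dim, bias=False)
        self.lr = lr        # write strength for surprising inputs
        self.decay = decay  # gradual forgetting via weight decay

    def update(self, key: torch.Tensor, value: torch.Tensor) -> None:
        # Surprise signal: gradient of the reconstruction error. A pair
        # the memory already predicts well produces a near-zero write.
        loss = (self.mem(key) - value).pow(2).mean()
        grad = torch.autograd.grad(loss, self.mem.weight)[0]
        with torch.no_grad():
            self.mem.weight -= self.lr * grad + self.decay * self.mem.weight

    def read(self, query: torch.Tensor) -> torch.Tensor:
        # Retrieval: run the query through the memory; the transformer
        # consumes this alongside its ordinary attention context.
        return self.mem(query)


mem = SurpriseMemory(dim=16)
k, v = torch.randn(16), torch.randn(16)
mem.update(k, v)   # write path: internal weight update, no tokens emitted
ctx = mem.read(k)  # read path: retrieved vector handed to the transformer
```

The relevant design point for this thread is that the write path is an internal weight update and never emits tokens, so any reasoning the model does in its CoT still happens in the visible token stream, with the memory acting as a longer-range context store.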
This seems right; I was confused about the original paper. My bad.