This seems especially easy to do with RWKV. I might try it out sometime this week, but probably won’t. I did something similar (though not specifically for reasoning) two years ago for MIT’s Splash.
My code from Splash is on GitHub if anyone else wants to give it a try before I (maybe) get to it. The code is very bad, though. I wrote it mostly for myself, so it has no documentation and lots of dead code lying around. It might be easier to just write your own.