I briefly tried doing mech interp research to figure out what the algorithm distillation model was doing internally, and whether different setups could learn in-context RL, but I kind of gave up and moved on to other projects. This makes me want to go back into it.
My own view on that, and on whether models can learn an imitation of long-term learning, is that maybe it's possible. I think the actual algorithm distillation setup doesn't do that on their toy tasks, but that setup is extremely simple, and I'd expect that if something like this works, it works on more complicated things, with bigger models and multiple tasks, where it's easier to learn in-context RL than separate heuristics for every task.
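For readers who haven't seen the setup: as I remember it, algorithm distillation trains a causal sequence model to predict the source RL algorithm's actions given the whole cross-episode learning history. Here's a minimal sketch of that objective, with placeholder sizes and an actions-only toy history (the real setup also feeds in observations and rewards):

```python
import torch
import torch.nn as nn

# Sketch of the algorithm-distillation objective (my reconstruction, not the
# paper's code): a causal model predicts the source RL algorithm's next
# action from the cross-episode history so far. All sizes are placeholders.
n_actions, d_model, seq_len = 4, 64, 128

embed = nn.Embedding(n_actions, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, n_actions)

# Stand-in batch of action histories from a source learner (random here;
# a real run would use recorded Q-learning histories instead).
actions = torch.randint(n_actions, (8, seq_len))
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)

h = encoder(embed(actions), mask=causal_mask)   # causal self-attention
logits = head(h)
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, n_actions),      # prediction at step t
    actions[:, 1:].reshape(-1),                 # source learner's action at t+1
)
loss.backward()
```

The point is just that nothing in the objective names the task: if the histories span many tasks, imitating the source learner's improvement over episodes is one way to fit all of them at once.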
And I don’t really understand why you are so sure the answer is no.
It wouldn't even have to be the exact same Q-learning algorithm, just some approximation that does learn over longer timesteps.
You talk about the impossible task of the model learning to do in its activations what the Q-learning algorithm does on the task, but that doesn't seem obviously impossible to me, especially for a much bigger net trying to replicate a smaller algorithm.
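For concreteness, the per-step computation the forward pass would have to approximate is tiny; here's the standard tabular Q-learning update (hyperparameters and table sizes here are illustrative, not from the paper):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q[s, a] toward the TD(0) target."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy usage on a 5-state, 2-action table:
Q = np.zeros((5, 2))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

The open question is whether attention over a long context can maintain and update something playing the role of the Q table, not whether the arithmetic itself is hard.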
And even if I agreed with you that it seemed unlikely, I wouldn't be very sure, because that seems like a vibes-based guess, and it's easy to be wrong about vibes-based guesses of what can be done in a transformer forward pass. I'd want actual details and thought put into exactly how hard it is to represent an RL algorithm in a transformer, how hard it is for one to learn it, and why, before I was pretty sure it was not possible.
There are some papers on doing gradient descent in activation space, and on how this might happen in ICL, that seem relevant, though I haven't read them in a long time; I'll have to look back into them.
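The cleanest result I remember from that line of work is that a softmax-free attention layer over in-context (x, y) pairs can exactly implement one gradient-descent step on least-squares regression. A tiny self-contained check of that identity (reconstructed from memory, not from any paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16
X = rng.normal(size=(n, d))   # in-context inputs x_i
y = X @ rng.normal(size=d)    # in-context targets y_i
x_q = rng.normal(size=d)      # query point
lr = 0.1

# One gradient step on 0.5 * sum_i (w . x_i - y_i)^2, starting from w = 0:
grad = (X @ np.zeros(d) - y) @ X     # = -sum_i y_i x_i
w1 = np.zeros(d) - lr * grad
pred_gd = w1 @ x_q

# Softmax-free (linear) attention: query x_q, keys x_i, values y_i.
pred_attn = lr * ((X @ x_q) @ y)     # sum_i <x_i, x_q> * y_i

assert np.isclose(pred_gd, pred_attn)
```

If attention can do a regression step in-context, it doesn't seem crazy to me that some stack of layers approximates a TD-style update.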
Also, glazgogabgolab has other examples of more recent work in another comment that look interesting. I haven't looked into those yet, but it seems possible to me that there's already some paper somewhere showing in-context RL.
Regardless, this seems testable, which is interesting; it's just a lot of work.
The main problem is that this is hard to do well and expensive in compute, because you need lots of examples of full RL training trajectories.
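To gesture at where the cost comes from: each training example is an entire learning history, so data generation scales as tasks × steps-per-history. A minimal sketch with a toy bandit source learner (environment, counts, and hyperparameters all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def collect_history(n_arms=5, n_steps=500, eps=0.1):
    """Run one epsilon-greedy learner on a freshly sampled bandit task and
    record its whole learning history; this is ONE training example."""
    means = rng.normal(size=n_arms)
    Q, counts = np.zeros(n_arms), np.zeros(n_arms)
    history = []
    for _ in range(n_steps):
        a = int(rng.integers(n_arms)) if rng.random() < eps else int(Q.argmax())
        r = rng.normal(means[a])
        counts[a] += 1
        Q[a] += (r - Q[a]) / counts[a]   # incremental mean estimate
        history.append((a, r))
    return history

# 1000 tasks x 500 steps = 500k environment steps, and bandits are the cheap
# case; anything needing deep RL as the source learner multiplies this by the
# cost of actually training that learner on every task.
histories = [collect_history() for _ in range(1000)]
```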
Paragraph 1:
Α γράμμα ἐστίν. Α καὶ Β γράμματα εἰσιν. Α, Β, καὶ Γ τρία Ἑλληνικὰ γράμματά εἰσιν. Καὶ Π Ἑλληνικόν γράμμα ἐστίν, οὐ Λατινικόν. C Λατινικόν γράμμα ἐστίν, οὐχ Ἑλληνικόν.
Paragraph 2:
Β οὐ φωνῆεν, ἀλλὰ σύμφωνον ἐστιν. Β καὶ Γ οὐ φωνήεντα, ἀλλὰ σύμφωνα εἰσιν. Β οὐ μικρὸν γράμμα ἐστίν, ἀλλὰ κεφαλαῖον. β οὐ κεφαλαῖον, ἀλλὰ μικρὸν γράμμα ἐστίν. Ω = ὦ μέγα, Ο = ὂ μικρόν.
Paragraph 3:
ΑΙ Ἑλληνικὴ δίφθογγος ἐστιν. ΑΙ καὶ ΕΙ Ἑλληνικαὶ δίφθογγοι εἰσιν. Α′ δίφθογγος οὐκ ἔστιν, ἀλλ′ ἀριθμός. Α′ καὶ Β′ ἀριθμοί εἰσιν.
Paragraph 4:
«Ἀπολλώνιος» κύριον ὄνομα ἐστιν. «Ἀπολλώνιος» καὶ «Ἑλένη» κύρια ὀνόματα εἰσιν. «Ἀπολλώνιος» ἀρσενικόν ὄνομά ἐστιν (♂). «Ἑλένη» θηλυκόν ὄνομά ἐστιν (♀).
Paragraph 5:
«Salve» Λατινικὴ λέξις ἐστίν, οὐχ Ἑλληνική. «Salve» καὶ «lingua» δύο Λατινικαὶ λέξεις εἰσίν. «Χαῖρε», «γλῶσσα», καὶ «ἀριθμός» τρεῖς Ἑλληνικαὶ λέξεις εἰσίν.
I copy-pasted your post up to “first try”, added “can you do it?”, and the above is what I got.
Other Claude instances tell me it’s correct when I ask in different ways, so it should be right, but seeing other people fail is worrying.