This is cool! I was surprised by this conclusion:

Overall, these results demonstrate a case where LLMs can do (very basic) meta-cognition without CoT.
I figured the biggest factors would be (i) that models have trouble fully representing multiple mathematical constructs on a single token and (ii) that models can only attend backwards. To elaborate on the first point, I think models have trouble representing both “a=0.5” and “b=0.5” on one token because it is hard to hold multiple bindings without mixing them up. However, a model may sometimes be able to chunk multiple numbers, e.g., represent “a/b has symmetry”.
I figure that mathematical operations involving three entities can be very difficult to solve without CoT unless chunks naturally form on the way to the immediate answer. For example, if a model reads “If there is one tall person, one short person, and one suave person in a room, be scared. Alice is tall. Larry is short. Mia is suave.”, it can probably immediately answer “Therefore, I should be scared” because it progressively computed the three-part AND as the characters were being described. Yet progressive computation is sometimes insufficient for correct answers. I figure one could probably craft a problem involving ANDs and XORs where adding filler tokens has a bigger effect than on the math problems tested.
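For concreteness, here is a sketch of the kind of construction I have in mind (the names, traits, and rule are all invented for illustration; this is not a problem from the post). The facts come first and the rule comes last, so the model cannot fold the facts into a running answer as it reads them:

```python
import random

# Hypothetical AND/XOR problem generator (illustrative sketch only).
NAMES = ["Alice", "Larry", "Mia", "Omar"]
TRAITS = ["tall", "short", "suave", "grumpy"]

def make_problem(rng: random.Random) -> tuple[str, bool]:
    # Assign each person a random trait and state the facts before the rule.
    assigned = {name: rng.choice(TRAITS) for name in NAMES}
    facts = " ".join(f"{name} is {trait}." for name, trait in assigned.items())
    rule = ("If (there is a tall person and a short person) or (there is a suave person), "
            "but not both, be scared.")
    present = set(assigned.values())
    left = "tall" in present and "short" in present   # AND branch
    right = "suave" in present                        # XOR'd against the AND branch
    return f"{facts} {rule} Should I be scared?", left != right

prompt, expected = make_problem(random.Random(0))
print(prompt)
print("Expected:", "yes" if expected else "no")
```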
With these things in mind, I’m pondering this question:
A bag contains red marbles and blue marbles. If two marbles are chosen at random without replacement, the probability that both marbles are red is 1/5, and the probability that both marbles are blue is also 1/5. How many marbles are in the bag?
Once the model reads the second “1/5”, it can infer that the number of red marbles equals the number of blue marbles. Maybe that second “1/5” needs to always hold “therefore there is symmetry” for that information to be used, or maybe “therefore there is symmetry” has some effect that is expressed when that token is attended to. If the problem is repeated, then this information will already be there when the model encounters “1/5” again, and that might trigger some new computation that leads to the final answer. Repeating the question perhaps allows the model to effectively attend forward.
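(As a sanity check on that inference, not something from the post: a quick brute-force search confirms that the red and blue counts must be equal, giving 3 red, 3 blue, and 6 marbles total.)

```python
from fractions import Fraction

# Brute-force check of the marble problem.
def p_both(k: int, n: int) -> Fraction:
    """Probability that two draws without replacement both come from a group of size k out of n."""
    return Fraction(k * (k - 1), n * (n - 1))

solutions = [
    (red, total - red, total)
    for total in range(2, 50)
    for red in range(1, total)
    if p_both(red, total) == Fraction(1, 5)
    and p_both(total - red, total) == Fraction(1, 5)
]
print(solutions)  # [(3, 3, 6)]: 3 red, 3 blue, 6 marbles, and the counts are indeed equal
```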
With random filler tokens added, maybe the model will randomly copy information onto them, and sometimes that information is useful; e.g., maybe “red = 1/5” gets copied onto a filler token, which then works similarly to the model reading the problem again.
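To be concrete about the setup I’m imagining (this is an assumed sketch; the actual experiments may insert filler differently): pad the gap between the question and the answer cue with tokens drawn from an unrelated vocabulary.

```python
import random

# Assumed setup for "random filler tokens" (illustrative only): sample unrelated
# tokens and place them between the question and the answer cue.
FILLER_VOCAB = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]

def add_filler(question: str, n_filler: int, seed: int = 0) -> str:
    rng = random.Random(seed)
    filler = " ".join(rng.choice(FILLER_VOCAB) for _ in range(n_filler))
    return f"{question}\n{filler}\nAnswer:"

print(add_filler("How many marbles are in the bag?", n_filler=12))
```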