The pondering happens in earlier layers of the network, not in the output
Each layer of a transformer produces one tensor for each token seen so far, and each of those tensors can be a superposition of multiple concepts. All of this gets processed in parallel, and anything that turns out to be useless for the final result gets zeroed out at some point, until only the final answer remains and is turned into a token. The pondering can happen in early layers, overlapping with the part of the reasoning process that actually leads to results.
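As a rough illustration of the "one tensor per token, per layer" point, here is a minimal sketch (assuming the HuggingFace transformers library, with GPT-2 as an arbitrary stand-in model) that asks the model for its per-layer hidden states and prints their shapes:

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

inputs = tokenizer("The pondering happens in earlier layers", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the embedding output plus one tensor per layer,
# each of shape (batch, num_tokens, hidden_dim) -- one vector per token seen so far.
for i, h in enumerate(outputs.hidden_states):
    print(f"layer {i}: {tuple(h.shape)}")
```

Each of those per-token vectors is what can carry a superposition of concepts, only some of which survive into the final layer and the output token.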
That sounds like the pondering’s conclusions are then related to the task.
Yes, but only very vaguely. For example, doing alignment training on questions related to bodily autonomy could end up making the model ponder gun control, since both are political topics. You end up with a model that has increased capabilities in gun manufacturing, even though on the surface that has nothing to do with what you trained on.