I don’t really know how GPTs work, but I read §“Only modifying certain residual stream dimensions” and had a thought. I imagined a “system 2” AGI that is separate from GPT but interwoven with it, so that all thoughts from the AGI are associated with vectors in GPT’s vector space.
When the AGI wants to communicate, it inserts a “thought vector” into GPT to begin producing output. It then uses GPT to read its own output, obtains a new vector, and subtracts it from the original vector. The difference represents (1) the parts of the thought the output failed to capture and (2) ambiguity. Could it then produce further output conditioned somehow on the difference vector, clarifying the original thought, until the output converges to a complete description of that thought? It might help if it learned to say things like “or rather”, “I mean”, and “that came out wrong. I meant to say” (which are rare outputs from typical GPTs). An idea like this might also enhance summarization: generate one sentence at a time, and for each sentence, generate 10 candidates and keep only the one that best minimizes the difference vector.
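To make the loop concrete, here is a minimal toy sketch of the refinement idea. Everything here is hypothetical: `encode` and `generate` are placeholder stand-ins (a hash-based embedding and random candidate strings), not real GPT components, and the vector space is an arbitrary 16-dimensional space rather than an actual residual stream. The sketch only illustrates the control flow: emit the candidate sentence whose encoding best cancels the remaining difference vector, then repeat on the residual.

```python
import zlib
import numpy as np

DIM = 16
rng = np.random.default_rng(0)

def encode(text: str) -> np.ndarray:
    """Hypothetical stand-in: map text to a unit vector in the model's space.
    Here it is just a deterministic hash-seeded embedding, for illustration."""
    rs = np.random.default_rng(zlib.crc32(text.encode()))
    v = rs.standard_normal(DIM)
    return v / np.linalg.norm(v)

def generate(thought: np.ndarray, n: int = 10) -> list[str]:
    """Hypothetical stand-in: n candidate sentences conditioned on a
    thought vector. Here they are placeholder strings."""
    return [f"candidate-{rng.integers(1_000_000)}" for _ in range(n)]

def refine(thought: np.ndarray, max_steps: int = 5, tol: float = 0.1) -> list[str]:
    """Iteratively emit the candidate whose encoding best cancels the
    remaining difference vector, stopping when the residual is small."""
    output = []
    residual = thought.copy()
    for _ in range(max_steps):
        if np.linalg.norm(residual) < tol:
            break  # output now describes the thought closely enough
        candidates = generate(residual)
        # Best-of-n selection: keep the sentence that minimizes the
        # leftover difference vector after subtracting its encoding.
        best = min(candidates, key=lambda s: np.linalg.norm(residual - encode(s)))
        output.append(best)
        residual = residual - encode(best)
    return output

sentences = refine(encode("original thought"))
print(f"emitted {len(sentences)} clarifying sentences")
```

With random placeholder vectors the residual rarely shrinks much, so the loop usually runs to `max_steps`; the point is only the structure: generate, select by residual, subtract, repeat.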