I think you’re essentially correct—but if I understand you, what you’re suggesting is similar to Chris Olah et al’s Circuits work (mentioned above in the paragraph starting “This sort of interpretability is distinct...”). If you have a viable approach aiming at that kind of transparency, many people will be eager to provide whatever resources are necessary. This is being proposed as something different, and almost certainly easier.
One specific thought:
but my intuition suggests this would limit the complexity of the prompt by shackling its creation to an unnecessary component, the thought
To the extent that this is correct, it’s more of a feature than a bug. You’d want the thoughts to narrow the probability distribution over outputs. However, I don’t think it’s quite right: the output can still have just as much complexity; the thoughts only serve to focus that complexity.
E.g. consider [This will be a realist novel about 15th century France] vs [This will be a surrealist space opera]. An output corresponding to either can be similarly complex.