This is really interesting. I have a bunch of nitpicks, but I’ll shelve them for now because I want to focus on the valuable idea.
> From the perspective of the MLP and the attention block at a given position, doing prefill is indistinguishable from doing decode. Since no component in the transformer can tell which mode it’s in, both should produce the same experience.
Very nice argument! As I read it, the general form is: “If X cannot tell whether it’s in mode A or B, then A and B cannot be producing different conscious experiences in X.”
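To make the indistinguishability premise concrete, here is a minimal sketch of a toy single-head causal attention step run in both modes. Everything in it is an assumption for illustration: the function names (`attend`, `prefill`, `decode`), the random inputs, and the omission of learned projections, MLPs, and multiple layers. It only shows that the per-position computation is the same in both modes; it says nothing about experience.

```python
import math
import random

def attend(q, keys, values):
    """Softmax attention of one query vector over the given keys and values."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

def prefill(qs, ks, vs):
    """All positions in one call; position t sees only positions 0..t (causal mask)."""
    return [attend(qs[t], ks[:t + 1], vs[:t + 1]) for t in range(len(qs))]

def decode(qs, ks, vs):
    """One position per step, appending each key and value to a growing KV cache."""
    k_cache, v_cache, outputs = [], [], []
    for q, k, v in zip(qs, ks, vs):
        k_cache.append(k)
        v_cache.append(v)
        outputs.append(attend(q, k_cache, v_cache))
    return outputs

# Hypothetical toy inputs: 5 positions, 4-dimensional vectors.
random.seed(0)
T, D = 5, 4
qs = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
ks = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
vs = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]

prefill_out = prefill(qs, ks, vs)
decode_out = decode(qs, ks, vs)

# Largest elementwise difference between the two modes.
max_diff = max(abs(a - b)
               for p, d in zip(prefill_out, decode_out)
               for a, b in zip(p, d))
print("max difference between prefill and decode:", max_diff)
```

In both functions, position t ends up calling `attend` with the same query and the same keys and values for positions 0..t, so nothing inside `attend` depends on which mode invoked it. One caveat: real implementations batch prefill as large matrix multiplications, so floating-point reduction order can introduce tiny numerical differences that this toy does not model.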
I’m going to think some more about all of this. I hope to return with comments in a few days.