I really think you need a proof of concept with text, rather than images. I’d suggest targeting one of the smaller TinyStories models (perhaps a 1-bit or 1-trit quantized version of one). Then I’d look for some sort of parallel to an alignment property: e.g. without just hard-coding it, can you modify the code to guarantee (at the “convincing argument” level, not formal proof) some property of the interactions between child characters and parent characters in the stories?
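To make "1-trit quantized" concrete, here is a minimal sketch of one way to map a layer's weights to {-1, 0, +1} with a single shared scale. The absmean scaling rule and the names `ternarize` and `dequantize` are assumptions chosen for illustration; nothing above commits to a particular quantization scheme, and a real experiment would apply this per weight matrix of a TinyStories checkpoint rather than to a toy list.

```python
def ternarize(weights):
    """Map a list of floats to trits in {-1, 0, +1} plus one float scale.

    Assumed scheme: divide by the mean absolute weight, round, clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        # All-zero input: every trit is 0 and the scale carries no information.
        return [0] * len(weights), 0.0
    trits = [max(-1, min(1, round(w / scale))) for w in weights]
    return trits, scale


def dequantize(trits, scale):
    """Approximate reconstruction of the original weights."""
    return [t * scale for t in trits]


# Toy weights standing in for one row of a weight matrix (hypothetical values).
weights = [0.9, -0.05, -1.2, 0.3, 0.0, -0.6]
trits, scale = ternarize(weights)
approx = dequantize(trits, scale)
print(trits)
print(scale)
```

The appeal for this kind of proof of concept is that each weight becomes one of three symbols, so an argument about what the model can or cannot do has a small discrete object to reason over instead of arbitrary floats.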