I mean, literally Stable Diffusion. If you mean in the brain I would have to refresh my memory but I vaguely remember looking at the audio encoding path and realizing it’s something like “These monomodal audio encoders are spatially close to the ear and then feed into multimodal encoders which take the encoded audio as an input and then further feed into e.g. Wernicke’s Area”, but that’s a very lossy memory and I probably have the details wrong.
I am in awe— great rant. Do you have examples for models being trained in the latent space of another model?
I mean, literally Stable Diffusion. If you mean in the brain I would have to refresh my memory but I vaguely remember looking at the audio encoding path and realizing it’s something like “These monomodal audio encoders are spatially close to the ear and then feed into multimodal encoders which take the encoded audio as an input and then further feed into e.g. Wernicke’s Area”, but that’s a very lossy memory and I probably have the details wrong.