Not for my purposes. For starters, I do a lot of image and video generation, and even there you have both U-Nets and DiTs, so I need something more general. Also, if I'm not mistaken, what you've described only applies to decoder-only autoregressive transformers like ChatGPT. Compare that with, say, T5, which is an encoder-decoder model rather than a decoder-only one.