Are there any papers on current efforts to tokenize video and to estimate the amount of data available for that?