Having no captions feels very unnatural, because both LLMs and humans could first run the audio through relatively dumb speech-to-text tools.