one useful heuristic is datasets are always garbage. it’s always been true in ML that most datasets have huge glaring flaws. like language datasets with documents consisting entirely of backslashes, or impossible multiple choice questions with nonsensical answers or whatever
one useful heuristic is datasets are always garbage. it’s always been true in ML that most datasets have huge glaring flaws. like language datasets with documents consisting entirely of backslashes, or impossible multiple choice questions with nonsensical answers or whatever