Agreed that humans would likely outperform more! We don’t have a human baseline for AmongUs vs. language models yet, so we wouldn’t be able to tell if it improved, but it’s a good follow-up.