Can watching how human data annotation platforms grow, shrink, or evolve be a helpful signal for internal AI lab happenings? I follow the mercor_ai subreddit (Mercor being a company that hires people, both experts and non-experts, to do data annotation work), and they appear to have had a massive restructuring yesterday resulting in a ~35% pay cut for “generalist”-type workers (it is hard to get very specific information because of NDA rules, but that much is clear from the chatter). We saw a similar phenomenon with xAI last month, where they fired large numbers of “generalists” and replaced them with “specialists”. I see this as a very useful data point if your goal is to determine whether labs are using synthetic/AI-labeled data in the training process, as that is the simplest explanation for data annotation needs decreasing as model size increases.
I think it is generally agreed that labs are using synthetic data, so some may see the value of this exercise as limited, but the more interesting thing is to look for the firing and/or scaling back of the “expert” data annotators. My main model for short-term, highly capable AI is one where models become capable of complete self-play-style RL training, i.e. the entire flywheel of “train on data → assign grade → improve on that task → generate better data” can be completed wholly by the model itself. If this capability were achieved internally, I think the scaling back of “expert” annotation teams across multiple unaffiliated data annotation platforms would be a fairly strong externally available signal.

I acknowledge this is imperfect, as there are certainly other reasons an AI company might scale back expert data labeling (some obvious ones: “they are running out of money because the investor well has run dry and can no longer afford them”, “synthetic data is now ‘good enough’ even for expert-level content”, or “it turns out expert-level data annotation just doesn’t help that much”), but it feels like a useful heuristic to keep an eye on alongside other external signals. I would recommend collecting data on as many data annotation platforms as possible to reduce noise; a rough sketch of what that could look like is below.
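To make this concrete, here is a minimal sketch of the kind of tracking I have in mind, using subreddit subscriber counts as one crude, publicly available proxy for platform activity. The subreddit names beyond mercor_ai and the output file name are illustrative assumptions; Reddit’s public about.json endpoint is real, but it is rate-limited and expects a custom User-Agent, so treat this as a starting point rather than the measurement.

```python
# Sketch: log subscriber counts for annotation-platform subreddits over time.
# Subreddit names (other than mercor_ai) and the output path are assumptions;
# swap in whichever platforms you want to watch.
import csv
import json
import urllib.request
from datetime import datetime, timezone

SUBREDDITS = ["mercor_ai", "outlier_ai", "DataAnnotationTech"]  # illustrative
OUT_PATH = "annotation_platform_signal.csv"


def fetch_subscribers(subreddit: str) -> int | None:
    """Return the current subscriber count, or None if the fetch fails."""
    url = f"https://www.reddit.com/r/{subreddit}/about.json"
    req = urllib.request.Request(
        url, headers={"User-Agent": "annotation-signal-tracker/0.1"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)["data"]["subscribers"]
    except Exception:
        return None  # private/banned sub, rate limit, network error, etc.


def snapshot() -> None:
    """Append one timestamped row per subreddit; run daily, e.g. from cron."""
    now = datetime.now(timezone.utc).isoformat()
    with open(OUT_PATH, "a", newline="") as f:
        writer = csv.writer(f)
        for sub in SUBREDDITS:
            writer.writerow([now, sub, fetch_subscribers(sub)])


if __name__ == "__main__":
    snapshot()
```

Subscriber counts will mostly capture worker interest rather than lab demand directly; posting volume or keyword frequency (“pay cut”, “project paused”) would probably be a sharper signal, but it is harder to collect without authenticated API access.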