Evaluating the Risk of Job Displacement by Transformative AI Automation in Developing Countries: A Case Study on Brazil

This blog was written by Blessing Ajimoti, Vitor Tomaz, Hoda Maged, Mahi Shah, and Abubakar Abdulfatah as part of the Economics of Transformative AI research by BlueDot Impact and Apart Research.

Introduction

Transformative Artificial Intelligence (TAI) promises major productivity gains by automating routine and complex tasks. But it also raises serious concerns about job displacement, especially in developing countries. According to the World Economic Forum’s Future of Jobs 2025 survey, 86% of employers say AI and digital access are the biggest forces reshaping business by 2030.

While discussions about which occupations are most at risk are growing, hard evidence on TAI’s real-time impact on employment is limited. This gap is especially important for developing countries, where limited digital infrastructure, significant market informality, and lower AI adoption present both challenges and opportunities. Policymakers, employers, and researchers need better tools to track how AI is affecting jobs and when to act to prevent risks.

To help close this gap, our research team, supported by BlueDot Impact and Apart Research, developed a new empirical approach. Using data from Anthropic’s Economic Index: Insights from Claude 3.7 Sonnet, released in March 2025, we built on recent empirical work by Handa and colleagues in their paper Which Economic Tasks Are Performed with AI? Evidence from Millions of Claude Conversations, which analyzed millions of Claude 3.7 Sonnet conversations to study AI’s role across economic tasks. We combined data from millions of AI prompts submitted to Anthropic’s Claude 3.7 Sonnet model with official employment records from Brazil (2021–2024). Then, building on the task-based framework by Acemoglu and Restrepo, we studied whether AI usage for automating or augmenting tasks was associated with changes in net job creation across occupations.

We analyzed four occupation groups according to the share of Claude conversations they had, developing an econometric model that fits the observed trends for two of the groups—the 10 occupations that use Claude the least, and the 10 occupations that use Claude the most for augmentation. The model did not fit the other two groups well, indicating other unidentified dynamics at play. Comparing the trends for the first two groups, we found no statistical evidence that they differ, suggesting that job displacement by automation is not yet happening.

The results we obtained raise important questions and prompt us to continue this investigation. The data we used covers only one week of prompts from one AI system and focuses on a single country. As AI adoption grows in the workplace, the risks of job loss could increase. Still, our findings align with other recent studies, such as the Large Language Models, Small Labor Market Effects paper by Humlum and Vestergaard, which found no measurable effects of generative AI on wages or working hours in the short term in Denmark. Expanding our data coverage to other providers of AI services, geographies, and time frames could substantially improve our understanding of the underlying interaction between AI adoption and job flows.

At the heart of our research lies a deceptively simple question: Can employment data serve as an early warning system for AI safety risks? Our findings raise important questions for AI safety, offering a window of opportunity for stakeholders to plan ahead of potentially irreversible job elimination, especially at a time when reactive policymaking often trails technological diffusion. The approach gives researchers and policymakers empirical, sector- and geography-specific data to assist planning. By tracking these metrics, governments can intervene before job displacements spiral into political instability or regressive regulation. In sum, this research can serve as scaffolding for a theory of change that links prompt behavior to occupation-level AI use, to employment patterns, to early warning signals, to policy response.

What’s promising is that the method we developed is reproducible. It can be applied in other countries and expanded with more diverse data to help governments and employers monitor how TAI affects work in real time. Rather than wait for disruption, this approach offers a way to stay ahead of it.

Approach

The question

Our central research question is: Do occupations whose tasks are frequently associated with ‘automating’ AI interactions (based on Claude data) exhibit different employment trends (e.g., faster decline or slower growth) compared to other occupations, particularly those associated with ‘augmenting’ AI interactions? To address this, we’ll break down our methodology.

First, let’s clarify a few concepts:

By “occupations” we mean the occupations listed in the official US occupational database (O*NET), and the tasks associated with them. It is how the US classifies everyone’s occupations, from firefighters to engineers and entertainers. For analytical simplicity, we focused on O*NET’s 98 Minor Groups rather than the 1,596 detailed occupations.

The statement “occupations that correlate more with Claude conversations flagged as automating” packs a lot. First, remember that the Claude dataset links conversations to tasks. Second, O*NET links the 1,596 occupations to 19,530 tasks. The challenge is deciding how much each task contributes to an occupation. For simplicity, we did not differentiate tasks by contribution and used the percentage of total conversations related to a given occupation (through its tasks) as a proxy.

So, by the statement above we mean “O*NET Minor Groups whose constituent tasks collectively account for the highest volume of Claude conversations flagged as automating.” We identified the 10 Minor Groups with the highest percentages of such conversations, labeling this group ‘Top 10 Automation.’ Similar logic was applied to form ‘Top 10 Augmentation’ (based on ‘augmenting’ conversations), ‘Top 10 Overall’ (based on all conversations), and ‘Bottom 10 Overall’ (lowest share of all conversations). The table and diagram below illustrate this classification.

Building the dataset

The first building block in our analysis is classifying occupations into 4 categories:

| Alias | Concept | Share of |
| --- | --- | --- |
| top_10 | Occupations that are most represented across all Claude conversations | All conversations |
| top_10_aug | Occupations where Claude was mostly helping people, i.e., augmenting their capabilities | Conversations tagged as Augmentation |
| top_10_aut | Occupations where Claude was mostly doing the task itself | Conversations tagged as Automation |
| bottom_10 | Occupations that barely showed up in Claude usage | All conversations |

Table 1: Occupation Group Definitions and Conversation Share Basis.

The foundational data is sourced from Anthropic, derived from interactions with their Claude 3.7 Sonnet model. Anthropic employs a system referred to as ‘Clio’ to analyse user prompts while preserving privacy. This involves two key classification tasks:

  1. determining whether the user is augmenting their capabilities or automating work, and

  2. identifying the most relevant O*NET task the user appears to be performing.

It is crucial to acknowledge that the limited conversational context available to Clio means this classification has inherent accuracy limitations. Anthropic provides a summarised dataset linking O*NET tasks to automation/​augmentation tags and their respective share of total conversations.

We then link this data to O*NET, summarise it at the occupation level, and tag the occupations according to the classifications discussed above (top_10, top_10_aug, top_10_aut, and bottom_10).
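To make this tagging concrete, here is a minimal pandas sketch of how the four groups could be derived. The file names and column names (task_id, share_all, share_aug, share_aut, minor_group) are placeholders, not the actual dataset schema.

```python
import pandas as pd

# Hypothetical inputs: per-task conversation shares from the Anthropic dataset,
# and an O*NET lookup from task to Minor Group.
task_shares = pd.read_csv("anthropic_task_shares.csv")       # task_id, share_all, share_aug, share_aut
task_to_group = pd.read_csv("onet_task_to_minor_group.csv")  # task_id, minor_group

# Sum conversation shares up to the Minor Group level.
by_group = (task_shares.merge(task_to_group, on="task_id")
                       .groupby("minor_group")[["share_all", "share_aug", "share_aut"]]
                       .sum())

# The four occupation groups used throughout the analysis.
top_10 = by_group["share_all"].nlargest(10).index
top_10_aug = by_group["share_aug"].nlargest(10).index
top_10_aut = by_group["share_aut"].nlargest(10).index
bottom_10 = by_group["share_all"].nsmallest(10).index
```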

Figure 1: Anthropic’s classification of tasks

Once these occupation groups were defined, we analysed their employment trends over time using real-world labour market data. For this study, we utilised Brazil’s CAGED (Cadastro Geral de Empregados e Desempregados) database, which records monthly job creation and elimination figures based on the Brazilian Classification of Occupations (CBO). We mapped these CBO codes to the O*NET Minor Groups using our established crosswalk.
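Continuing from the sketch above, the crosswalk and aggregation step might look roughly like this; again, the file layouts and column names are assumptions, and overlaps between the four categories are ignored for brevity.

```python
import pandas as pd

caged = pd.read_csv("caged_monthly.csv")    # hypothetical columns: date, cbo_code, net_jobs
crosswalk = pd.read_csv("cbo_to_onet.csv")  # hypothetical columns: cbo_code, minor_group

# Label each Minor Group with its category (last assignment wins where groups overlap).
group_labels = {g: "top_10" for g in top_10}
group_labels.update({g: "top_10_aug" for g in top_10_aug})
group_labels.update({g: "top_10_aut" for g in top_10_aut})
group_labels.update({g: "bottom_10" for g in bottom_10})

merged = caged.merge(crosswalk, on="cbo_code")
merged["group"] = merged["minor_group"].map(group_labels)

# One monthly net-jobs series per category.
net_jobs = (merged.dropna(subset=["group"])
                  .groupby(["group", "date"])["net_jobs"].sum()
                  .unstack("group"))
```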

Comparing employment trends across occupation groups is not straightforward. Several factors can confound simple comparisons:

  1. Seasonality: Job creation often exhibits yearly cycles (e.g., tourism hiring peaks in summer).

  2. Macroeconomic factors: National employment figures are influenced by political stability, inflation, economic growth, and unforeseen events like pandemics.

  3. Inherent fluctuations: Even without external shocks, monthly job creation and elimination numbers naturally vary.

As a result, we analyzed the data with two methodologies that attempt to remove these confounders.

First: Time series analysis

We attempted to identify a model that could disaggregate the various effects on our variable of interest (net jobs), allowing us to predict its evolution.

We started with a visual inspection of the raw data to conjecture what kinds of treatment it requires. The first observations are that (1) the data is highly cyclical, (2) the scale of the bottom_10 group is very different from the others, both in the mean job flows and in their variation, and (3) there seems to be a slight and consistent downward trend (see Figure 2 below).

Figure 2- Step 0: Raw Data

The next very clear aspect of our data is the strong seasonality, i.e., up- and downward swings that repeat over time with a constant period. We isolate that periodic effect with Seasonal-Trend decomposition using LOESS (STL), with a period of 12 months, and subtract it from the data. Without the periodic signal, the downward trend becomes slightly more visible (see Figure 3 below).

Figure 3- Step 1: Removing seasonality
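For readers who want to reproduce this step, a minimal sketch using statsmodels’ STL, assuming the monthly net_jobs table built in the earlier sketches, is:

```python
from statsmodels.tsa.seasonal import STL

# Remove a 12-month seasonal component from each category's series.
deseasonalized = {}
for group in net_jobs.columns:
    series = net_jobs[group].astype(float)
    stl = STL(series, period=12).fit()
    deseasonalized[group] = series - stl.seasonal
```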

We then remove the linear trend by fitting a linear model to the data (via an Ordinary Least Squares estimator) and subtracting the fitted trend. The resulting chart seems relatively stationary around the normalized mean (i.e., 0), which we test next (see Figure 4 below).

Figure 4- Step 2: Removing linear trend

The previous plot showed that the swings for the category bottom_10 are much bigger than for other categories. To facilitate comparison, we “scale” the data using z-score scaling, which simply means that, for each category, we subtract its average value and divide by its standard deviation (See Figure 5 below).

Figure 5- Step 3: Scaling data
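Steps 2 and 3 can be sketched as follows, continuing from the de-seasonalized series above; this illustrates the procedure described, not our exact code.

```python
import numpy as np
import statsmodels.api as sm

detrended_scaled = {}
for group, series in deseasonalized.items():
    t = sm.add_constant(np.arange(len(series)))    # time index plus intercept
    trend_fit = sm.OLS(series.values, t).fit()     # Step 2: fit and remove the linear trend
    resid = series.values - trend_fit.predict(t)
    detrended_scaled[group] = (resid - resid.mean()) / resid.std()   # Step 3: z-score scaling
```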

If our procedure was successful in capturing the major drivers of fluctuation in the flow of jobs for each category, we are left with one major driver: the history of job flows itself. We test for this with an autoregressive model, which is jargon for “a model that predicts its next data point from its previous data points”.
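A rough sketch of this step, combining the AIC-based lag selection and the stationarity check reported in the Discussion section, might look like this with statsmodels:

```python
from statsmodels.tsa.ar_model import AutoReg, ar_select_order
from statsmodels.tsa.stattools import adfuller

for group, series in detrended_scaled.items():
    selection = ar_select_order(series, maxlag=12, ic="aic")   # choose the lag by AIC
    lags = selection.ar_lags or [1]                            # fall back to lag 1 if none selected
    ar_fit = AutoReg(series, lags=lags).fit()
    adf_pvalue = adfuller(series)[1]                           # stationarity check
    print(group, lags, round(adf_pvalue, 3), round(ar_fit.aic, 2))
```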

If the combination of category, seasonality, trend, and history explains most of the variance, we can conclude that our categories are informative of job flow trends and start drawing further conclusions, such as testing whether the linear trends of two categories are statistically different.

Second: Difference-in-Differences

The second methodology is called Difference-in-Differences. The method consists of defining a control population (in our case, occupations classified as Bottom 10) and one or more populations that receive a “treatment” and might respond to it in different ways (in our case, the Top 10, Top 10 Aut, and Top 10 Aug classifications). The “treatment” here is the launch of GPT-4 in March 2023. We chose GPT-4 because it marked the start of highly precise data-extraction capabilities and scored above 70% on several benchmarks. The key assumption of the model is that the groups (control and treatment) would follow a similar trend had they not received treatment, and diverge only because of it.

After defining the treatment, the control, and the treatment group, the analysis consists of creating a model that attempts to capture the effects of the treatment (or lack thereof) and other important variables. Our model is of the form:

net_jobs ~ treated + post + treated_post + C(month): treated + noise

This can be interpreted as: when looking at the net change in jobs from one month to the next for a given occupation, the result is a combination of whether the occupation is in the treatment or control group, whether the date is before or after the “treatment” date, and the month of the year, plus some random noise. Moreover, we account for interactions between these predictors (e.g., whether the data point refers to an occupation in the treatment group and the date is after treatment).
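In statsmodels’ formula API, a sketch of this regression would look as follows; treated:post stands in for the treated_post interaction above, and the column names are assumptions.

```python
import statsmodels.formula.api as smf

# df: one row per occupation group and month, with assumed columns
#   net_jobs (monthly net job creation), treated (1 for a Top 10 / Top 10 Aut / Top 10 Aug group),
#   post (1 for months after March 2023), and month (calendar month, 1-12).
did_model = smf.ols(
    "net_jobs ~ treated + post + treated:post + C(month):treated",
    data=df,
).fit()
print(did_model.summary())
```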

The main assumption of this test is the ‘parallel trends assumption’: we need strong reasons to believe that each group, had there been no treatment (i.e., no GPT-4 launch), would have behaved similarly.

We apply this model separately to compare each of the three “Top” groups, each one representing a treatment group, against the Bottom 10 group, which is the control group.

Discussion

Time series analysis

As can be seen in Figure 5, our methodology seems to capture a lot of what is going on! However, what meets the eye is not necessarily the whole story. To make sure our model is reliable, we run a couple of statistical checks. First, we use something called the Akaike Information Criterion — or AIC for short — to help us decide how many months we should look back to understand where the net jobs for this month are going, i.e., to help us decide the lag of the model.

Then, we run the Augmented Dickey-Fuller (ADF) test, which checks whether the data is stable enough over time for this kind of model to work well. In simple terms, we use AIC to choose the best parameter and the ADF test to see whether the model is adequate for the data. If the ADF p-value is too high (conventionally, greater than 0.05), we have strong reasons to believe our model does not fit the data well, and therefore that our beliefs about the underlying dynamics of net jobs may be wrong. The table below shows the results of this test.

| Category | Lag | ADF p-value | AIC |
| --- | --- | --- | --- |
| top_10 | 6 | 0.518 | -0.72 |
| bottom_10 | 1 | 0.013 | -22.87 |
| top_10_aug | 6 | 0.021 | -1.61 |
| top_10_aut | 6 | 0.666 | 7.25 |

Table 2: Summary statistics for the Autoregressive models

As standard practice, if the ADF p-value is below 0.05, we reject the idea that the data is unstable (non-stationary), as is the case for the categories Top 10 Aug and Bottom 10. Because the values for Top 10 and Top 10 Aut are high, we have strong reasons to believe our model does not capture the main drivers of change in job flows for these two categories.

So far, our model does a good job of capturing the main drivers of job flow divergence between the categories Bottom 10 and Top 10 Aug, as can be seen by how similar the series are. As a final check, we confirm this with a cross-correlation plot (below). The main insights of the plot are that (1) the value of Bottom 10 at a given time explains almost 80% of the variance in the value of Top 10 Aug (which is very high!), and (2) the correlation decays with lag, meaning that the time structure of the series is somewhat preserved.
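As an illustration, the cross-correlations behind such a plot could be computed with statsmodels’ ccf, using the scaled series from the sketches above:

```python
from statsmodels.tsa.stattools import ccf

# Cross-correlation between the two scaled series at lags 0, 1, 2, ...
xcorr = ccf(detrended_scaled["top_10_aug"], detrended_scaled["bottom_10"], adjusted=False)
print(xcorr[:6])   # lag 0 is the contemporaneous correlation
```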

Our final goal is to answer the question: “Are occupations that correlate more with Claude conversations flagged as automating falling faster or growing more slowly than other occupations?”, and now we have the tools to do so! Remember our detrending? We used a separate linear trend for each category, so now we can test if the difference in their rates of change is statistically significant!

Testing whether the trends for Bottom 10 and Top 10 Aug differ

With a model that fits our data for the categories Bottom 10 and Top 10 Aug reasonably well, we can use what we learned to assess whether their trends differ. We do so by fitting a linear regression model that includes a term for the category. We then plot the de-seasonalized and normalized data with their respective trends:

Figure 6- Step 4: Normalized and de-seasonalized series for the groups

Figure 6 shows the overlap in the 95% confidence bands, which suggests that the trends do not differ. Analyzing the summary statistics of our model, the p-value of the interaction term is well above the significance threshold, at p=0.53.
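A sketch of this comparison, regressing the de-seasonalized and z-scored series on time, category, and their interaction (the interaction term’s p-value is the quantity we report), could look like this:

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import zscore

frames = []
for g in ["bottom_10", "top_10_aug"]:
    y = zscore(deseasonalized[g].values)
    frames.append(pd.DataFrame({"y": y, "t": range(len(y)), "group": g}))
long_df = pd.concat(frames)

trend_model = smf.ols("y ~ t * C(group)", data=long_df).fit()
print(trend_model.pvalues)   # the t:C(group)[...] row is the trend-difference test
```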

So what?

We have covered a lot of ground, and the results are somewhat mixed, but very interesting nonetheless. The key takeaways from our analyses are:

  1. Our model (seasonal signal + linear trend + autoregression with lag) seems to describe the data for the categories Bottom 10 and Top 10 Aug very well, although a more thorough statistical analysis is needed to deal with some assumptions that might not hold (e.g., normality of residuals in the final linear regression).

  2. It describes the data for the categories Top 10 and Top 10 Aut less well, which invites us to (1) update the model to include other confounding variables and (2) test other forms of aggregation for these categories, as the ones we are using might be too heterogeneous. The good results for Top 10 Aug, for example, can be explained by how similar its occupations are, given that most of them rely heavily on computer use.

  3. When we compare the linear trends between the Bottom 10 and Top 10 Aug groups, there does not seem to be a significant difference, suggesting that occupations in these two categories still follow similar dynamics and that our categorization is not yet informative of diverging trends.

Difference-in-Differences

We ran an Ordinary Least Squares regression on the model described in the Approach section; the result is in the additional notes section.

The numbers that interest us most are the R-squared (0.838), the p-value of the F-statistic (2.37e-26), and the P>|z| column for the treatment (0.000), post (0.213), and interaction (0.603) terms. The high R-squared means that our model captures most of the variance in the data, while the low p-value of the F-statistic indicates that the model as a whole is statistically significant. What is interesting, however, is that the p-value for treatment is very close to 0, while the p-value for the interaction between post and treatment is well above the traditional p<0.05 threshold for statistical significance. One way to read these results is that the chosen treatment does not seem to have any effect on the prediction of net jobs; the category itself, however, has significant (p<0.05) predictive power. That aligns with what we would expect, as we are comparing fundamentally different occupations, such as engineering and computational occupations (Top 10 Aut) versus building cleaning, pest control, and animal care workers (Bottom 10). Future work could apply more modern difference-in-differences techniques for settings where the parallel trends assumption is broken, as appears to be the case in this analysis.

Potential reasons for our results

For the analyses where the models fit well and the assumptions held, we found no significant differences in net job flows across the four categories examined, which does not support our initial hypotheses. These results can change, however, so policymakers need to monitor the labor market and be prepared for disruptions.

Possible reasons for our conclusion, in addition to the limitations of our research, include:

  1. AI tools adopted in Brazil are still not advanced enough or well integrated to result in mass job displacement

  2. It is still too early to observe replacement dynamics

  3. Brazilians’ hesitance to adopt AI tools; a recent LinkedIn poll found that 83% of respondents fear losing their jobs to automation. Interestingly, Humlum and Vestergaard’s study finds that, for now, AI automation poses no significant risk to jobs or income in Denmark, even when employers promote its use in the workplace.

  4. Inadequate access to AI training in Brazil and, more generally, high levels of functional illiteracy (29% in 2025)

  5. AI tools may replace certain tasks, but not always the whole occupation that performs them.

These align with the barriers to AI adoption in Brazil that Rosa and Kubta highlighted based on their surveys: low awareness of AI among small firms, high implementation costs, and low labor skills.

Limitations

In this section, we discuss several of the limitations and suggest ways to develop our model in the future to have a more fine-grained “microscope” to analyze the econometrics of TAI. Some of these limitations have been discussed by Handa and colleagues.

  1. Conversations dataset timeframe: The dataset was generated from one week of prompts to Claude 3.7 Sonnet. With such a short timeframe, the data does not capture seasonality and might be anomalous in many ways. In the future, obtaining data for longer time frames could significantly improve data quality.

  2. Model and provider diversity: Similarly, the dataset is limited to Claude 3.7 Sonnet, a primarily text-based LLM with a very particular user profile. Expanding the dataset to multiple relevant models could help generate more informative categories.

  3. Clio methodology for conversation classification: Handa and colleagues offer limited insight into the classification methodology, which limits our ability to understand the output of their analysis and exercise critical judgment about classification quality. The data also excludes Team, Enterprise, and API customers; this incompleteness currently prevents us from drawing more meaningful and confident conclusions about the impact of AI on jobs.

  4. Occupations crosswalk. We use Muendler’s 2010 mapping from Brazil’s CBO to the international ISCO standard, as it remains widely used and among the most reliable. Although Brazil updated the CBO in 2020, no official ISCO mapping followed. As a result, newer roles like “data scientist” may be missing or only roughly classified, despite significant shifts in Brazil’s labor market over the past decade.

Due to time constraints, we also relied on a large language model (LLM) to assist with some mappings, but couldn’t fully review the output. Future iterations of our model should draw from more diverse data sources and prioritize transparency in how mappings are produced.

  5. Key metric used. The percentage of total Claude interactions does not tell us what share of an occupation’s total work those interactions represent, or how much time they save.

Recommendations

We propose policy recommendations that can mitigate the impact of TAI in Brazil and other developing countries as AI adoption accelerates. These strategies should be implemented by governments and frontier AI companies, individually and collaboratively. They should:

  1. Build a shared AI and jobs data system: Make it easier for frontier AI companies, employers, and governments to share data (safely and ethically) about how AI is being used for work tasks. This helps us understand where AI is helping and where it might be replacing people.

  2. Spot occupation disruption before it happens: We need tools (like dashboards) that track AI use in real time, flagging occupations or tasks where automation is growing fast. The sooner we see the trend, the sooner we can act. The empirical approach in this paper can serve as a starting point for such an early warning system.

  3. Create a global watchdog for AI and work: Imagine a coalition of experts, governments, and international organizations keeping an eye on how AI is affecting the labor markets, and then sharing regular updates that the public can understand.

  4. Push for AI that supports, not replaces, workers: Invest in AI that boosts human productivity instead of replacing people. That also means setting ethical guidelines that include real checks on labor impact.

  5. Reward companies that augment, not replace: Companies that use AI to enhance their workforce and commit to retraining employees should get incentives like tax breaks or funding support.

  6. Plan for transitions, not just disruptions: Governments and frontier AI companies need to prepare for scenarios where safety nets must be deployed when AI adoption displaces occupations. They should work together on retraining programs, digital infrastructure, and safety nets such as unemployment benefits, upskilling support, and even affordable access to high-impact AI tools, potentially funded by an agreed percentage of AI companies’ profits (e.g., via the Windfall Clause).

Future work/Potential next steps

  1. Compare developed vs. developing nations studies: Conduct studies that explicitly compare AI’s labor market implications in developed nations versus developing ones, identifying unique challenges and opportunities for each.

  2. Refine and standardize cross-classification methods: Continue to develop more robust, transparent, and potentially AI-assisted methodologies for crosswalking national occupational classifications to international standards like the International Standard Classification of Occupations (ISCO), the Standard Occupational Classification (SOC), and O*NET.

  3. Formalize the framework: Make it into a replicable tool or dashboard that can incorporate future LLM usage datasets.

  4. Leverage new data sources: Explore the use of real-time data, job postings, and usage reports from more AI companies to track AI adoption and its impact on skills and job demand globally. Include datasets that are more representative of global AI use in the analysis.

Additional notes

Data sources

  1. Anthropic Economic Index (Paper): [2503.04761] Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations

This paper provides the conceptual framework for our study. It introduces a taxonomy of economic tasks based on real-world AI interactions, identifying where AI systems like Claude show strong performance. We use this task classification to assess which occupations in Brazil are most exposed to automation or augmentation.

  2. Anthropic Economic Index (Dataset): Anthropic/EconomicIndex · Datasets at Hugging Face

This dataset operationalizes the task taxonomy from the paper. It assigns quantitative exposure scores to hundreds of economic tasks, based on how well AI models perform them. We link these task scores to occupations through O*NET descriptors, allowing us to compute AI risk metrics for Brazilian jobs.

  3. “Job Concordances for Brazil: Mapping the Classificação Brasileira de Ocupações (CBO) to the International Standard Classification of Occupations (ISCO-88)” [cbo2isco.pdf]

In establishing the initial CBO to ISCO linkage for our study, we drew upon the foundational work presented in “Job Concordances for Brazil: Mapping the Classificação Brasileira de Ocupações (CBO) to the International Standard Classification of Occupations (ISCO-88)” (cbo2isco.pdf). This paper details a comprehensive concordance between CBO and the 1988 version of ISCO (ISCO-88).


Regression results

  1. OLS regression for the Time Series Analysis

  2. OLS regression for Difference-in-Differences
