Anthropic post title: Detecting and countering misuse of AI: August 2025[1]
Read the full report here. The lines below are from the Anthropic post and have not been edited; accompanying images are available at the original link.
We find that threat actors have adapted their operations to exploit AI’s most advanced capabilities. Specifically, our report shows:
Agentic AI has been weaponized. AI models are now being used to perform sophisticated cyberattacks, not just advise on how to carry them out.
AI has lowered the barriers to sophisticated cybercrime. Criminals with few technical skills are using AI to conduct complex operations, such as developing ransomware, that would previously have required years of training.
Cybercriminals and fraudsters have embedded AI throughout all stages of their operations. This includes profiling victims, analyzing stolen data, stealing credit card information, and creating false identities allowing fraud operations to expand their reach to more potential targets.
Below, we summarize three case studies from our full report.
‘Vibe hacking’: how cybercriminals used Claude Code to scale a data extortion operation
The threat: We recently disrupted a sophisticated cybercriminal that used Claude Code to commit large-scale theft and extortion of personal data. The actor targeted at least 17 distinct organizations, including in healthcare, the emergency services, and government and religious institutions. Rather than encrypt the stolen information with traditional ransomware, the actor threatened to expose the data publicly in order to attempt to extort victims into paying ransoms that sometimes exceeded $500,000.
The actor used AI to what we believe is an unprecedented degree. Claude Code was used to automate reconnaissance, harvesting victims’ credentials, and penetrating networks. Claude was allowed to make both tactical and strategic decisions, such as deciding which data to exfiltrate, and how to craft psychologically targeted extortion demands. Claude analyzed the exfiltrated financial data to determine appropriate ransom amounts, and generated visually alarming ransom notes that were displayed on victim machines.
Implications: This represents an evolution in AI-assisted cybercrime. Agentic AI tools are now being used to provide both technical advice and active operational support for attacks that would otherwise have required a team of operators. This makes defense and enforcement increasingly difficult, since these tools can adapt to defensive measures, like malware detection systems, in real time. We expect attacks like this to become more common as AI-assisted coding reduces the technical expertise required for cybercrime.
Our response: We banned the accounts in question as soon as we discovered this operation. We have also developed a tailored classifier (an automated screening tool), and introduced a new detection method to help us discover activity like this as quickly as possible in the future. To help prevent similar abuse elsewhere, we have also shared technical indicators about the attack with relevant authorities.
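The post doesn’t describe how the tailored classifier actually works. Purely as an illustration of what an automated screening tool of this general kind can look like, here is a minimal sketch that scores account activity against a handful of indicators and routes high scores to human review; the signal names, weights, and threshold are hypothetical and not taken from the report.

```python
# Illustrative sketch only: a toy heuristic screen over account activity.
# Signal names, weights, and threshold are hypothetical, not drawn from the report.
from dataclasses import dataclass

@dataclass
class ActivitySignals:
    credential_dump_patterns: int   # outputs resembling bulk credential material
    recon_tool_invocations: int     # agentic calls resembling network reconnaissance
    extortion_phrase_hits: int      # matches against an extortion-language lexicon

def misuse_score(s: ActivitySignals) -> float:
    """Weighted sum of the toy indicators; higher means more suspicious."""
    return (0.5 * s.credential_dump_patterns
            + 0.3 * s.recon_tool_invocations
            + 0.2 * s.extortion_phrase_hits)

def flag_for_review(s: ActivitySignals, threshold: float = 3.0) -> bool:
    """Route the account to human review when the score crosses the threshold."""
    return misuse_score(s) >= threshold

print(flag_for_review(ActivitySignals(6, 2, 1)))  # True: 3.0 + 0.6 + 0.2 = 3.8
```

A production system would presumably use trained models rather than hand-picked weights; the sketch only shows the flag-and-review shape.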
Remote worker fraud: how North Korean IT workers are scaling fraudulent employment with AI
The threat: We discovered that North Korean operatives had been using Claude to fraudulently secure and maintain remote employment positions at US Fortune 500 technology companies. This involved using our models to create elaborate false identities with convincing professional backgrounds, complete technical and coding assessments during the application process, and deliver actual technical work once hired.
These employment schemes were designed to generate profit for the North Korean regime, in defiance of international sanctions. This is a long-running operation that began before the adoption of LLMs, and has been reported by the FBI.
Implications: North Korean IT workers previously underwent years of specialized training prior to taking on remote technical work, which made the regime’s training capacity a major bottleneck. But AI has eliminated this constraint. Operators who cannot otherwise write basic code or communicate professionally in English are now able to pass technical interviews at reputable technology companies and then maintain their positions. This represents a fundamentally new phase for these employment scams.
Our response: when we discovered this activity we immediately banned the relevant accounts, and have since improved our tools for collecting, storing, and correlating the known indicators of this scam. We’ve also shared our findings with the relevant authorities, and we’ll continue to monitor for attempts to commit fraud using our services.
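The report likewise doesn’t say how the indicator correlation is done. As a rough sketch under assumptions of my own (a stored set of known indicators such as payment accounts, contact handles, or reused résumé phrases; every field name here is invented), correlating a new account against that store might look like this:

```python
# Illustrative sketch only: correlating new account metadata against stored
# indicators of a known fraud scheme. All indicator types and values are invented.
from collections import defaultdict

KNOWN_INDICATORS: dict[str, set[str]] = {
    "payment_account": {"acct-123", "acct-456"},
    "contact_handle": {"dev.hire.2024@example.com"},
    "resume_phrase": {"10 years of full-stack experience at a major fintech"},
}

def correlate(account_metadata: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per indicator type, the overlap between an account and known indicators."""
    hits: dict[str, set[str]] = defaultdict(set)
    for indicator_type, observed in account_metadata.items():
        overlap = observed & KNOWN_INDICATORS.get(indicator_type, set())
        if overlap:
            hits[indicator_type] = overlap
    return dict(hits)

# An account that reuses a known payment account would be surfaced for review.
example = {"payment_account": {"acct-123"}, "contact_handle": {"someone@example.org"}}
print(correlate(example))  # {'payment_account': {'acct-123'}}
```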
No-code malware: selling AI-generated ransomware-as-a-service
The threat: A cybercriminal used Claude to develop, market, and distribute several variants of ransomware, each with advanced evasion capabilities, encryption, and anti-recovery mechanisms. The ransomware packages were sold on internet forums to other cybercriminals for $400 to $1200 USD.
Implications: This actor appears to have been dependent on AI to develop functional malware. Without Claude’s assistance, they could not implement or troubleshoot core malware components, like encryption algorithms, anti-analysis techniques, or Windows internals manipulation.
Our response: We have banned the account associated with this operation, and alerted our partners. We’ve also implemented new methods for detecting malware upload, modification, and generation, to more effectively prevent the exploitation of our platform in the future.
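The post doesn’t detail these detection methods either. One standard, purely illustrative ingredient for the “upload” part is hashing submitted files and checking them against a known-bad set; the hash entry below is a placeholder, not a real indicator.

```python
# Illustrative sketch only: screening uploaded files against known-bad hashes.
# The entry below is a placeholder, not a real malware indicator.
import hashlib

KNOWN_BAD_SHA256 = {"0" * 64}  # placeholder hash

def is_known_bad(upload: bytes) -> bool:
    """Exact-match screening; real systems layer fuzzy and behavioral checks on top."""
    return hashlib.sha256(upload).hexdigest() in KNOWN_BAD_SHA256

print(is_known_bad(b"example upload"))  # False
```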
Next steps
In each of the cases described above, the abuses we’ve uncovered have informed updates to our preventative safety measures. We have also shared details of our findings, including indicators of misuse, with third-party safety teams.
In the full report, we address a number of other malicious uses of our models, including an attempt to compromise Vietnamese telecommunications infrastructure, and the use of multiple AI agents to commit fraud. The growth of AI-enhanced fraud and cybercrime is particularly concerning to us, and we plan to prioritize further research in this area.
We’re committed to continually improving our methods for detecting and mitigating these harmful uses of our models. We hope this report helps those in industry, government, and the wider research community strengthen their own defenses against the abuse of AI systems.
Further reading
For the full report with additional case studies, see here.
[1] I initially noticed this report via tjohnson314 in the Manifold Discord server, who linked to this NBC article about the Anthropic report.
A good friend of mine works for a company called Outflank, where they basically develop “legal” malware for red teamers to use during sanctioned tests of organizations’ security. He does not have a standard ML background, but for work he recently created an RL environment for training LLMs to generate shellcode that bypasses EDR/antivirus, which they use to great success. He wrote a blog post about it here and gave a related talk at DEFCON this month.
Normies probably underestimate the significance of being able to bypass EDR very quickly and cheaply. This is a critical security control in a large organization where you expect some proportion of your workforce to download malware at regular intervals. Training small models to do these kinds of tasks is possible on a shoestring budget, well within the means of black hat organizations, and I expect them to start doing this sort of thing regularly within the next ~year or so.
While the scenario is worrying, I have to say there’s something funny about North Korea being reduced to a point where a few US tech job salaries are a significant help to their national budget.
[mod note: I frontpaged this, but it’s sort of an edge case on the ‘is it timeless’ front]
I’m glad Anthropic is taking steps to address this, but they can only control their own models. Open-source coding agents and models are, what, maybe a year behind in capabilities?
Another interesting, but likely impractical, cybersecurity threat: https://infosec.exchange/@ESETresearch/115095803130379945
It strikes me that Anthropic’s blog post is engaging in a bit of double-speak in saying they are “disrupting” the operations of cybercriminals.
What they are describing is retroactively taking action after crime has occurred.
It seems like the operations were ongoing, and they disrupted them. To me it appears a normal and legitimate use of the word.
From their perspective, I suppose they have met their regulatory burden, so they are proactively going beyond the mark in breaking down these operations.
It raises the question of what their less safety-conscious kin are doing to prevent misuse of their tools. Meta’s recent AI policy leak shows a fairly laissez-faire attitude.
Perhaps, but they are taking action and deserve to be lauded for that, at least. I admit I’m side-eyeing the framing and the implication that there’s very much they can do to stop this occurring again in the future, but I am glad they chose to do the work and write about it.
I support more advancements in cyberhacking capabilities so that companies and govts are incapable of keeping secrets. Secrecy enables them to act against the wishes of the majority to an extent that they couldn’t otherwise.
I don’t think you could realistically get to the point where you can do international policy, trade policy, military planning and R&D, etc., without secrecy. I guess, however, there are always the good old analogue paper files and folders.
But also, if AI is so powerful, we’d need to see where the chips fall once people also start deploying it for defense.
It is good actually that you can’t do these things. Most people don’t support threatening nuclear war on the other empire in order to extract favourable deals for your empire.
Domestic politics will also change radically. Many govts will fall, and those that last will work differently.
My impression is that very often the threats of nuclear war are what’s public, and the backroom deals to actually avoid those are what’s secret. This seems remarkably optimistic, because while nuclear war specifically might not be something anyone wants (and even that is questionable; see how many Russian politicians seem to think that bragging about nuking the West will make them more popular, and do so as part of their propaganda), other kinds of war absolutely can be. There was a lot of support for war post-9/11, for example. So if anything I fear that the opposite would happen: with everything in the open, where everyone always has to worry about not losing face and there’s no room to secretly say “no, okay, I was just kidding, let’s find a deal”, you’d get more war, not less.
Agree
I’m not seeing the connection from no private room to more war.
You are right that leaders (and diplomats) will have to be more honest about what they actually believe to both the public and to leaders of other countries.
I think in practice people seem to reward “strength” very often and that would encourage leaders to keep acting as aggressively as many do now on the world stage, but also without the secret channels to actually defuse the resulting tensions. Hence the stuff that is said openly could end up mattering more and be taken more seriously, because it’s now the only channel.
More data backing your claim would be nice: namely, that a majority of the US or Chinese population would support risking nuclear war on the other population just to get slightly better living standards for themselves.
Also, I’m impressed by how pessimistic a claim this is, and I thought I was already quite pessimistic about human nature.
I don’t know about nuclear war, though I would remind you of the worrying number of “nuclear winter wouldn’t be so bad” takes when tensions were a bit higher with Russia due to the Ukraine war. But obviously throughout history massive popular pro-war and interventionist movements have existed, even for blatantly braindead choices (see: Italy throwing itself into WW1 in 1915 after having managed to stay out, or, for a more recent example, support for the Afghanistan war after 9/11).
I don’t think nuclear war would ever happen because someone actively wants to just push a button and go full Armageddon on the other side. The risk is more that you progressively slide through various steps of escalation until one side starts feeling so existentially threatened that they start using nuclear weapons, and from there things escalate even further until a mix of panic and fog of war triggers the catastrophe. And each of those steps can at the time seem reasonable and rational, given limited knowledge. Besides, my point isn’t that people would want this necessarily out of nowhere, but that limiting secrecy does nothing to prevent politicians from still being hawkish and short-sighted and riling up their voting base with misleading or false information; it only prevents them from having an easier out later.
Okay. I agree some people genuinely want to mass murder the other side just to get slightly more resources. I just want more data that this would actually be a majority.
I think de-escalating would also be easier when people of both countries have high level of visibility into what people of the other country are feeling and why.
I think people of both countries would be able to understand psychology of people of the other country to an extent that was not really possible before in history. Simply because of how much data you have about personal lives of everyone.
Why do you put the onus on proving that there is one rule about it being a majority? We know it happens. It’s hard to say for stuff like the Nazis because technically the people only voted for some guy who was certainly very gung-ho about militarism and about the need for Germany to expand, but then the matter was basically taken out of their hands, and it was at best a plurality to begin with...
Yes, obviously there is no one case I can present to say “here’s a situation where at least 50%+1 of the population genuinely was in favour of war”. Neither can you prove that this has never happened. All we know is that some people do express support for war; sometimes there are even mass movements in favour of it, depending on circumstances; and it would be somewhat odd if, by some strange hidden law of social dynamics, that fraction could never exceed 50%, despite having definitely been significant on various occasions we can refer to.
Anyway at the very least we seem to have evidence that over 50% of Israelis believe the current war in Gaza is appropriate or even not harsh enough. That’s a bit of evidence.
I don’t think that’s achieved just by “no more secrecy” though. Understanding how another country’s population feels isn’t a matter of that information being concealed, but hard to measure and aggregate.
Are you also against online banks?! Or are you somehow only hoping for advancements in hacking that breaks confidentiality but not integrity?
I’m fine with it if we end up relying a lot more on cash as a side effect. Banks will still exist like they did before the internet, but the volume of transactions that can be authorised online may shrink.
I am sorry, but according to my current understanding, your proposal may be illegal and potentially harmful. I reported this comment for moderator review.
Law doesn’t decide morality; morality decides the law. But yes, it’s up to the LessWrong mods and the community whether they want to entertain me or not.
@Shankar, reacting to your emote: this claim feels trivially obvious to me. If you have a counterargument, you can bring it up.
Of course, law is decided by (leaders representing) a large group of people who try to encode their morality and their conflict-resolution processes into something more formal.
Yes, you can nitpick the details, but that is the broad picture.
A country with a very different morality will have very different laws (such as an Islamic state having different laws from a Western one).
The end result appears to be that the bad actor relying on the AI system created a chokepoint that got their operations discovered and stopped.
Thus, despite ‘this would have taken years of training’ rhetoric, the ‘AI enabled’ future of crime seems more amenable to the needs and desires of the State than the regime it is replacing.
I doubt this very much. One of the most consistent trends we see is that once a capability is available in any model, the cost of inference and of training an open-source model to match it goes down over time.