Independent cybersecurity researchers pushed back on a report by Anthropic that claimed hackers had used its Claude Code agentic coding system to perpetrate an unprecedented automated cyberattack.
What’s new: In a blog post, Anthropic described thwarting a September campaign by hackers sponsored by the government of China, calling it the “first documented case of a large-scale cyberattack without substantial human intervention.” However, some independent researchers said that current agents are not capable of performing such nefarious feats, Ars Technica reported. Moreover, the success rate — a few successful attacks among dozens — belies Anthropic’s claim that the agentic exploit revealed newly dangerous capabilities. The lack of detail in Anthropic’s publications makes it difficult to fully evaluate the company’s claims.
Claude exploited: The hackers circumvented Claude Code’s guardrails by role-playing as employees of a security company who were testing its networks, according to Anthropic’s report.
- They coaxed Claude Code to probe, breach, and extract data from networks in small steps that the underlying model didn’t recognize as malicious, then executed them at speeds beyond the reach of conventional hacking.
- Agentic AI performed 80 percent to 90 percent of the technical steps involved, and human intervention was required only to enter occasional commands like “yes, continue,” “don’t continue,” or “Oh, that doesn’t look right, Claude, are you sure?”, The Wall Street Journal reported.
- The intruders targeted at least 30 organizations and succeeded in stealing sensitive information from several.
- The report didn’t identify the organizations attacked, explain how Anthropic detected the attacks, or explain how it attributed the attacks to China. A spokesman for China’s Foreign Ministry said that China does not support hacking, The New York Times reported.
Reasons for skepticism: Independent security researchers interviewed by Ars Technica, The Guardian, and others found a variety of reasons to question the report.
- While they agreed that AI can accelerate tasks such as log analysis and reverse engineering, they have found that AI agents are not yet capable of performing multi-step tasks without human input, and they don’t automate cyberattacks significantly more effectively than hacking tools that have been available for decades. “The threat actors aren't inventing something new here,” researcher Kevin Beaumont said in an online security forum.
- In addition to Claude Code, the hackers used common open-source tools, Anthropic said. Yet defenses against these familiar tools are also familiar to security experts, and it’s not clear how Claude Code would have changed this.
- Anthropic itself pointed out that Claude Code may well have hallucinated the information it purportedly hacked, since it “frequently overstated findings” and “occasionally fabricated data.” Such misbehavior is a significant barrier to using the system to execute cyberattacks, the company said.
Behind the news: Hackers routinely use AI to expedite or automate their work, for instance writing more effective phishing emails or generating malicious code. In August, Anthropic highlighted the rise of “vibe hacking,” in which bad actors who have limited technical skills use AI to pursue nefarious activities previously undertaken only by more highly skilled coders. That same month, Anthropic reported that it had disrupted one such effort, which involved the theft of personal data and extortion. In October, White House AI Czar David Sacks accused Anthropic of running a “sophisticated regulatory capture strategy based on fear-mongering.”
Why it matters: It stands to reason that AI can make hacking faster and more effective, just as it does many everyday activities. But Anthropic’s description of the Claude-powered agentic cyberattack it discovered is at odds with the experience of security researchers outside the company. Independent researchers have found agents relatively ineffective at automating cyberattacks, and conventional methods equally or more dangerous. Security researchers are right to explore agentic AI both to perpetrate and defend against security threats, but it has not yet been found to pose the dire threat that Anthropic warns of.
We’re thinking: AI companies want to promote the power of their products, and sometimes — paradoxically — that promotion emphasizes a product’s powerful contribution to a negative outcome. Positive or negative, hype is harmful. We hope that makers of state-of-the-art models and applications based on them will find ways to drum up interest in their accomplishments — many of which are genuinely impressive and exciting! — without misleading or confusing the public. With respect to cybersecurity, AI-driven detection of security flaws makes it easier to patch them. In this way, AI helps to shift the balance of power from attackers to defenders, making computers more secure, not less.