Anthropic Outlines How Bad Actors Abuse Its Claude AI Models
Anthropic is the latest AI company to detail how bad actors are abusing its large language models (LLMs) and bypassing protections to further their malicious operations.
In a report this month, the firm outlined a number of instances where hackers abused Claude, its family of LLMs, for everything from running an “influence-as-a-service” operation to harvesting credentials for internet-connected security cameras to running recruitment scams in Europe.
In the report, Anthropic researchers wrote that the use cases outlined “are representative of broader patterns we’re observing across our monitoring systems. These examples were selected because they clearly illustrate emerging trends in how malicious actors are adapting to and leveraging frontier AI models.”
The goal is to put these instances into the public sphere to give the industry a better understanding of how threat groups are exploiting Claude and other LLMs, and to underscore the need for stronger protections for AI systems, they wrote.
Guarding Against Persistent Threats
Almost immediately after OpenAI introduced ChatGPT to the world more than two years ago, bad actors began looking to exploit generative AI LLMs for their own malicious purposes.
OpenAI and Meta last year in separate instances disrupted multiple nation-state operations out of Russia, China, and other countries, with both saying the campaigns aimed to sway public opinion and influence political outcomes in the United States and elsewhere.
“Overall, these trends reveal a threat landscape marked by evolution, not revolution,” OpenAI researchers wrote in a report last year about covert operations exploiting OpenAI tools. “Threat actors are using our platform to improve their content and work more efficiently.”
Meta, parent company of Facebook and Instagram, in its own report last year described other threat groups using AI for influence campaigns on both of its social media platforms, including 510 Facebook and 32 Instagram accounts linked to STOIC, a group based in Israel.
Microsoft in January sued 10 members of a foreign-based threat group it has designated Storm-2139, which the cloud giant claimed were using its AI tools in a “sophisticated scheme” to bypass protections it has placed around its Azure OpenAI services.
Influence-as-a-Service
In their report, Anthropic analysts said the most novel abuse of Claude came from a threat group that used the AI platform to automate an operation reaching tens of thousands of authentic social media accounts across many countries and languages.
The group’s infrastructure used Claude to build and orchestrate more than 100 social media bot accounts to push its clients’ political perspectives.
“These political narratives are consistent with what we expect from state affiliated campaigns, however we have not confirmed this attribution,” they wrote. “Most significantly, the operation utilized Claude to make tactical engagement decisions such as determining whether social media bot accounts should like, share, comment on, or ignore specific posts created by other accounts based on political objectives aligned with their clients’ interests.”
Credential Stuffing, Recruitment Fraud
Another bad actor used Anthropic’s technology to develop tools to scrape leaked passwords and usernames linked to security cameras and to access the devices. The hacker showed sophisticated development skills and created an infrastructure that integrated multiple sources like breach data platforms and private stealer log environments.
“The potential consequences of this group’s activities include credential compromise, unauthorized access to IoT devices (particularly security cameras), and network penetration,” the researchers wrote, adding there is no evidence that the effort was successful before Anthropic banned the account.
In another instance, an account holder ran a recruitment scam targeting victims primarily in Eastern Europe, with operators posing as hiring managers from legitimate organizations.
“This campaign demonstrates how threat actors are using AI for real-time language sanitization to make their scams more convincing,” the analysts wrote. “The operators would submit poorly written text in non-native English and ask Claude to adjust the text as if written by a native English speaker – effectively laundering their communications to appear more polished.”
There also was a novice actor using Claude to improve their skills and develop malicious tools they could not have built without the AI model. It worked, the analysts said. The relatively unskilled cybercriminal learned how to create doxing and remote access tools and grew their open source toolkit from basic functions to more advanced capabilities such as facial recognition and dark web scanning.