Lasso Adds Automated Red Teaming Capability to Test LLMs
Lasso today added an ability to autonomously simulate real-world cyberattacks against large language models (LLMs) to enable organizations to improve the security of artificial intelligence (AI) applications.
The Red Teaming capability extends a Lasso platform that detects anomalous usage of large language models (LLMs) that violate the security and compliance policies an organization has defined.
Cybersecurity teams can now additionally use the platform to launch simulated cyberattacks using a set of AI agents that Lasso has trained using attack vectors it has previously identified.
Ophir Dror, chief product officer for Lasso, said that capability will make it simpler for cybersecurity teams that typically lack LLM expertise, to test how easy it might be to prompt an LLM into generating sensitive data or inaccurate outputs
For example, Lasso recently conducted an in-depth security assessment of Llama 3.2 and Chinese model DeepSeek R1 LLMs. Llama 3.2 demonstrated robust protection against unauthorized use of intellectual property and data leakage. However, it showed notable weaknesses in safeguarding against hallucinations, potentially illegal and criminal activity, and defamatory statements.
DeepSeek, meanwhile, implemented robust restrictions exclusively for topics related to China, while leaving virtually all other content domains unprotected. It lacked meaningful guardrails for critical security concerns, including data leak protection and safeguards against AI hallucination and misinformation.
As more LLMs are employed within organizations, responsibility for securing them is increasingly falling to cybersecurity teams. The Red Teaming capability provides a means to automate the testing of LLMs, which cybersecurity teams can customize as they gain more experience, noted Dror.
The AI agents deployed by Lasso are also designed to continuously discover new attack techniques, creating an ever-evolving repository of threats that can be used to improve simulations. Reports based on those attacks are generated using model cards that contain all detected issues and vulnerabilities categorized by type, along with optimization recommendations and remediation guidance.
Additionally, the platform provides assessments of weaknesses in system prompts, automatically enhances them, and generates guardrails to reduce the manual effort that would otherwise be required as part of an effort to both eliminate data leakage and the inclusion of inaccurate data generated by an LLM within a data set. That’s critical because while many organizations are concerned about leaking sensitive data to an LLM, not as many appreciate the fact that inaccurate outputs are also being incorporated into enterprise applications, noted Dror.
It’s still early days so far as the incorporation of LLMs is concerned, but usage is already widespread. Rather than trying to ban usage of LLMs altogether in an era where end users can easily end run restrictions, organizations would be better advised to vet which LLMs are being made available within the context of a set of well-defined policies, said Dror.
Regardless of approach, it’s now more a question of to what degree the rise of LLMs in the enterprise will create security and compliance issues. The issue now is not so much completely preventing them from happening at all, so much as it is limiting the number of incidents that are about to inevitably occur.