Home » Cybersecurity » Analytics & Intelligence » LLM vector and embedding risks and how to defend against them

LLM vector and embedding risks and how to defend against them

by Aaron Linskens on June 12, 2025

As large language model (LLM) applications mature, the line between model performance and model vulnerability continues to blur.

While vector embeddings have become foundational to Retrieval-Augmented Generation (RAG), recommendation systems, and semantic search, their improper handling introduces new attack surfaces that can compromise both LLM behavior and user data.

In this third post in our blog series exploring the Open Worldwide Application Security Project (OWASP) Top 10 for Large Language Model Applications, we focus on “Vector and Embedding Weaknesses” — a risk category that highlights how subtle manipulation of vector space can lead to data poisoning, behavior modification, and data leakage.

What Are Vector and Embedding Weaknesses?

Vector embeddings are mathematical representations of concepts, allowing LLMs to reason about similarity and relevance. These are typically generated from user inputs or external documents, then matched against a vector store to augment responses — a technique central to active RAG.

However, these embeddings are vulnerable:

Malicious inputs can be crafted to poison the embedding space, misleading LLMs into returning incorrect or adversarial results.
Attackers may insert embedding collisions, where crafted text shares near-identical vector values with legitimate content.
Poor hygiene in vector storage or indexing can lead to data exposure, especially when embeddings encode sensitive information.

In short, embedding vulnerabilities undermines the trustworthiness of the retrieval pipeline itself, which RAG relies on to ground LLM responses in factual data.

Real-World Risks: From Semantic Poisoning to Data Leaks

The OWASP LLM Top 10 highlights several real-world examples of how vector and embedding weaknesses can manifest:

Hidden instructions in embedded content: Attackers can insert invisible prompts, such as white text on white backgrounds, into documents submitted to systems powered by RAG. When these documents are embedded and later retrieved, the hidden text can manipulate the LLM into producing biased (Read more...)

*** This is a Security Bloggers Network syndicated blog from 2024 Sonatype Blog authored by Aaron Linskens. Read the original post at: https://www.sonatype.com/blog/llm-vector-and-embedding-risks-and-how-to-defend-against-them

June 12, 2025April 14, 2026 Aaron Linskens Artificial Intelligence, data, LLMs, owasp, OWASP Top 10

LLM vector and embedding risks and how to defend against them

What Are Vector and Embedding Weaknesses?

Real-World Risks: From Semantic Poisoning to Data Leaks

Senator Sanders Wants to Own AI Companies — and Hand America’s Adversaries the Keys

NIST’s Nine: The PQC Signature Race Moves to Round Three

The Quantum Arms Race: Why Washington Just Wrote a $2 Billion Check to Nine Companies

Beyond Moore’s Law: The Hyper-Acceleration of Autonomous AI Cyber Capabilities

The Exception Economy: When Security Teams Stop Protecting and Start Negotiating

GoPlus’s Latest Report Highlights How Blockchain Communities Are Leveraging Critical API Security Data To Mitigate Web3 Threats

C2A Security’s EVSec Risk Management and Automation Platform Gains Traction in Automotive Industry as Companies Seek to Efficiently Meet Regulatory Requirements

Zama Raises $73M in Series A Lead by Multicoin Capital and Protocol Labs to Commercialize Fully Homomorphic Encryption

RSM US Deploys Stellar Cyber Open XDR Platform to Secure Clients

ThreatHunter.ai Halts Hundreds of Attacks in the past 48 hours: Combating Ransomware and Nation-State Cyber Threats Head-On

Randall Munroe’s XKCD ‘Types of Board Game’