OWASP Top 10 for LLMs: AI Security Risks Explained

Key insights
- Prompt injection remains the #1 risk because AI models still struggle to tell the difference between instructions and user input
- Sensitive information disclosure jumped four spots to #2. Training data can leak out, and attackers can harvest entire models through repeated queries
- Every AI system has a supply chain of data, models, and infrastructure. With over 2 million models on Hugging Face, manual inspection is impossible
- Defenses follow a pattern: filter inputs and outputs with an AI firewall, limit access, and test your system like an attacker would
This is an AI-generated summary. The source video may include demos, visuals and additional context.
In Brief
Jeff Crume, Distinguished Engineer at IBM, walks through OWASP's updated Top 10 security risks for Large Language Models (LLMs), the AI systems behind tools like ChatGPT and Claude. The 2025 list reflects what teams have learned since the first version in 2023. Prompt injection is still the number one threat, sensitive data leaks have become a much bigger problem, and new risks like "denial of wallet" show how AI attacks can directly cost you money.
What you'll learn
- The 10 most common ways attackers target AI systems, ranked by real-world impact
- Why prompt injection is so hard to fix, and how indirect attacks work through documents
- Practical defenses you can apply today: AI firewalls, access controls, and pen testing
The big four: risks that matter most
The first four items on the list account for the most damaging real-world attacks. Here's what each one means and how to defend against it.
1. Prompt injection (unchanged from 2023)
This is still the number one risk because AI models struggle to tell the difference between instructions and user input. An attacker can type something that overrides the system's built-in rules.
There are two types. Direct injection is when an attacker types a malicious prompt straight into the system. The classic example: asking a chemistry question that tricks the AI into explaining how to make something dangerous.
Indirect injection is sneakier. A user asks the AI to summarize a document, but the document itself contains hidden instructions like "forget all previous rules". The AI follows those instructions without the user even knowing.
Researchers have found that rephrasing prompts as poems or Morse code can bypass protections that work against normal language.
Defenses: Strengthen system prompts, put an AI firewall (a filter that checks inputs and outputs) between users and the model, and run penetration tests (deliberate attacks on your own system to find weaknesses).
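The AI-firewall idea above can be sketched as a thin wrapper around the model call. This is a toy keyword filter, not a real product: the patterns below are illustrative assumptions, and real firewalls use trained classifiers precisely because simple patterns are easy to bypass (as the poem and Morse-code tricks show).

```python
import re

# Hypothetical patterns for illustration only; a production AI firewall
# relies on trained classifiers, not keyword lists.
INJECTION_PATTERNS = [
    r"ignore (all )?previous (rules|instructions)",
    r"forget (all )?previous (rules|instructions)",
    r"you are now",
]

def screen_prompt(text: str) -> bool:
    """Return True if the text looks like an injection attempt."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def guarded_call(prompt: str, model) -> str:
    """Filter both sides of the model call, the way an AI gateway sits
    between users and the model."""
    if screen_prompt(prompt):
        return "Request blocked by input filter."
    reply = model(prompt)
    # Output side: a summarized document may smuggle instructions back out.
    if screen_prompt(reply):
        return "Response blocked by output filter."
    return reply
```

Screening both directions matters because of indirect injection: even a benign user prompt can pull hostile instructions in through a document.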
2. Sensitive information disclosure (up 4 spots)
This has become a much bigger problem than expected. If an AI was trained on customer data, health records, or financial information, a clever prompt can make it leak that data back out.
There's also something called a model inversion attack. An attacker sends thousands of queries and records the responses, gradually extracting the model's training data. It's like photocopying a book one page at a time.
Defenses: Filter sensitive data before it enters the model, use an AI firewall on outputs to catch leaking credit card numbers or personal information, and lock down who can access the model.
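The output-side filtering described above can be sketched as regex redaction over the model's response before it reaches the user. The patterns here are illustrative assumptions; real deployments use dedicated data-loss-prevention tooling with many more detectors and validation (e.g. Luhn checks on card numbers).

```python
import re

# Illustrative patterns only; production systems use DLP tooling
# with far more detectors and proper validation.
REDACTIONS = {
    "credit card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Mask sensitive-looking values in model output before returning it."""
    for label, pattern in REDACTIONS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text
```

This runs at the same gateway layer as the input filter, so a prompt that successfully coaxes data out of the model still hits a second barrier on the way back.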
3. Supply chain vulnerabilities (up 2 spots)
An AI system doesn't exist in a vacuum. It needs data, a base model, applications, and infrastructure. Most organizations don't build their own models. They download them from places like Hugging Face, which has over 2 million AI models, many with more than a billion parameters. That's far too much for anyone to inspect manually.
Defenses: Verify your sources, trace provenance (where things came from and who touched them along the way), scan models for vulnerabilities, and keep all software up to date.
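One concrete form of source verification is checking a downloaded model artifact against a cryptographic digest published by a source you trust. A minimal sketch (the file path and digest are placeholders; this is one layer of verification, not a substitute for provenance tracking or vulnerability scanning):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hash a downloaded artifact so it can be compared to a published digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large model files don't have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(path: str, expected_digest: str) -> bool:
    """Reject the artifact if the bytes on disk don't match the digest
    published by the trusted source."""
    return sha256_of(path) == expected_digest.lower()
```

A matching digest proves the bytes weren't tampered with in transit; it says nothing about whether the model itself is trustworthy, which is why provenance still matters.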
4. Data and model poisoning (down 1 spot)
If the data used to train an AI contains errors or has been tampered with, everything downstream is affected. As Crume puts it: "just a little bit of toxin in the drinking water makes us all sick".
This also applies to retrieval-augmented generation (RAG), a technique where you give the AI specific documents to reason over. If those documents have been manipulated, the AI's answers will be wrong too.
Defenses: Know where your data comes from, control who can change models and training data, and put change management in place.
The remaining six
The video covers these more briefly, but each one matters.
5. Improper output handling. If your AI writes code or generates content that goes into a browser, it could introduce vulnerabilities like cross-site scripting (XSS) or SQL injection. Never trust AI output blindly.
6. Excessive agency. An AI connected to tools, APIs, and real-world systems has real power. If it gets hijacked or hallucinates, it could take actions with serious consequences.
7. System prompt leakage. The system prompt sets the AI's rules. If it contains credentials or API keys, a carefully worded question could make the AI reveal them.
8. Vector and embedding weaknesses. Manipulated RAG documents can corrupt the AI's knowledge over time, making it unreliable.
9. Misinformation. AI models hallucinate. They make up facts that sound convincing. Critical thinking and cross-referencing are essential.
10. Unbounded consumption. Overloading an AI system with too many or too complex requests can take it offline. This is known as "denial of wallet" because it costs real money.
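Item 5's advice to never trust AI output blindly translates into the same hygiene used for any untrusted input. A sketch using HTML escaping against XSS and a parameterized query against SQL injection (the table and function names are illustrative):

```python
import html
import sqlite3

def render_comment(model_output: str) -> str:
    """Escape model output before it reaches a browser, so an injected
    <script> tag is displayed as text instead of executed."""
    return html.escape(model_output)

def find_user(conn: sqlite3.Connection, name_from_model: str):
    """Parameterized query: the value can't rewrite the SQL statement,
    even if the model emitted something like "'; DROP TABLE users;--"."""
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (name_from_model,))
    return cur.fetchall()
```

The point is that AI-generated text sits on the same trust level as user input: it goes through escaping and parameterization, never straight into a page or a query string.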
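Item 10's denial-of-wallet risk is typically mitigated with per-caller rate limiting. A minimal token-bucket sketch, with illustrative thresholds (real systems also cap prompt size, response length, and per-account spend):

```python
import time

class TokenBucket:
    """Cap request rate per caller so a flood of queries can't run up
    compute costs (the "denial of wallet" scenario)."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)      # start full
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Spend one token per request; refuse when the bucket is empty."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Each caller gets their own bucket, so one abusive client hits the ceiling without taking the service offline for everyone else.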
Practical implications
For developers building AI features
Start with the top four. If you're using an LLM in production, make sure you have input/output filtering and access controls. These cover the biggest risks.
For teams evaluating AI tools
Ask your vendors about their security posture. Do they filter prompts? How do they handle sensitive data in training? What supply chain verification do they do?
For anyone using AI daily
Be a critical consumer. AI can hallucinate, leak information, and be manipulated. Cross-check important answers against other sources.
Glossary
| Term | Definition |
|---|---|
| OWASP | Open Worldwide Application Security Project. A nonprofit that publishes practical security guidance, including the well-known Top 10 lists. |
| Prompt injection | Tricking an AI by sneaking instructions into a prompt that override its built-in rules. |
| System prompt | Hidden instructions that set the AI's behavior, like "be helpful" or "don't share personal data." |
| AI firewall / AI gateway | A filter between users and the AI that checks both inputs and outputs for suspicious content. |
| Model inversion attack | Extracting an AI's training data by sending thousands of queries and piecing the responses together. |
| RAG (Retrieval-Augmented Generation) | Giving an AI specific documents to reason over, reducing made-up answers. The documents are retrieved and used to "augment" the AI's response. |
| Supply chain (AI) | Everything that goes into building an AI system: training data, base models, software, and infrastructure. |
| Data poisoning | Introducing bad data into AI training to corrupt the model's outputs. Subtle changes can go undetected for a long time. |
| Excessive agency | When an AI has too many permissions and can take real-world actions, like calling APIs or modifying systems. |
| Denial of wallet | An attack that overloads an AI system, making it unavailable and wasting the money spent running it. |
| Penetration testing | Deliberately attacking your own system to find weaknesses before real attackers do. |
| Hallucination | When an AI confidently generates false information. It's not lying on purpose. It's predicting what sounds right and getting it wrong. |
| Hugging Face | A platform for sharing AI models, like GitHub for machine learning. Hosts over 2 million models. |
Sources and resources
Want to go deeper? Watch the full video on YouTube.