Prompt Injection Examples That Expose Real AI Security Risks

The Lasso Team

January 21, 2026

min read

Prompt Injection Examples That Expose Real AI Security Risks

What is Prompt Injection?

‍

Prompt injection describes a class of attacks that exploit how language models interpret and prioritize instructions within their context. What began as a way to bypass chatbot rules has developed into a broader risk that affects applications relying on retrieval, memory, automation, and tool execution.

‍

As prompts now include user input, external documents, system logic, and accumulated context, models are increasingly required to infer intent rather than enforce it. Prompt injection takes advantage of that ambiguity. By embedding instructions where models expect data, or by reshaping context over time, attackers can influence behavior without triggering traditional security controls.

‍

Understanding prompt injection today means looking beyond individual prompts and focusing on how authority flows through modern GenAI architectures.

‍

In this article, you’ll learn:

How prompt injection works across chatbots, retrieval pipelines, and agentic workflows
Why many real-world examples evade detection despite existing guardrails
Where prompt injection risk emerges as models gain access to data and tools
How production-grade controls can detect and contain prompt injection at runtime

‍

Types of Prompt Injection Attacks
‍

Attack type	How it works	How critical it is
Direct prompt injection	An attacker explicitly inserts instructions into user input to override system or developer prompts (“ignore previous instructions”).	Still common, but increasingly used as a setup step for more advanced attacks against agents and tools rather than as a standalone exploit.
Indirect prompt injection	An attacker embeds malicious instructions in external content the model consumes: documents, web pages, emails, tickets, or database entries.	Dominant risk for RAG-based systems, where retrieved data is implicitly trusted and blended into the model’s reasoning context.
Stored prompt injection	Malicious prompts are persistently stored in memory, logs, knowledge bases, or vector stores and triggered later by unrelated user queries.	Especially dangerous in long-lived enterprise systems where prompts outlast sessions and bypass traditional perimeter defenses.
Tool/function hijacking	Injected prompts manipulate how the model invokes tools, APIs, plugins, or downstream systems (e.g., executing unintended actions).	Critical risk as LLMs gain operational agency Turns prompt injection into a business logic and automation threat
Multimodal prompt injection	Instructions are embedded in images, audio, PDFs, or other non-text inputs processed by multimodal models.	Expands the attack surface beyond text, complicating detection Blurs the boundary between content and control
Agent memory manipulation	Prompts are crafted to alter an agent’s long-term memory, goals, or decision heuristics over time.	A defining risk for agentic AI Compromise doesn’t require immediate exploitation, only gradual influence

‍

Real-World Prompt Injection Examples in Production AI

Prompt Injection in Customer-Facing Chatbots

‍

One of the earliest widely reported prompt injection incidents occurred with Microsoft’s AI-powered Bing Chat (codename “Sydney”), where a Stanford student used a crafted prompt to make the system reveal its hidden system instructions and internal directives. The attack didn’t require special privileges. It simply instructed the model to “ignore previous instructions”. That was enough to surface information that was meant to remain hidden from users.

‍

Prompt Injection via RAG and Embedded Content

‍

Indirect prompt injection has been demonstrated in AI assistants that process external content such as documents or emails. Security research and incident analyses have shown that when hidden instructions are embedded in files or other ingested content, models can be induced to follow those instructions as part of their reasoning process. When this happens, attackers can influence behavior and compromise system integrity without any direct user manipulation.

‍

Prompt Injection in Agentic and Tool-Calling AI

‍

Recent disclosures involving AI-assisted developer tools show how prompt injection escalates once models are allowed to execute actions. In one documented case, attacker-controlled content influenced an autonomous coding agent and led to the execution of commands on the user’s system.

‍

The vulnerability arose because the agent ingested untrusted external data through connected tools and treated it as valid operational context. Malicious instructions embedded in that data were able to redirect the agent’s control flow and trigger actions with developer-level privileges, without explicit user approval at the time of execution.

‍

Indirect Prompt Injection Through Third-Party Inputs

‍

One of the most compelling documented production exploits is EchoLeak, a zero-click prompt injection exploit in Microsoft 365 Copilot (CVE-2025-32711) that allowed remote, unauthenticated data exfiltration through crafted emails. The attack bypassed many defenses by leveraging content the system automatically processed, showing how prompt injection can be introduced without direct user interaction.

‍

How Prompt Injection Attacks Work in Practice

‍

In real-world systems, prompt injection rarely appears as a single, obvious failure. Instead, it exploits how LLMs resolve conflicting instructions, collapse trust boundaries, and act on outputs as if they were decisions.

‍

The table below outlines the most common execution patterns seen in production environments, along with real-world examples that illustrate how these attacks unfold.

‍

Mechanism	What happens at runtime	How to mitigate
Instruction Override and Priority Conflicts	The model receives competing instructions (system, developer, user, retrieved content) and resolves them incorrectly, allowing untrusted input to override higher-priority intent.	Enforce strict instruction hierarchies at runtime Separate system intent from user and retrieved input Prevent untrusted content from altering control logic (even when phrased as valid instructions)
Trust Boundary Violations Between Data and Prompts	Content intended as passive data (documents, emails, web pages) is interpreted as executable instruction once injected into the model’s context window.	Preserve trust boundaries across ingestion, retrieval, and generation by isolating reference data from instruction channels and validating context before it influences model behavior.
Manipulating Model Responses and Tool Calls	The attacker steers the model’s output so that it triggers unauthorized tool use, API calls, or workflow steps downstream.	Constrain downstream actions through policy enforcement, output validation, and explicit authorization gates before any tool invocation, state change, or data access occurs.

‍

Where Prompt Injection Risk Emerges Across AI Architectures

‍

Prompt injection emerges wherever instructions, data, and decisions intersect inside an AI model. The number of places where intent can be redirected increases as architectures become more modular, spanning APIs, retrieval layers, workflows, and agents.

‍

APIs and Application-Level Prompts

‍

At the application layer, prompt injection often enters through overloaded prompts: single instruction streams that combine system logic, user input, and business rules. When APIs pass these prompts downstream without clear boundaries, models have to infer intent rather than enforce it.

‍

This risk is amplified when:

System prompts embed operational logic or sensitive assumptions.
Application prompts evolve rapidly without security review.
API consumers can influence prompt structure indirectly through parameters or metadata.

‍

In these environments, prompt injection doesn’t even require obvious malicious input. Small shifts in phrasing or structure can be enough to alter behavior.

‍

External Data Sources and Knowledge Bases

‍

Retrieval-augmented architectures introduce a different class of risk: implicit trust in external content. Documents, tickets, emails, web pages, and vector databases should be treated as reference material. But LLMs can process them as instructions unless explicitly constrained.

‍

Prompt injection emerges when:

Retrieved content is blended directly into the model’s reasoning context.
Knowledge bases accumulate unvetted or user-generated data over time.
Stored prompts persist across sessions and users.
Data provenance and trust boundaries are not preserved as content moves through ingestion, embedding, retrieval, and generation layers.

‍

Because the model cannot distinguish “data to summarize” from “instructions to follow,” attacks at this layer are both subtle and persistent.

‍

Multi-Step AI Workflows and Autonomous Agents

‍

The highest-risk environments are multi-step workflows and autonomous agents. Here, prompt injection becomes a control problem.

‍

Risk surfaces when agents carry memory forward across tasks without validation, or automatically invoke tools without oversight.

‍

In these workflows, prompt injection doesn’t need to succeed immediately. Gradual influence is often enough to redirect behavior in ways that are hard to detect and reverse.

‍

Why Prompt Injection Attacks Are Hard to Detect

‍

Most prompt injection attacks don’t look like attacks. They exploit normal system behavior and trusted data flows across AI stacks. As LLMs move beyond single prompts into retrieval, memory, tools, and agents, the signals security teams rely on become weaker.

‍

The table below outlines the core reasons prompt injection continues to evade detection in real-world deployments.

‍

Detection challenge	What’s actually happening	Why traditional controls miss it
Hidden Instructions Embedded in Legitimate Content	Attackers bury malicious instructions inside documents, emails, tickets, web pages, or database entries that the model should be able to trust and process.	Content looks benign at ingestion time. Security controls rarely distinguish data that’s meant for summarization from instructions for execution.
Multi-Turn and Context-Dependent Attacks	The attack unfolds gradually across multiple interactions, relying on prior context, memory, or accumulated reasoning rather than a single malicious prompt.	Detection systems that excel at single-prompt analysis miss attacks that only become malicious in aggregate or over time.
Lack of End-to-End Prompt and Response Visibility	Inputs, retrieved context, system prompts, tool calls, and outputs are often handled by different components and teams.	Without a single view of the full request-to-response chain, it’s difficult to spot where intent shifted or authority misuse took place.

‍

Prompt injection is hard to detect because LLMs collapse data, instructions, and intent into the same execution path. That architectural reality makes perimeter-style defenses increasingly ineffective.

‍

Best Practices for Testing and Preventing Prompt Injection

‍

There is no single control that “fixes” prompt injection. Effective prevention comes from treating intent, data, and authority as separate concerns. The most resilient defenses combine testing, runtime controls, and observability across the full GenAI lifecycle.

‍

Input Validation and Context Isolation

‍

Prompt injection succeeds when untrusted input is allowed to influence behavior. The core defense is context isolation.

‍

In practice, this means:

Clearly separating user input, retrieved content, system instructions, and memory so they cannot implicitly modify one another.
Scoping or resetting context between interactions to prevent gradual, accumulation-based manipulation.

‍

Testing should mirror real usage. Indirect and stored prompt injection scenarios (especially in RAG pipelines and long-lived sessions) surface weaknesses that static input checks will never catch.

‍

Output Guardrails and Action Constraints

‍

Most high-impact failures occur after the model responds, when downstream systems treat outputs as authoritative. This is why guardrails should focus on limiting consequences, not just filtering language.

‍

Effective controls include:

Enforcing strict output schemas and response contracts.
Blocking responses that attempt to redefine roles, override policies, or escalate privileges.
Requiring explicit verification before any tool execution, state change, or sensitive data access.
Treating model outputs as advisory by default, not executable instructions.

‍

This reframes prompt injection from a content problem into an authority and control problem.

‍

Continuous Monitoring in Production AI

‍

It’s not possible to fully prevent prompt injection at design time. Models evolve, and attacker techniques change. That makes runtime monitoring a baseline requirement.

‍

At a minimum, monitoring should provide:

End-to-end visibility across prompts, retrieved context, system instructions, tool calls, and outputs.
The ability to detect anomalous reasoning paths, unexpected tool usage, or behavior drift over time.
A complete audit trail that allows teams to reconstruct how and why a model took a decision.

‍

How Lasso Detects and Controls Prompt Injection

‍

Prompt injection rarely announces itself at the point of input. It emerges through interaction: when prompts, retrieved context, model behavior, and downstream actions combine in ways that shift intent or authority. Lasso is designed to operate at this interaction layer, where those shifts become visible and enforceable.

‍

Detection begins with runtime awareness. Lasso monitors how user input, external content, system instructions, and model outputs are combined during execution, rather than evaluating them in isolation. This makes it possible to identify when untrusted context begins to influence restricted behavior, such as accessing sensitive data, invoking tools, or altering workflow logic.

‍

Control is applied through policy enforcement that reflects how applications actually use language models in production. Instead of relying on static prompt hardening, Lasso enforces boundaries around what actions are allowed in a given context, who or what can trigger them, and under which conditions.

‍

By coupling visibility with enforcement at runtime, Lasso enables teams to manage prompt injection as an operational risk. This approach supports auditing and continuous improvement as applications evolve and usage patterns change.

‍

Conclusion

‍

Prompt injection has moved beyond edge cases and academic demonstrations. As language models are embedded deeper into workflows, connected to external data, and trusted to take action, small shifts in intent can carry real consequences.

‍

The examples explored in this article show that prompt injection succeeds when boundaries blur—between data and instructions, reasoning and execution, assistance and authority. Addressing this risk requires visibility into how behavior unfolds in practice and controls that apply where decisions are made.

‍

For organizations adopting LLMs at scale, the challenge is recognizing where it appears in real deployments and ensuring that safeguards operate continuously, as part of normal operation.

Learn more