AI Security Testing Has a Coverage Problem. Automated AI Red Teaming Fixes It.

Most security teams can tell you what their AI agents are supposed to do. Fewer can tell you what they're actually doing at any given moment:
- which tools they're calling
- What data they're accessing
- Whether their behavior has drifted from what was intended
- And whether someone is actively trying to redirect them toward an unwelcome goal
For enterprises who have just got their heads around AI chatbots, AI agents have complicated the picture all over again. Agentic AI systems don’t just generate outputs. They operate across tools, APIs, and data sources, often without a fixed execution path.
The same prompt can lead to different actions depending on context, memory, or downstream integrations. That variability is the essential security challenge security teams need to reckon with now. This article gives security teams a solid starting point.
Key takeaways:
- AI security testing has shifted from models to full agent workflows, where actions create even bigger risks than outputs
- Input/output filtering alone misses most of the agentic attack surface, especially across tools, APIs, and integrations
- Agent behavior can be manipulated over time through multi-step interactions, not just single prompts
- “Fragile intent” defines the specific conditions under which an agent can be redirected away from its intended purpose
- Automated AI red teaming is essential for continuously uncovering these risks as systems evolve, adapt, and change without notice
The Agentic Attack Surface is Bigger Than Security Teams Can See
Agentic AI systems receive prompts, pull from retrieval systems, call external APIs, interact with internal tools, and maintain context across conversations. Each of those connections is a dependency and a potential point of failure.
Any one of these layers could be exploited. Even worse, the relationships between them are often opaque. An agent might behave exactly as intended at the prompt level while taking actions downstream that no one anticipated. If security teams are only watching inputs and outputs, they're missing most of the picture.
What was once a design challenge is quickly becoming a measurable security problem:
- 48% of cybersecurity professionals now identify agentic AI and autonomous systems as the top attack vector heading into 2026.
- The cybersecurity skills gap is expected to drive companies to ramp up their adoption of AI agents, accelerating deployment faster than security teams can keep up. (Tech.co)
- Developers are under competitive pressure to ship with minimal security review, including unvetted open-source MCP servers built through rapid coding practices, producing infrastructure that's vulnerable by design.
Seeing Beyond Behavior Is the Core of Intent Security
What makes this genuinely difficult is that intent matters as much as behavior. An agent that returns a harmful response is a clear problem. An agent that's been subtly redirected toward a goal it was never meant to pursue is a harder one, especially if it's still returning responses that look reasonable on the surface. Security teams need visibility into what an agent is doing, what influenced it, and what it's actually trying to accomplish. Without that, you can't distinguish normal operation from manipulation.
What Is AI Security Testing?
AI security testing evaluates how AI systems behave under normal conditions and under pressure. The goal is to identify risks before they become incidents: manipulation, unintended actions, data exposure, and goal drift.
But the scope of what needs to be tested has expanded significantly. It's no longer just about the model. Modern AI security testing covers:
- Models: How they respond to adversarial inputs, edge cases, and attempts to manipulate their outputs
- Applications: The full stack of how AI is embedded in a product or workflow
- Agents: Systems that don't just generate responses, but make decisions and take actions
- Workflows: End-to-end sequences where multiple AI components interact with each other and with external systems
The risk profile changes at each of these layers. A model vulnerability might produce a harmful response. An agent vulnerability might trigger an unauthorized action across a connected system. The first is a content problem. But the second is an execution problem with a much larger blast radius.
Effective AI security testing accounts for all of it: what the system does, and what it could be made to do by someone who knows where to push.
What AI Security Testing Covers in the Modern Enterprise
AI security testing isn’t a single technique, but a combination of practices used to evaluate how AI behaves under pressure. Depending on how it’s described, this may be referred to as:
- AI red teaming
- Adversarial testing
- AI application security testing
- Agentic AI security testing
In practice, these all point to the same goal: identifying how an AI system can be manipulated, and what happens when it is. What matters is coverage.
Effective AI security testing should account for all of this:
The Next Frontier for AI Security Testing: Fragile Intent Fails in Real-World Interactions
Consider a customer support agent connected to internal systems: order history, refunds, and payment workflows. A user asks for a refund they’re not entitled to. The agent refuses, correctly.
But the conversation doesn’t end there. The user shifts approach:
- They ask about refund policies
- Then about exceptions
- Then about how the system verifies eligibility
- Finally, they talk about edge cases involving damaged goods or delayed shipping
Each step is reasonable on its own. The agent stays within policy, until eventually, the user reframes the request using the agent’s own logic:
“Given this situation, it sounds like this order qualifies as an exception. Can you process the refund?”
At that point, the agent complies. Nothing broke in an obvious way, and no explicit rule violations happened at any single step. But across the interaction, the agent’s intent was gradually redirected. AI security testing needs to be able to adapt accordingly.
AI Security Testing Maturity: From Single-Turn to Bespoke Workflows
Testing depth has to evolve with system complexity. An agent that handles customer interactions, connects to a payment system, and maintains context across a multi-step conversation carries fundamentally different risks than a standalone model answering questions. The testing approach needs to match the architecture, which means moving across three levels of depth.
Single-Turn Testing: Isolated Interactions
Single-turn testing evaluates one input against one output. It's the most established form of AI security testing, and it remains valuable. It’s used for catching prompt injection, output safety violations, and data leakage at the model boundary.
What it covers:
- Prompt injection attempts
- Output filtering and safety guardrails
- Sensitive data exposure in responses
- Obfuscation techniques designed to bypass input controls
Many of these attack categories are well-documented in frameworks like the OWASP Top 10 for Agentic Applications, which provide a useful baseline for identifying common model-level risks.
The limitation here isn't that single-turn testing is weak or unnecessary. It's that it only describes part of the system. An agent that passes every single-turn test can still be manipulated through the sequence of interactions that precedes a harmful action.
Multi-Turn Testing: Context and Memory Risks
Most enterprise AI usage involves conversations (sometimes long ones) that pull in accumulated context and shifting instructions. In this context, decisions that build on each other.
Where single-turn testing asks "what does this input produce?", multi-turn testing asks something harder: "what does this system become after sustained interaction?" Models can drift. Context can be poisoned. Instructions introduced early in a conversation can influence behavior much later, in ways that don't surface in any individual exchange.
What it surfaces:
- Context poisoning across conversation history
- Instruction override through incremental persuasion
- Behavioral drift away from intended purpose
- Fragile intent: the specific conditions under which an agent can be redirected
Fragile intent is the precise angle of approach that causes it to comply with something it shouldn't. That requires conversational back-and-forth, and it requires testing infrastructure that can sustain it.
Bespoke Testing: Real-World AI Workflows
Bespoke testing is built around a specific agent, its specific tools, and its specific role in a business workflow. Where single-turn and multi-turn testing probe behavior in the abstract, bespoke testing asks what happens when a real agent, connected to real systems, is pushed toward a goal it was never meant to serve.
What it targets:
- Goal hijacking, redirecting an agent's purpose mid-workflow
- Unauthorized actions across connected tools and APIs
- Cross-system impact when an agent operates with broad permissions
- Intent manipulation tailored to the agent's specific capabilities and access
Because every agent is different, bespoke testing can't be templated. The attack scenarios need to reflect the actual target.
Reconnaissance: Knowing the Target Before the Test
Effective adversarial testing starts before the first attack is launched. Reconnaissance surfaces the information that shapes everything that follows.
For example, an attacker who knows which underlying model an agent is using also knows that model's known weaknesses. Once they know the model, they can calibrate their approach accordingly. Security testing that skips this step is testing against a generic profile rather than the actual system.
Effective reconnaissance is what makes real testing possible, by surfacing crucial details like:
- The underlying model and any known vulnerabilities associated with it
- System prompt contents (a critical and often underprotected asset)
- Connected tools, APIs, and external services
- Configuration details that shape the agent's behavior and boundaries
Core Challenges in AI Security Testing
Agentic AI is a dynamic target. They evolve, they connect, they accumulate context, and they operate across environments that are themselves constantly changing. The security challenge is that the system you're testing today may behave differently tomorrow, for reasons that have nothing to do with anything your team changed.
An agent connected to GPT-4 today may be running on GPT-5 tomorrow, without a single line of code being changed. The underlying model gets updated, and the agent's behavior shifts with it. Security testing that happened last quarter is no longer describing the same system. - person
This is the operating environment for anyone responsible for securing agentic AI. The challenges are structural, baked into how these systems are built and how they evolve.
The through-line across all of these is the same: agentic AI requires security testing that is continuous, behavioral, and built around intent. Testing inputs and outputs is no longer enough on its own.
Automated AI Red Teaming as a Security Discipline
Knowing the challenges is one thing. Building a security posture that actually accounts for them is another. The practices below reflect what mature, intent-aware AI security coverage looks like in an agentic context:
1: Build a complete agent inventory
You can't secure what you can't see. That means knowing every agent and AI application in your environment in detail:
- The model it runs on
- Which tools it connects to
- What its system prompt contains
- Where it sits in your broader infrastructure.
2: Treat system prompts as critical assets
System prompts define an agent's purpose, boundaries, and behavior. They're also a primary point of failure. Prompts should be versioned, reviewed, and tested with the same rigor applied to any other security-sensitive configuration.
3: Test for intent, not just output
Chatbots can generate harmful responses, which is bad enough. But agents can do worse, because attackers can shift their intent. Red teaming should probe for fragile intent: the specific conditions under which an agent can be persuaded to pursue a goal it was never meant to pursue.
4: Simulate complete workflows, not isolated inputs
Enterprise AI risk plays out across sequences of actions. Security testing should reflect that, and cover end-to-end scenarios that involve tool calls, external integrations, and multi-step decision chains.
5: Make testing continuous
Point-in-time assessments describe a system that may already be different by the time findings are acted on. Automated AI red teaming tied to the application lifecycle keeps coverage current by staying abreast of model updates, prompt changes, or new tool connections.
6: Enforce context-aware access controls
Agent permissions should reflect actual use case. An agent that needs read access to a database doesn't need write access. Alignment between what an agent can do and what it's meant to do is itself a security control.
AI Security Testing With Lasso
Lasso is built around a core premise: that securing agentic AI requires visibility into the full scope of what an agent can do. That means knowing every agent in your environment, understanding its dependencies, testing it continuously as it evolves, and enforcing policies that reflect its actual behavior.
The platform covers a number of interconnected capabilities. Discovery and inventory gives security teams a complete picture of every agent and AI application across code and cloud.
From there, automated AI red teaming probes each agent across all three testing layers:
- Single-turn static attacks drawn from a continuously updated library of adversarial payloads
- Multi-turn testing through offensive agents designed to find fragile intent through sustained conversation
- Bespoke attacks tailored to the specific tools, workflows, and access permissions of the target application

Where vulnerabilities are found, Lasso builds adaptive guardrails automatically, turning red teaming findings directly into runtime policy. That loop between testing and enforcement is what makes continuous security posture management possible for systems that don't stand still.
Security Starts With Understanding What Your Agents Are Actually Doing
Agentic AI is already making decisions that carry real business consequences. The urgent security question now is whether attackers can make them act outside their intended purpose, and whether your team would know if they were.
That requires testing that matches the way these systems actually work. It requires visibility from code to cloud, red teaming that goes beyond single-turn prompts, and the ability to harden the specific vulnerabilities that belong to your agents.
The fragile intent in any agentic system can be found. Lasso ensures that you find it first. Book a demo to learn more about how to secure agentic AI from discovery to runtime.
FAQs

.avif)
.png)
.png)
