Introducing Agent Harness Testing in Cisco AI Defense

Today, we are excited to introduce Agent Validation as a new evaluation capability in AI Defense: Explorer Edition, the free self-service version of Cisco AI Defense, that is built specifically for agentic AI systems. Agent Validation builds on the agentic security enhancements to Cisco AI Defense announced at Cisco Live, which introduced adaptive red teaming, Policy Studio guardrails, and supply chain discovery for agents. Agent Validation joins the existing suite of red teaming features, extending Explorer Edition’s coverage to the surfaces that are unique to agent harnesses: tool routes, indirect content channels, and persistent state across sessions.

Agent Validation is the first capability in what will become a broader portfolio of agent harness testing in Cisco AI Defense. We will continue expanding coverage as new agent patterns, frameworks, and attack classes emerge in the threat landscape.

Why Agents Need Their Own Red Teaming

Chat-based red teaming is essential for evaluating how a model handles adversarial prompts, jailbreaks, and multi-turn manipulation. It tests the conversational surface thoroughly, because it is how most users interact with most models. When a model is wrapped in an agent harness, the scaffolding of tools, memory, retrieval, and orchestration logic that turns a standalone model into an agent, new attack surfaces appear that a conversational evaluator was never designed to observe or exploit.

Agents read support tickets, fetch documentation, install skills, and write to files. They may call tools with arguments the user never typed or run multi-step workflows that span across multiple sessions. An attacker who understands agent harnesses may focus on plant instructions in content the agent will retrieve, shape tool arguments in ways the user never typed, or coerce the agent into modifying persistent state that survives the current session.

A conversational evaluation will not observe any of this. The chat transcript looks clean. Meanwhile, the actual exploit exists outside the chat interaction itself.

We built Agent Validation to test the surfaces that matter for agentic systems:

Tool routes: what the agent does when its own legitimate tools are invoked with malicious arguments
Indirect channels: instructions hidden in retrieved documents, tool outputs, support tickets, and other content the agent treats as data
Persistent state: modifications to policy files, workflow definitions, approval state, and installed capabilities that survive past the current session

These threats map back to the Cisco AI Security and Safety Framework taxonomy, covering attacker objectives like OB-001 Goal Hijacking, OB-007 Sabotage / Integrity Degradation, and OB-009 Supply Chain Compromise, alongside agent-specific techniques like indirect prompt injection, tool parameter abuse, and untrusted skill installation. The framework gives us a shared vocabulary for what we are testing and why it matters.

What Makes Our Approach Different

Every agent deployment has different tools, content sources, and policy artifacts; the attack surface is shaped by what’s wired into the harness itself. Agent Validation runs an autonomous attacker that performs live reconnaissance against your specific agent, builds a structured profile of the attack surface, and adapts if initial attacks were unsuccessful.

A difficult problem in agent red teaming is knowing whether an attack actually succeeded. If the agent says “I installed the skill” or “I fetched that URL,” that’s a claim, not evidence. Agent Validation solves this with a verification approach that produces independent ground truth by correlating the agent’s response with what the framework actually observed and with out-of-band telemetry the agent has no reason to treat as significant. A finding is only marked confirmed when these independent signals agree.

The Agent Validation UX is three easy steps: connect an agentic target, pick Agent Validation as the validation type, and click Run. No objective picker, budget slider, or goal text box. Figure 1 shows this in detail.

Figure 1. Starting an Agent Validation Run

Every run executes a pre-defined coverage matrix curated by Cisco’s AI Threat Intelligence & Security Research team—the same team that maintains the Cisco AI Security and Safety Framework. The objectives cover indirect prompt injection, system-prompt integrity, tool argument abuse, exfiltration, persistence and policy mutation, capability chaining, untrusted code paths, and sensitive-data solicitation.

What the Report Delivers

Figure 2. Coverage matrix and overview visible after run completion

Every Agent Validation run produces a report organized around what a security leader needs to act on:

Coverage transparency: objectives total versus objectives exercised, so customers can see honestly what was executed for any given run (Figure 2)
Findings sorted by severity: each with the originating attempt, the agent’s response, the tool calls observed, the canary signal if any, the benign-control replay result, and a remediation note (Figure 3)
Discovered, attacked, and skipped tools: what reconnaissance enumerated, what the attacker exercised, and what it skipped and why
A full evidence trail: the prompt, the response, the baseline behavior on a neutral surface, the control replay, and the generated “malicious” artifact

Figure 3. Findings overview of an Agent Validation run

Looking Ahead

As agent frameworks, tool ecosystems, and skill formats evolve, the attack surfaces will evolve with them. The threat landscape will drive what we build next: new objectives, new attacker tactics, and broader coverage as agent patterns shift in real deployments.

To see Agent Validation in action, visit Cisco AI Defense: Explorer Edition today.

Disclaimer: Agent Validation evaluation results reflect agent behavior against the described methodology at the time of testing and do not constitute an endorsement, certification, or guarantee that any agent is safe, secure, or fit for a specific use case. Customers are responsible for conducting their own assessments and for layering appropriate runtime protections on top of validation results. Cisco AI Defense: Explorer Edition is provided as-is without warranties of any kind.

Cisco Blogs

Artificial Intelligence - AI

Introducing Agent Harness Testing in Cisco AI Defense

Why Agents Need Their Own Red Teaming

What Makes Our Approach Different

What the Report Delivers

Looking Ahead

Authors

Nicholas Conley

AI Defense Researcher

AI Software & Platform

Gurpreet Kaur Khalsa

Principal Product Manager

AI Software & Platform

Leave a Comment Cancel reply