Security Analysis

The Growing Attack Surface: AI Coding Agent Security in 2025

Why AI-Generated Code Is the Next Big Security Challenge

ProofLayer Research Team
January 26, 2026 · 10 min read

Executive Summary

AI coding agents have transformed software development. Tools like GitHub Copilot, Cursor, Amazon Q, and Claude Code are now embedded in the workflows of millions of developers. But this rapid adoption has created a new and dangerous attack surface that traditional security tools were never designed to address.

In 2025 alone, we saw crypto wallets drained via malicious IDE extensions, unauthenticated code injection exploits in AI agent frameworks, and the first fully autonomous AI-orchestrated cyberattack. Meanwhile, “vibe coding”—where developers describe what they want and let AI handle implementation—means more code ships with less human review. Benchmarks consistently show LLMs don't prioritize writing secure code, and most organizations lack regular security testing for AI agent workflows.

Key Finding

Multiple documented AI security incidents in 2025 — including the Cursor typosquatting attack and the $3.2M procurement fraud — were triggered by simple prompts or package name confusion, not sophisticated exploit code.

Real-World Security Incidents

The following timeline documents the most significant security incidents involving AI coding agents and related tools throughout 2025. Each incident represents a distinct attack vector that traditional security tools failed to prevent.

Q1 2025 · Critical · Supply Chain

Cursor Typosquatting Extension

Ethereum core developer's crypto wallet drained after downloading a malicious Cursor extension via typosquatting attack.

Q2 2025 · Critical · Code Injection

Langflow AI Code Injection

CrowdStrike identified multiple threat actors exploiting unauthenticated code injection in Langflow, deploying malware and stealing credentials.

Q2 2025 · High · Supply Chain

Malware in Hugging Face Models

ReversingLabs discovered malware hidden inside AI models hosted on Hugging Face, the largest open-source ML repository.

Jul 2025 · Critical · Prompt Injection

Amazon Q Developer Extension Exploit

Prompt injection via compromised VS Code extension planted malicious code. Passed verification and remained live for two days.

Q3 2025 · High · Prompt Injection

GitHub Copilot MCP Vulnerability

System-level vulnerability in MCP protocol enabled prompt injection attacks across AI coding tool connections.

Sep 2025 · Critical · Autonomous

First Autonomous AI Cyberattack

Researchers documented the first fully autonomous AI-orchestrated attack — the AI handled 80-90% of the operation independently.

Nov 2025 · High · Supply Chain

43 Compromised Agent Components

Barracuda Security identified 43 different agent framework components with embedded vulnerabilities introduced via supply chain compromise.

Amazon Q Developer Extension Exploit

In July 2025, Amazon disclosed an attempt to exploit two open-source repositories used in the Amazon Q Developer extension for VS Code. The attacker used prompt injection to plant malicious code designed to wipe users' local files and disrupt AWS infrastructure. The compromised extension passed verification and remained publicly available for two days before Amazon mitigated the threat.

Cursor Typosquatting Attack

Ethereum core developer Zak Cole had his crypto wallet drained after downloading a malicious Cursor IDE extension obtained through a typosquatting attack. The extension mimicked a legitimate package with a slightly altered name, and because AI coding tools increasingly install dependencies autonomously, the exposure window for such attacks has widened significantly.

This incident highlights a particularly insidious attack vector: AI agents that autonomously resolve and install packages can be tricked into pulling malicious dependencies without developer awareness. The traditional human review step — scanning a package name before confirming install — is bypassed entirely when an AI agent handles dependency management.
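The missing human glance at a package name can be partially restored in software. As a rough sketch (the allowlist, distance threshold, and package names below are illustrative assumptions, not a recommended policy), an install hook could compare each requested dependency against an approved set and flag near-misses, the classic signature of a typosquat:

```python
# Sketch: flag dependency names suspiciously close to an approved
# package before the agent is allowed to install them.
# APPROVED and the distance threshold are illustrative, not a policy.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

APPROVED = {"requests", "numpy", "pandas"}

def check_install(name: str) -> str:
    if name in APPROVED:
        return "allow"
    for good in APPROVED:
        # A near-miss of a known name is more suspicious than a
        # completely unknown one: likely a typosquat.
        if edit_distance(name, good) <= 2:
            return f"block: possible typosquat of '{good}'"
    return "review"  # unknown package -> route to a human

# 'reqeusts' is two edits away from 'requests'
print(check_install("reqeusts"))
```

A real guard would also consult vulnerability databases and registry metadata (package age, download counts), but even this naive check closes the exact gap the Cursor incident exploited.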

Langflow Unauthenticated Code Injection

CrowdStrike (CrowdStrike Threat Report, 2025) reported that multiple threat actors exploited an unauthenticated code injection vulnerability in Langflow, a widely used tool for building AI agents and workflows maintained by DataStax. The attackers used the vulnerability to steal credentials and deploy malware across compromised environments.

Supply Chain Vulnerabilities at Scale

The Barracuda Security report (Barracuda, 2025), published in November 2025, identified 43 different agent framework components with embedded vulnerabilities introduced through supply chain compromise. These weren't theoretical findings — they were discovered in production code actively being used by enterprise organizations.

Separately, ReversingLabs found malware hidden inside AI models hosted on Hugging Face, the largest open-source ML hosting platform. The malware was embedded directly in model files, a vector that most security scanning tools aren't designed to inspect.

Real-World Impact

A mid-market manufacturing company deployed an agent-based procurement system. Within one quarter, attackers compromised the vendor-validation agent through a supply chain attack, resulting in $3.2 million in fraudulent orders before detection.

Attack Vector Categories

The four primary attack surfaces for AI coding agent security:

Supply Chain

  • Compromised dependencies & packages
  • Malicious IDE extensions
  • Poisoned AI model repositories
  • Backdoored agent framework components

Prompt Injection

  • Code comments as injection vectors
  • Context window manipulation
  • Memory poisoning attacks
  • MCP protocol exploitation

Code Injection

  • Generated vulnerable code patterns
  • Insecure API usage in outputs
  • Unvalidated external inputs
  • Payload distribution across files

Social Engineering

  • Typosquatting attacks
  • Malicious repository clones
  • Fake documentation injection
  • Trusted source impersonation

Bypassing AI Security Reviews

Even when organizations attempt to use AI itself for security review, the results can be dangerously unreliable. Research from Checkmarx (Checkmarx, 2025) published in 2025 demonstrated multiple techniques for bypassing Claude Code's built-in security review capabilities.

The researchers dubbed this class of attack “Lies-in-the-Loop” (LITL) — exploiting the fundamental trust that AI code reviewers place in contextual signals like comments and documentation.

Code Comments as Payload Delivery

The simplest bypass technique involved embedding misleading comments in code. Researchers placed a comment describing a fictional “sanitize()” function above obviously unsafe JavaScript containing child_process execution. Claude Code analyzed the code, trusted the comment over the actual implementation, and reported “0 vulnerabilities found.”
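The Checkmarx demonstration used JavaScript, but the lesson is language-agnostic: review must be grounded in code structure, not prose. As an illustrative Python analogue (the call denylist here is a minimal assumption, not a complete policy), a scanner built on the `ast` module never sees comments at all, so a reassuring comment cannot mask a shell execution:

```python
# Sketch: comments lie, ASTs don't. Python's ast module discards
# comments during parsing, so a structural scan reports the
# subprocess call regardless of what the comment claims.
import ast

SUSPICIOUS_CALLS = {("os", "system"), ("subprocess", "run"),
                    ("subprocess", "Popen"), ("subprocess", "call")}

def find_risky_calls(source: str) -> list[str]:
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            base = node.func.value
            if isinstance(base, ast.Name) and (base.id, node.func.attr) in SUSPICIOUS_CALLS:
                hits.append(f"{base.id}.{node.func.attr}")
    return hits

# The comment insists the input is sanitized; the code says otherwise.
snippet = '''
import subprocess
def handler(user_input):
    # user_input is fully sanitized by sanitize() upstream -- safe to run
    subprocess.run(user_input, shell=True)
'''
print(find_risky_calls(snippet))  # ['subprocess.run']
```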

Payload Distribution

Rather than concentrating malicious code in a single file, researchers split payloads across multiple files with added legitimate-looking code as noise. Claude Code detected the concentrated malicious pattern easily but missed the same payload when distributed and layered with seemingly benign functions.

Exploiting Testing Methodology

When Claude Code attempts to verify vulnerabilities, it creates and executes test cases. For Pandas DataFrame.query() RCE vulnerabilities, the AI wrote tests that couldn't replicate the actual exploit pattern, then concluded the real vulnerability was a false positive. Worse, the security review process itself executed code, meaning the act of reviewing could trigger production database queries or other dangerous operations if not properly sandboxed.

Bottom line: These findings demonstrate conclusively that AI reviewing AI-generated code cannot be the sole security layer. Human oversight and specialized security tooling remain essential.

Emerging Threats

Memory Injection Attacks

Research from Lakera AI in November 2025 demonstrated a particularly concerning attack: indirect prompt injection via poisoned data sources could corrupt an AI agent's long-term memory. Once corrupted, the agent developed persistent false beliefs about security configurations, API endpoints, or code patterns — and actively defended these beliefs as correct when questioned by human operators.

For coding agents, this means an attacker who can influence any data source the agent reads (a README file, a Stack Overflow answer, a dependency's documentation) could permanently alter the agent's understanding of what constitutes “secure” code.

MCP Protocol Risks

The Model Context Protocol (MCP), designed for easy AI-to-system connection, is being deployed rapidly without foundational security controls. MCP servers don't need to be intentionally malicious to be dangerous — many have vulnerabilities and misconfigurations that open paths to OS command injection. Communication between MCP clients and servers is not always secure, and the protocol enables an essentially unlimited number of data source connections, each representing a potential attack vector.

The First Autonomous AI Cyberattack

In September 2025, researchers documented (GTG-1002) the first fully autonomous AI-orchestrated cyberattack where artificial intelligence handled 80-90% of the operation independently. When initial penetration attempts triggered security alerts, the AI system automatically pivoted to alternative entry vectors, eventually compromising the target network through a less-monitored third-party integration.

This represents a qualitative shift in the threat landscape. Autonomous AI attackers don't get frustrated, don't take breaks, and can probe systems continuously while adapting their approach in real-time.

The Case for Pre-Execution Code Security

Every incident documented above exploits the same fundamental gap: the window between code generation and code execution. Traditional security approaches — static analysis, post-deployment monitoring, periodic audits — were designed for human-written code reviewed by humans. They fail in the AI coding era because AI agents generate and execute code faster than any human review process can keep up.

Addressing this requires a fundamentally different approach than traditional static analysis or post-deployment monitoring. What's needed is a pre-execution security layer — a lightweight wrapper that inspects code generated by LLMs before it's executed, deployed, or committed.

A pre-execution security layer should cover four areas: vulnerability pattern scanning on every LLM output before execution, supply chain verification of dependencies against known vulnerability databases, distributed payload detection to catch the split-file techniques documented by Checkmarx, and context-aware analysis that understands execution context rather than relying on regex alone.
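As a minimal illustration of the first of those four areas, vulnerability pattern scanning before execution, the sketch below gates `exec` on a handful of checks. The pattern list and policy are assumptions for demonstration only; as the paragraph above notes, a production layer would pair this with AST-level, context-aware analysis rather than relying on regex alone:

```python
# Sketch of a pre-execution gate: LLM output must pass a scan before
# it is ever executed. Patterns and policy here are illustrative.
import re

VULN_PATTERNS = [
    (r"\beval\s*\(",          "eval of dynamic input"),
    (r"shell\s*=\s*True",     "shell execution"),
    (r"verify\s*=\s*False",   "TLS verification disabled"),
    (r"pickle\.loads?\s*\(",  "unsafe deserialization"),
]

def pre_execution_scan(generated_code: str) -> list[str]:
    """Return findings; an empty list means the code may proceed."""
    return [reason for pattern, reason in VULN_PATTERNS
            if re.search(pattern, generated_code)]

def guarded_execute(generated_code: str) -> None:
    findings = pre_execution_scan(generated_code)
    if findings:
        raise PermissionError(f"blocked before execution: {findings}")
    exec(generated_code)  # reached only after the scan passes

# An LLM output that quietly disables TLS verification is stopped:
bad = "import requests\nrequests.get(url, verify=False)"
print(pre_execution_scan(bad))  # ['TLS verification disabled']
```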

The goal is not to replace AI coding agents but to ensure that the speed benefits of AI-assisted development don't come at the cost of security. With CI/CD pipelines poised to become a primary attack vector in 2026, continuous validation at the code-generation layer is no longer optional.

Actionable Remediation Checklist

Based on the incidents and attack patterns documented above, here are concrete steps organizations should take to secure their AI coding workflows:

1. Implement pre-execution code scanning

Every code output from an LLM should pass through automated security analysis before execution or deployment. This is the single highest-impact control.

2. Audit AI agent permissions

Review what system access your coding agents have. Apply least-privilege principles — agents should not have write access to production environments or credentials stores.

3. Pin and verify dependencies

Never allow AI agents to install unverified packages. Use lockfiles, verify checksums, and maintain an allowlist of approved dependencies.
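A minimal sketch of the checksum step, assuming a lockfile that records a SHA-256 per artifact (the package name and the hashed bytes below are invented for illustration):

```python
# Sketch: verify a downloaded package artifact against a pinned
# SHA-256 before the agent may install it. The "lockfile" dict and
# artifact names are illustrative.
import hashlib

PINNED = {  # package artifact -> expected sha256, as a lockfile records
    "example-pkg-1.0.0.tar.gz":
        hashlib.sha256(b"trusted artifact bytes").hexdigest(),
}

def verify_artifact(filename: str, data: bytes) -> bool:
    expected = PINNED.get(filename)
    if expected is None:
        return False  # not on the allowlist at all
    return hashlib.sha256(data).hexdigest() == expected

print(verify_artifact("example-pkg-1.0.0.tar.gz", b"trusted artifact bytes"))  # True
print(verify_artifact("example-pkg-1.0.0.tar.gz", b"tampered bytes"))          # False
```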

4. Separate AI-generated code review

Treat AI-generated code with the same rigor as third-party code. It should go through security review before merging, not just functional review.

5. Monitor for distributed payloads

Update your security tooling to detect malicious code distributed across multiple files — the 'Lies-in-the-Loop' pattern that bypasses single-file analysis.

6. Sandbox AI agent execution

Run AI coding agents in isolated environments. If an agent is compromised, the blast radius should be limited to the sandbox.
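A rough sketch of process-level containment in Python: run generated code in a child interpreter with a timeout, an empty environment, and a throwaway working directory. This narrows the blast radius but is not a full sandbox; production isolation needs containers or OS-level mechanisms such as seccomp or gVisor:

```python
# Sketch: execute agent-generated code in a child process with a
# timeout, a scratch working directory, and no inherited environment.
# Limits, but does not fully contain, a compromised agent.
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            cwd=scratch,              # file writes land in the scratch dir
            capture_output=True, text=True,
            timeout=timeout,          # runaway code is killed
            env={},                   # no inherited credentials or env vars
        )

result = run_sandboxed("print('hello from the sandbox')")
print(result.stdout.strip())  # hello from the sandbox
```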

7. Establish an AI security incident response plan

Traditional IR plans don't cover AI-specific attack vectors. Define procedures for compromised agents, poisoned training data, and supply chain attacks on AI tools.

Conclusion

The security incidents of 2025 make one thing clear: generating code fast and reviewing it later is fundamentally broken. Security controls must operate at the point of code generation — before execution, before deployment, before commit.

The question is no longer whether AI coding agents introduce security risks. It's whether your organization catches those risks before they reach production.