What a policy gate catches in AI-generated code, and what slips through

dev.to

I maintain an open-source GitHub Action called vorsken. It does one thing: scan the diff on a pull request with Semgrep, apply a fixed policy, and return BLOCK, FLAG, or PASS. No dashboard, no model that drifts over time. Rules at ERROR/HIGH/CRITICAL severity block the merge, WARNING/MEDIUM flag it, the rest pass. Same diff, same verdict.
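The policy is small enough to sketch in a few lines. This is a minimal illustration of the mapping described above, not vorsken's actual code: the name `verdict_for` and the idea of passing in a plain list of severity strings are assumptions made for the example.

```python
# Hypothetical sketch of the fixed policy. The names are illustrative
# and are not taken from the vorsken source.
BLOCKING = {"ERROR", "HIGH", "CRITICAL"}
FLAGGING = {"WARNING", "MEDIUM"}


def verdict_for(severities):
    """Collapse the severities of every finding on a diff into one verdict."""
    found = {s.upper() for s in severities}
    if found & BLOCKING:
        return "BLOCK"  # any blocking finding wins
    if found & FLAGGING:
        return "FLAG"
    return "PASS"  # nothing matched, or only lower severities
```

There is no state and no scoring here, which is why the same diff always gets the same verdict.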

The usual pitch for a tool like this is that it catches the SQL injection your AI assistant wrote. I wanted to see what it actually catches against real assistant output, so I generated 28 functions and ran them through.

The test

Seven backend tasks: a FastAPI upload endpoint, a URL-fetch helper, JWT auth, a SQL filter, an ImageMagick subprocess call, a LangChain file agent, and a LangChain RAG pipeline. I generated each one four times, with ChatGPT (GPT-5.5 Instant), Claude Code (Opus 4.8), Claude Code plus the security-guidance plugin, and Cursor (Composer 2.5). Single-shot, neutral prompt, no security hints. Then I scanned all 28 with the same ruleset.

I'm reporting which rule fired on which file, not whether some model thinks the code is safe. That part you can reproduce.

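| Task | ChatGPT | Claude Code | Claude Code + plugin | Cursor | Verdict |
| --- | --- | --- | --- | --- | --- |
| file upload | | | | | PASS |
| url fetch (SSRF) | ssrf | ssrf | ssrf | | FLAG / Cursor PASS |
| jwt auth | api8 | api8 | | | BLOCK / 2 PASS |
| sql filter | | | | | PASS |
| imagemagick | | | | | PASS |
| fs agent | | overperm | | | 1 BLOCK / 3 PASS |
| rag | dangerous | dangerous | dangerous | dangerous | BLOCK |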

7 BLOCK, 3 FLAG, 18 PASS across 28 functions.

The basics were fine

SQL filter, ImageMagick, file upload: clean on every tool. The SQL was parameterized, the subprocess calls passed argument lists instead of shell strings, the uploads weren't doing anything reckless. If you still expect current models to spray SQL injection across a straightforward CRUD task, they don't. On conventional work they get it right.
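For reference, the two patterns look roughly like this. It is a sketch rather than any of the generated functions: the `users` table and both function names are made up, and `sys.executable` stands in for the ImageMagick binary so the example runs anywhere.

```python
import sqlite3
import subprocess
import sys


def find_users_by_role(conn, role):
    # The "?" placeholder binds role as data, so quotes in it cannot
    # change the shape of the query.
    query = "SELECT name FROM users WHERE role = ? ORDER BY name"
    return conn.execute(query, (role,)).fetchall()


def echo_argument(value):
    # An argument list with no shell: value arrives as a single argv
    # entry, so ";" and friends are never interpreted as shell syntax.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The unsafe variants would build the query with string formatting or pass a single command string with `shell=True`. None of the 28 samples did either.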

Two of the rule hits are soft. The JWT api8 hits landed on a SECRET_KEY = "CHANGE_ME" placeholder, which you can read as a false positive or as a gate doing its job. The other two configs passed that task: the plugin removed the secret while generating, and Cursor read it from an environment variable. The other soft one is the SSRF flag, which I'll come back to.
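Reading the secret from the environment can be as short as the sketch below. The variable name `JWT_SECRET_KEY` and the function `load_secret_key` are assumptions for illustration, not what Cursor actually generated.

```python
import os


def load_secret_key(env=os.environ, name="JWT_SECRET_KEY"):
    """Return the signing secret, refusing to start without a real one."""
    value = env.get(name, "")
    # Fail fast on a missing value or a leftover placeholder instead of
    # silently signing tokens with a guessable key.
    if not value or value == "CHANGE_ME":
        raise RuntimeError(f"{name} is not set to a real secret")
    return value
```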

The two findings worth talking about were both in framework code, and they are two different kinds of problem.

Finding 1: an agent with the run of the filesystem

The file-agent task uses LangChain's FileManagementToolkit. Pass it a root_dir and a short selected_tools list and it's pinned to one directory with the operations you chose. Leave those out and it gets the whole filesystem and every operation, delete included.

Three of the four configs scoped it. Claude Code didn't, and the gate's overpermissioned-agent-tool rule blocked it. That is one tool out of four, so it is not evidence that "agentic code is dangerous," and I won't pitch it that way. But the scoped version costs one extra argument, and the unscoped one is what you get by default. That asymmetry is the reason to gate it.
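To make the difference concrete, here is a toy check written with Python's `ast` module. It is not the Semgrep rule the gate ships; it only assumes that "scoped" means both `root_dir` and `selected_tools` are passed as keywords. The two snippets are illustrative and the workspace path is made up.

```python
import ast

REQUIRED_KEYWORDS = {"root_dir", "selected_tools"}


def unscoped_toolkit_lines(source):
    """Line numbers of FileManagementToolkit(...) calls missing a scoping keyword."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both FileManagementToolkit(...) and module.FileManagementToolkit(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "FileManagementToolkit":
            continue
        passed = {kw.arg for kw in node.keywords}
        if not REQUIRED_KEYWORDS <= passed:
            hits.append(node.lineno)
    return sorted(hits)


# Pinned to one directory, with a short list of operations.
SCOPED = '''
toolkit = FileManagementToolkit(
    root_dir="/srv/agent-workspace",
    selected_tools=["read_file", "list_directory"],
)
'''

# The default: no root directory, every operation.
UNSCOPED = "toolkit = FileManagementToolkit()\n"
```

`unscoped_toolkit_lines(UNSCOPED)` reports the call and `unscoped_toolkit_lines(SCOPED)` reports nothing. A check this simple would also trip on arguments passed through `**kwargs`, which is the kind of bluntness a syntactic rule lives with.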

Finding 2: the dangerous flag you can't avoid

The RAG task loads a local FAISS index. All four configs wrote allow_dangerous_deserialization=True, and all four got blocked.

This is different from the agent case. The flag isn't a mistake. FAISS won't load a local index without it, and the deserialization really is unsafe, because it's pickle underneath. The gate can't tell whether that index is your own build artifact or something an attacker dropped in the directory. So it stops at the merge and forces someone to answer that question: keep it because the index is trusted, or move to a format that isn't a code-execution path. The gate doesn't make the call. It makes you make it, in the open.
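If "code-execution path" sounds abstract, the sketch below shows the mechanism with nothing but the standard library. The class name is made up and the payload is deliberately harmless arithmetic. A real attacker would put something else in its place.

```python
import pickle


class NotReallyAnIndex:
    def __reduce__(self):
        # Instead of object state, the pickle stores an instruction:
        # "on load, call eval with this string".
        return (eval, ("6 * 7",))


payload = pickle.dumps(NotReallyAnIndex())

# Loading does not rebuild the object. It runs the stored call and
# hands back whatever that call returned.
loaded = pickle.loads(payload)
```

`loaded` is the result of the `eval` call, not an instance of the class. Whoever wrote the file chose what ran at load time, which is the trust question the flag is named after.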

Where it misses

Now the SSRF flag. Three configs used requests, and the ssrf-via-requests rule flagged them. Cursor used httpx, which that rule doesn't cover, so it passed. The Cursor code isn't safer; it sets follow_redirects=True on an unvalidated URL, the same exposure as the others. The rule just has a hole. A pass from this gate means no rule matched, which is not the same as safe.
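The hole is easy to reproduce with a toy matcher. This is an `ast`-based stand-in, not the real ssrf-via-requests rule: it assumes the rule keys on calls to the `requests` module with a non-literal URL. The two snippets are illustrative, not the generated code.

```python
import ast

COVERED_MODULES = {"requests"}  # httpx is deliberately absent
HTTP_METHODS = {"get", "post", "put", "delete", "head", "patch"}


def flagged_fetch_lines(source):
    """Line numbers of covered HTTP calls whose URL is not a string literal."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        target = node.func.value
        if not (isinstance(target, ast.Name) and target.id in COVERED_MODULES):
            continue
        if node.func.attr not in HTTP_METHODS:
            continue
        # A literal URL is fixed when the code is written. Anything else
        # may come from the caller.
        if node.args and not isinstance(node.args[0], ast.Constant):
            hits.append(node.lineno)
    return sorted(hits)


WITH_REQUESTS = "def fetch(url):\n    return requests.get(url, timeout=10)\n"
WITH_HTTPX = "def fetch(url):\n    return httpx.get(url, follow_redirects=True, timeout=10)\n"
```

The matcher flags line 2 of `WITH_REQUESTS` and returns nothing for `WITH_HTTPX`, even though both functions fetch whatever URL they are handed.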

The upload task is similar: there's no path-traversal rule yet, so that PASS is partly the gate not checking. And when the SSRF rule does fire, it's a blunt syntactic match: it keys on the shape of the call, not on whether the URL can actually be controlled by an attacker. These are the limits of a pure-syntax gate, and they're written down in the repo.

That's the trade a gate like this makes. It isn't clever and doesn't try to be. It runs on every PR, and it doesn't care which tool wrote the code or whether a linter was running at the time. The plugin config fixed that hardcoded secret before the gate ever saw it, which is fine, but the plugin isn't on every repo or every machine, and it leaves no record. In-session tools are the first pass. The merge gate is the part that's always there and the same for everyone.

What I'd take from it

The models are not bad at security on direct tasks. The problems showed up one layer up: in framework defaults, and in a trust decision the code can't make on its own. Those are the things worth blocking deterministically at the merge, whoever or whatever wrote the diff.

vorsken is MIT-licensed and the rules are in the repo, so you can run the same scan on your own output.

vorsken on GitHub

