I build a security platform. Last night I stopped adding features and did something less fun and more honest: I sat down to make every capability prove it actually works — end to end, with real data, demanding a real pass or fail.
"It ran" is not a pass. A page that renders is not a feature. A green checkmark is a claim, not evidence. So I went capability by capability and tried to break each one.
I found four real bugs, and one of them was a gut punch: a whole detection engine that was wired into the UI and unit-tested, but never actually ran in production.
Here's how the night went.
The rule: drive it, don't admire it
My method was boring on purpose. For each capability:
1. Feed it real input through the real entry point (CLI or API), not a test fixture.
2. Check the data actually landed (query the DB, don't trust the success message).
3. Feed it a malicious input and a benign input: it has to fire on one and stay quiet on the other.
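The third step can be written down as a tiny pass/fail check. This is a minimal sketch, not the platform's code: `Event`, `Alert`, `Evaluator`, and `toyEval` are hypothetical stand-ins, and a real run would go through the CLI or API rather than an in-process function.

```go
package main

import (
	"errors"
	"fmt"
)

// Event and Alert are hypothetical stand-ins for the platform's real types.
type Event map[string]string

type Alert struct {
	Severity string
	Rule     string
}

// Evaluator is any function that runs one event through detection.
type Evaluator func(Event) []Alert

// CheckFiresAndStaysQuiet passes only if the malicious event raises at least
// one alert and the benign event raises none.
func CheckFiresAndStaysQuiet(eval Evaluator, malicious, benign Event) error {
	if len(eval(malicious)) == 0 {
		return errors.New("FAIL: malicious input raised no alert")
	}
	if got := eval(benign); len(got) != 0 {
		return fmt.Errorf("FAIL: benign input raised %d alert(s)", len(got))
	}
	return nil
}

// toyEval is a one-rule evaluator that exists only to exercise the check.
func toyEval(e Event) []Alert {
	if e["event_type"] == "process_create" && e["process_name"] == "psexec.exe" {
		return []Alert{{Severity: "high", Rule: "PsExec Execution"}}
	}
	return nil
}

func main() {
	bad := Event{"event_type": "process_create", "process_name": "psexec.exe"}
	good := Event{"event_type": "process_create", "process_name": "notepad.exe"}
	if err := CheckFiresAndStaysQuiet(toyEval, bad, good); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("PASS")
}
```

The point of returning an error for either half is that a detector which never fires and a detector which always fires both fail, instead of one of them looking like a pass.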
The detection engine passed cleanly. I threw a PsExec process event at it and it lit up:
$ zds-core detection eval --event '{"event_type":"process_create","process_name":"psexec.exe"}'
1 alert(s):
[high] PsExec Execution — (matched: map[process_name:psexec.exe])
A wevtutil cl Security event tripped a critical "Log Clearing" rule. A plain notepad.exe matched nothing. Good — it detects, and it doesn't cry wolf.
(Small UX papercut I fixed while I was there: if you forgot the event_type field, the engine silently matched nothing and printed "no rules matched" — which reads exactly like "you're safe." Now it warns you that the event can't match any rule. Silence that looks like safety is the most dangerous output a security tool can produce.)
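One way to keep "could not evaluate" distinct from "evaluated and clean" is to return a separate error for it. The sketch below assumes a simplified, hypothetical `Rule` shape and `Evaluate` function; the real engine's rule format is not shown in this post.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnmatchable signals that the event could never match a rule, which is
// different from "evaluated and found clean".
var ErrUnmatchable = errors.New("event has no event_type, so no rule can match it")

// Rule is a hypothetical, simplified rule: an event type plus required fields.
type Rule struct {
	Name      string
	EventType string
	Fields    map[string]string
}

// Evaluate returns the names of matching rules, or ErrUnmatchable when the
// event lacks the field every rule keys on.
func Evaluate(rules []Rule, event map[string]string) ([]string, error) {
	if event["event_type"] == "" {
		return nil, ErrUnmatchable
	}
	var matched []string
	for _, r := range rules {
		if r.EventType != event["event_type"] {
			continue
		}
		ok := true
		for k, v := range r.Fields {
			if event[k] != v {
				ok = false
				break
			}
		}
		if ok {
			matched = append(matched, r.Name)
		}
	}
	return matched, nil
}

func main() {
	rules := []Rule{{
		Name:      "PsExec Execution",
		EventType: "process_create",
		Fields:    map[string]string{"process_name": "psexec.exe"},
	}}
	// The event_type field is missing on purpose.
	_, err := Evaluate(rules, map[string]string{"process_name": "psexec.exe"})
	fmt.Println("warning:", err)
}
```

With this shape the caller has three outcomes to print (matches, no matches, unmatchable) and cannot collapse the last two by accident.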
The one that hurt: ITDR
Identity Threat Detection and Response. The engine has detectors for impossible travel, credential spraying, brute force, privilege escalation. All unit-tested. All green.
I ran the real flow: POST two login events for one user, New York first, then London thirty minutes later. That's roughly 5,570 km in half an hour. Textbook impossible travel. Then I asked for the alerts.
Nothing. Zero alerts.
The events were in the database. The detector logic was correct in isolation. But the alerts endpoint was empty. So I read the ingest handler:
func (s *Server) handleRecordITDREvent(w http.ResponseWriter, r *http.Request) {
    var evt itdr.IdentityEvent
    json.NewDecoder(r.Body).Decode(&evt)
    id, err := s.db.InsertIdentityEvent(&evt) // store it...
    // ...and return. That's it.
    writeJSON(w, http.StatusCreated, evt)
}
It stored the event and returned. It never called the detector. Every identity event in the system was being filed and forgotten. The engine that was the entire point of the feature had no caller. In production the alerts list would have been empty forever, and it would have looked like good news.
The fix was small once I saw it — hold a persistent detector on the server and run every event through it:
id, err := s.db.InsertIdentityEvent(&evt)
// ...
if a := s.itdrDetector.RecordEvent(&evt); a != nil {
    s.db.InsertIdentityAlert(a)
    alert = a
}
Re-ran the exact same two logins:
ALERT: impossible_travel high "impossible travel detected: 5570 km in 30m0s"
risk: critical (80)
That's the moment the night paid for itself.
The other three
CSAF export (vulnerability advisory) returned zero vulnerabilities even when findings carried CVEs. It built the advisory only from the vulnerabilities table, but a CVE imported from a scanner lands on the finding first. I ran a real Nuclei scan, it found a Log4Shell template match (CVE-2021-44228), imported clean — and the export dropped it. Fix: synthesize the advisory entry from the finding when the vuln table hasn't caught up. Now the CVE shows up where it belongs.
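The fallback described above can be sketched as a merge keyed by CVE ID. `Vulnerability`, `Finding`, and `AdvisoryEntries` are hypothetical, trimmed-down shapes for illustration; the real export builds a full CSAF document, which this does not attempt.

```go
package main

import (
	"fmt"
	"sort"
)

// Vulnerability and Finding are hypothetical, trimmed-down record shapes.
type Vulnerability struct {
	CVE   string
	Title string
}

type Finding struct {
	CVE   string // empty when the scanner reported no CVE
	Title string
}

// AdvisoryEntries merges both sources, keyed by CVE. Rows from the
// vulnerabilities table win; findings fill in CVEs the table lacks.
func AdvisoryEntries(vulns []Vulnerability, findings []Finding) []Vulnerability {
	byCVE := make(map[string]Vulnerability)
	for _, v := range vulns {
		byCVE[v.CVE] = v
	}
	for _, f := range findings {
		if f.CVE == "" {
			continue
		}
		if _, ok := byCVE[f.CVE]; !ok {
			byCVE[f.CVE] = Vulnerability{CVE: f.CVE, Title: f.Title}
		}
	}
	out := make([]Vulnerability, 0, len(byCVE))
	for _, v := range byCVE {
		out = append(out, v)
	}
	// Sort so the export is stable from run to run.
	sort.Slice(out, func(i, j int) bool { return out[i].CVE < out[j].CVE })
	return out
}

func main() {
	// Empty vulnerabilities table, one imported finding carrying a CVE.
	findings := []Finding{{CVE: "CVE-2021-44228", Title: "Log4Shell template match"}}
	for _, e := range AdvisoryEntries(nil, findings) {
		fmt.Println(e.CVE, e.Title)
	}
}
```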
GRC compliance reported 0% against every framework. The "supported frameworks" list and the controls being scored were two different control-ID namespaces that never intersected. The command was structurally incapable of returning anything but zero. Pointed it at the frameworks the controls are actually tagged with — now it scores.
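A guard against this class of bug is to refuse to report a percentage when the two ID sets never intersect. The sketch below is one possible shape, not the platform's scoring code: `Score`, `ErrNoOverlap`, and the control IDs are made up, and scoring only over assessed controls is a design choice of this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoOverlap means the framework and the scored controls share no IDs, so
// a percentage would be meaningless rather than "0% compliant".
var ErrNoOverlap = errors.New("no scored control belongs to this framework")

// Score returns the percentage of a framework's assessed controls that pass.
// frameworkIDs lists the control IDs the framework defines; results maps a
// control ID to whether it passed, for every control that was assessed.
func Score(frameworkIDs []string, results map[string]bool) (float64, error) {
	assessed, passed := 0, 0
	for _, id := range frameworkIDs {
		ok, found := results[id]
		if !found {
			continue
		}
		assessed++
		if ok {
			passed++
		}
	}
	if assessed == 0 {
		return 0, ErrNoOverlap
	}
	return 100 * float64(passed) / float64(assessed), nil
}

func main() {
	// Made-up IDs: the framework uses one namespace, the results another.
	_, err := Score([]string{"FW-1", "FW-2"}, map[string]bool{"CTRL-9": true})
	fmt.Println("mismatched namespaces:", err)

	pct, _ := Score([]string{"FW-1", "FW-2"}, map[string]bool{"FW-1": true, "FW-2": false})
	fmt.Printf("matched namespaces: %.0f%%\n", pct)
}
```

Returning an error instead of zero turns a silent, plausible-looking number into something a caller has to handle.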
Workflow executions stored started_at in UTC and completed_at in local time. Same row, off by my timezone offset, quietly corrupting every duration. The cause: started_at used the SQLite CURRENT_TIMESTAMP default, which is always UTC, while completed_at was written by the app in local time.
What actually held up
Plenty did, and I made it prove it: incident lifecycle and timelines, the workflow engine, Diamond Model events, threat-hunting hypotheses, forensic hashing + YARA validation, case management with a real state machine, UEBA (I trained a baseline on nine normal logins and it correctly flagged a 3am login from Russia pulling 900 events as unusual hour + location + volume), and the Active Directory scanner against a real Windows Server 2022 domain controller — real findings, real MITRE mappings, no invented kerberoasting that wasn't there.
The lesson I keep relearning
The bugs that scare me aren't the ones that crash. They're the ones that return success while doing nothing. A detector with no caller. An advisory that silently drops the one CVE that matters. A compliance score that's structurally pinned to zero. Each one renders fine. Each one would demo fine. Each one is a lie your own dashboard tells you.
If you ship anything people rely on, spend a night being your own adversary. Don't ask "does it run." Ask "show me." Feed it the bad input. Query the table. Make the green checkmark earn it.
The detector runs now.