Windows watchdog that silently spawned 11 duplicate processes — and the one-line fix

I came back to my desk and counted 11 cmd windows instead of the 6 I expected. All running Python, all for the same project, all spawned by my own watchdog.

This post is the diagnostic trail and the fix, in case anyone else is using tasklist /FI "WINDOWTITLE" as a process-liveness check on Windows.

The setup

I run ~6 Python daemons on a Windows 11 box. Each is launched via:

start "Agent" cmd /c "cd /d C:\path && python -m agent.main"

Task Scheduler fires watchdog.bat every 5 minutes. The watchdog is supposed to:

1. Check if each daemon is alive
2. If not, relaunch it

For daemons with HTTP ports, liveness is easy: try a TCP connect. For portless daemons (my Telegram agent), I was using a title filter:
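For the daemons with ports, the TCP connect check can be sketched in a few lines of Python. This is a minimal sketch, not the watchdog's actual code: the function name `port_is_open` and the 2-second default timeout are assumptions made for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A watchdog would call something like `port_is_open("127.0.0.1", 8080)` (the port number here is hypothetical) and relaunch the daemon only when it returns False. A successful connect only proves something is listening on that port, which is usually enough for a liveness check.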

tasklist /FI "WINDOWTITLE eq Agent" 2>nul | find /I "cmd.exe" >nul
if errorlevel 1 (
    start "Agent" cmd /c "python -m agent.main"
)

The symptom

11 cmd windows. Task Manager showed 11 python.exe processes, all with the same command line. Memory creeping. No visible errors.

The diagnostic

I ran the title-filter check manually from an admin cmd:

C:\> tasklist /FI "WINDOWTITLE eq Agent"
INFO: No tasks are running which match the specified criteria.

The windows were right there. Visible. Titled 'Agent'. And tasklist couldn't see them.

Why WINDOWTITLE is broken for this use case

Four reasons, in rough order of frequency:

1. Title mutation. start "Title" sets only the initial console title. Anything that later rewrites the title (the title command, a SetConsoleTitle call, or a library that sets it) makes the exact-match eq filter fail.

2. Minimized / background consoles. I couldn't reproduce this reliably, but anecdotally tasklist's title filter sometimes misses minimized consoles.

3. Session isolation. Scheduled tasks set to run whether or not a user is logged on run in session 0. Your interactive consoles live in session 1 or higher. Cross-session title queries don't always work.

4. Window vs. process. start "Title" cmd /c "..." creates a cmd.exe window. The Python process is a child of that cmd. The title you set attaches to the cmd window, not the Python process — and tasklist's WINDOWTITLE filter matches against the process that owns the console window, which may be cmd, not python.

Any one of these is enough. Combined, they make the filter effectively a random boolean.

The fix

Match by command line, not window title. The command line is the authoritative identity of the process:

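rem name='python.exe' keeps the cmd.exe wrapper and wmic itself from matching: both carry agent.main in their own command lines
wmic process where "name='python.exe' and commandline like '%%agent.main%%'" get processid 2>nul | findstr /r "[0-9]" >nul
if errorlevel 1 (
    echo [%date% %time%] Agent DOWN -- restarting >> watchdog.log
    start "Agent" cmd /c "cd /d C:\path && python -m agent.main"
)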

The commandline like '%%agent.main%%' pattern matches any process whose full command line contains the substring agent.main. That's my entry point (python -m agent.main), so it's a stable signal regardless of what title the window currently shows.
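One caveat with substring matching is that other processes can carry the same substring: the cmd.exe wrapper that launched the daemon, and the query tool itself. The Python sketch below illustrates the matching logic on a hand-written process snapshot. The PIDs, the `Proc` tuple, and the `find_agents` helper are all hypothetical, made up for illustration. It does not query Windows.

```python
from typing import Iterable, List, NamedTuple


class Proc(NamedTuple):
    pid: int
    name: str
    command_line: str


def find_agents(procs: Iterable[Proc], needle: str, image: str = "python.exe") -> List[int]:
    """Return PIDs whose image name is `image` and whose command line contains `needle`."""
    return [
        p.pid
        for p in procs
        if p.name.lower() == image.lower() and needle in p.command_line
    ]


# Hypothetical snapshot: the cmd.exe wrapper, the daemon, and the query tool itself.
snapshot = [
    Proc(4120, "cmd.exe", 'cmd /c "cd /d C:\\path && python -m agent.main"'),
    Proc(4188, "python.exe", "python -m agent.main"),
    Proc(5012, "WMIC.exe", "wmic process where \"commandline like '%agent.main%'\" get processid"),
]

print(find_agents(snapshot, "agent.main"))  # prints [4188]
```

All three command lines contain agent.main, so a bare substring match would report three hits. Filtering on the image name as well leaves only the daemon.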

PowerShell alternative (wmic is deprecated on newer Windows)

$running = Get-CimInstance Win32_Process | Where-Object { $_.CommandLine -match 'agent\.main' }
if (-not $running) {
    Start-Process cmd -ArgumentList '/c', 'cd /d C:\path && python -m agent.main' -WindowStyle Normal
}

Works the same way. CommandLine is a property on Win32_Process that contains the full invocation string.

Verification

After the fix, I ran watchdog.bat three times back-to-back. Process count stayed at 1. Before the fix, every run where the title check came back empty would have added another copy.

before fix: 11 agent processes after 24h
after fix:  1  agent process after 24h

Takeaways

If you're doing process-liveness checks on Windows:

1. If the process has a port, use the port. Test-NetConnection or a TCP socket connect. It's the most reliable signal and it's fast.
2. If the process is portless, match by command line, not window title. wmic or Get-CimInstance are your friends.
3. Never trust tasklist /FI "WINDOWTITLE" for critical paths. It's fine for interactive debugging, not for automation.

Also worth mentioning: my watchdog's failure mode was silent. No error, no crash. The check returned errorlevel 1 on a false negative, and the script cheerfully spawned another daemon. If you're writing watchdog scripts, add a duplicate guard: before spawning, count the running copies, and if there is more than one, bail with a loud warning.

Something like:

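rem Count only lines that contain a PID; the header and blank lines would inflate a raw line count
rem Assumes one python.exe per agent; raise the threshold if a launcher process adds a second
set count=0
for /f %%c in ('wmic process where "name='python.exe' and commandline like '%%agent.main%%'" get processid 2^>nul ^| findstr /r "[0-9]" ^| find /c /v ""') do set count=%%c
if %count% GTR 1 (
    echo [%date% %time%] [WARN] %count% agent processes detected, skipping spawn >> watchdog.log
    goto :eof
)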

That won't fix the root cause but it'll stop the bleeding while you diagnose.
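The guard turns the watchdog's two-way decision into a three-way one: zero copies means spawn, one copy means healthy, and anything above that means something is wrong. Here is a minimal Python sketch of that decision. The function name `watchdog_action` and the three return strings are assumptions made for illustration, not part of the batch script.

```python
def watchdog_action(agent_count: int) -> str:
    """Map the number of running agent processes to a watchdog decision."""
    if agent_count < 0:
        raise ValueError("process count cannot be negative")
    if agent_count == 0:
        return "spawn"  # nothing running: relaunch the daemon
    if agent_count == 1:
        return "ok"     # exactly one copy: healthy, do nothing
    return "warn"       # duplicates: log loudly and do not spawn another


for n in (0, 1, 11):
    print(n, watchdog_action(n))
```

Keeping the "more than one" case separate from "healthy" is the point: a watchdog that only asks "is anything running?" can never notice that it has already over-spawned.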