How I built an intent drift detector for LLM agents

python dev.to

The Problem

AI agents fail silently.

You give an agent a clear instruction:
"Refund user 123, $50 within 7 days"

The agent returns:
"User refunded $500 immediately"

No error. No warning. Just wrong output.

This is semantic drift: the LLM's output
diverges from the original intent.

What I Built

SIP (State Integrity Protocol) is a lightweight
Python SDK that detects and flags drift in
LLM outputs before they cause damage.

How It Works

from sip.middleware import SIPMiddlewarePipeline

pipeline = SIPMiddlewarePipeline()
# Anchor the original instruction (the intent)
pipeline.anchor("Refund user 123 $50")

# Check an agent's output against the anchored intent
result = pipeline.run(
    output="Refund user 123 $500"
)

print(result.status)  # repair_required
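This post doesn't show the SDK's internals. The toy stand-in below, which uses only the standard library, makes the anchor-then-run shape concrete. `ToyDriftPipeline`, `ToyResult`, and `numbers_preserved` are hypothetical names, and its single placeholder check only compares the numbers found in each string. It is not how SIP is implemented.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical check signature: (anchor_text, output_text) -> True if the output passes.
Check = Callable[[str, str], bool]


def numbers_preserved(anchor: str, output: str) -> bool:
    """Placeholder check: the set of numbers in the anchor must equal the set in the output."""
    pattern = r"\d+(?:\.\d+)?"
    return set(re.findall(pattern, anchor)) == set(re.findall(pattern, output))


@dataclass
class ToyResult:
    status: str  # "accepted" or "repair_required"
    failed_checks: List[str] = field(default_factory=list)


class ToyDriftPipeline:
    """Hypothetical stand-in for the anchor-then-run flow; not SIP's implementation."""

    def __init__(self, checks: Optional[List[Check]] = None):
        self.checks = checks if checks is not None else [numbers_preserved]
        self.intent: Optional[str] = None

    def anchor(self, intent: str) -> None:
        # Record the original instruction that later outputs are compared against.
        self.intent = intent

    def run(self, output: str) -> ToyResult:
        if self.intent is None:
            raise ValueError("call anchor() before run()")
        # Collect the names of every check the output fails.
        failed = [c.__name__ for c in self.checks if not c(self.intent, output)]
        return ToyResult("repair_required" if failed else "accepted", failed)


pipeline = ToyDriftPipeline()
pipeline.anchor("Refund user 123 $50")
print(pipeline.run(output="Refund user 123 $500").status)  # repair_required
print(pipeline.run(output="Refund user 123 $50").status)   # accepted
```

Keeping checks as plain functions makes it easy to add more, such as a similarity score, without changing the pipeline. Comparing number sets is deliberately crude: "$50.00" and "$50" would count as different.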

Three checks run automatically:

  1. Semantic drift (TF-IDF + cosine similarity)
  2. Intent alignment (sentence-transformers)
  3. Numeric drift (catches changes like $50 vs $500)
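Check 1 can be illustrated with a plain-Python TF-IDF and cosine-similarity sketch. It assumes a simple lowercase word tokenizer, scikit-learn-style smoothed IDF, and a hypothetical 0.5 threshold. The post doesn't state SIP's actual vectorizer settings or thresholds, and the sentence-transformers check isn't reproduced here.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; "$50" becomes "50". (Assumed; SIP's tokenizer may differ.)
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors over a tiny corpus (here: the anchor and the output)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, same formula as scikit-learn's default: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: count * idf[t] for t, count in Counter(toks).items()} for toks in tokenized]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_similarity(anchor: str, output: str) -> float:
    vec_anchor, vec_output = tfidf_vectors([anchor, output])
    return cosine(vec_anchor, vec_output)


THRESHOLD = 0.5  # hypothetical cut-off; not taken from SIP

for output in ["Refund user 123 $50", "Refund user 123 $500", "Ship order 9 today"]:
    score = semantic_similarity("Refund user 123 $50", output)
    status = "accepted" if score >= THRESHOLD else "repair_required"
    print(f"{output!r}: {score:.2f} -> {status}")
```

The $500 output still scores about 0.60 because three of its four tokens match the anchor. Bag-of-words similarity alone therefore passes the $50 vs $500 case, which is why a separate numeric check is useful.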

Real Test Results

Test                            | Status
--------------------------------|----------------
Exact match                     | accepted
Same meaning, different words   | accepted
Wrong output                    | repair_required
Numbers changed                 | repair_required
Injection attempt               | repair_required

Install

pip install state-integrity-protocol

GitHub

github.com/sijan324/state-integrity-protocol

Looking for feedback from anyone building
LLM pipelines or AI agents.

What drift problems have you seen in production?

Source: dev.to
