If your platform has users who might be minors, you're probably relying on a word list somewhere. A set of flagged terms. Maybe an automated content filter that scans messages for known bad phrases before delivery.
It doesn't work. Not because it's badly implemented — but because the approach is fundamentally mismatched to the problem.
This post explains why, and describes what a behavioral detection approach looks like in practice.
The problem with keyword filters
Keyword-based detection has one fatal assumption: that harmful intent is expressed in the words themselves.
Online grooming doesn't work that way. Grooming is a process — typically lasting weeks or months — that begins with the establishment of trust. The early conversations between a predator and a potential victim often look completely benign. Generic questions. Compliments. Expressions of understanding. The escalation is gradual, calibrated, and specifically designed to avoid triggering detection.
Predators have adapted to platform safety tools over years. They know which words are flagged. They use alternative spellings, coded language, platform-specific slang, and — increasingly — AI-generated text designed to produce exactly the right pattern of not-quite-suspicious content.
In a grooming situation, by the time a keyword filter triggers, significant harm has typically already occurred. The relationship has been built. The trust has been established. The filter caught the symptom, not the pattern.
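To make the evasion point concrete, here is a minimal sketch of a naive filter and two trivial bypasses. The blocklist and phrases are invented for illustration; real filters are somewhat fancier, but the substitution problem is the same.

```python
# Illustrative only: a naive keyword filter defeated by trivial
# substitutions. BLOCKLIST and the example phrase are hypothetical.
BLOCKLIST = {"send pics"}

def keyword_filter(message: str) -> bool:
    """Return True if the message contains a blocked phrase."""
    lowered = message.lower()
    return any(term in lowered for term in BLOCKLIST)

keyword_filter("send pics")   # flagged
keyword_filter("s3nd p1cs")   # character substitution evades it
keyword_filter("sendpics")    # so does removing the space
```

Normalization layers (leetspeak maps, fuzzy matching) patch individual holes, but the adversary only needs one encoding the filter hasn't seen.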
What behavioral detection watches instead
A behavioral approach shifts the question from "what did this message say?" to "how has this interaction evolved over time?"
The signals that actually matter in grooming detection are not contained in individual messages. They're in the trajectory of a relationship:
Escalation velocity. How quickly is a relationship moving from surface-level to personal? Healthy relationships between strangers on platforms tend to develop gradually. Grooming relationships often escalate unusually fast or in unusual directions — from general interest in a game to personal disclosure requests in a compressed timeframe.
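As an illustration only, escalation velocity could be approximated by tracking the share of messages that touch personal topics per time window. The marker words, windowing, and scoring here are invented for the sketch, not SENTINEL's actual features.

```python
# Hypothetical sketch: per-window share of messages touching "personal"
# topic markers. A flat series near zero looks normal; a steep rise over
# a compressed timeframe is the escalation pattern described above.
from datetime import datetime, timedelta

PERSONAL_MARKERS = {"age", "school", "address", "alone", "secret"}  # toy set

def escalation_velocity(messages, window=timedelta(days=7)):
    """messages: list of (timestamp, text) pairs. Returns the share of
    marker-touching messages per window, oldest window first."""
    if not messages:
        return []
    messages = sorted(messages)
    start = messages[0][0]
    buckets = {}
    for ts, text in messages:
        idx = int((ts - start) / window)
        hit = bool(set(text.lower().split()) & PERSONAL_MARKERS)
        total, hits = buckets.get(idx, (0, 0))
        buckets[idx] = (total + 1, hits + hit)
    return [hits / total for _, (total, hits) in sorted(buckets.items())]
```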
Contact pattern shifts. How is the frequency and timing of contact changing? A pattern where an adult gradually shifts contact toward unusual hours, toward private channels, or toward increasingly exclusive one-on-one communication is a meaningful signal — one that no individual message would reveal.
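One toy way to quantify a shift toward unusual hours: compare the late-night share of contact between an early and a recent period. The late-night definition and the idea of a single scalar are assumptions for the sketch, not the actual model.

```python
# Illustrative sketch: positive values mean contact has drifted toward
# late-night hours between the two periods. Thresholds are made up.
from collections import Counter

def late_night_share(timestamps, late_hours=range(0, 6)):
    """Fraction of contacts falling between midnight and 6am."""
    if not timestamps:
        return 0.0
    hours = Counter(ts.hour for ts in timestamps)
    return sum(hours[h] for h in late_hours) / len(timestamps)

def contact_shift(early, recent):
    """Compare late-night share across two periods of the relationship."""
    return late_night_share(recent) - late_night_share(early)
```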
Relationship network analysis. Who is talking to whom? Does one adult account have an unusual pattern of initiating contact with multiple minor-age accounts? Are multiple adult accounts approaching the same minor? Coordinated targeting looks very different in a relationship graph than in any individual message.
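A minimal sketch of the fan-out check implied above, with hypothetical account labels and an invented threshold. A production system would weight this by platform norms and account history rather than using a raw count.

```python
# Toy relationship-graph check: flag accounts that initiate first
# contact with an unusual number of minor-identified accounts.
from collections import defaultdict

def flag_fan_out(initiations, minors, threshold=3):
    """initiations: iterable of (initiator, recipient) first-contact
    pairs. minors: set of minor-identified account ids. Returns the
    initiators who contacted >= threshold distinct minors."""
    targets = defaultdict(set)
    for src, dst in initiations:
        if dst in minors:
            targets[src].add(dst)
    return {src for src, dsts in targets.items() if len(dsts) >= threshold}
```

The mirror-image query (multiple adult accounts converging on one minor) runs over the same edge list with the roles swapped.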
Linguistic style shift. How does the vocabulary and conversational register of a conversation change over sessions? Conversations that shift from platform-appropriate language to increasingly personal, boundary-testing, or manipulative patterns across multiple sessions are statistically distinct from normal conversations that just happen to include edge-case vocabulary.
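As a rough illustration, session-over-session vocabulary drift can be measured with cosine similarity over word counts; real stylometric features (register, syntax, pragmatics) would be far richer than this bag-of-words sketch.

```python
# Minimal sketch: similarity of each later session's vocabulary to the
# first session's. A steady slide away from 1.0 indicates drift.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def session_drift(sessions):
    """sessions: list of per-session text blobs. Returns each later
    session's cosine similarity to the first session."""
    counts = [Counter(s.lower().split()) for s in sessions]
    return [cosine(counts[0], c) for c in counts[1:]]
```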
None of these signals trigger on individual messages. All of them require watching behavior over time — which is exactly what keyword filters don't do.
What this looks like in a real system
We built SENTINEL to implement exactly this approach: a behavioral intelligence platform that watches interaction patterns over time and produces an explainable risk score for each user.
The four signal layers SENTINEL tracks:
Linguistic signals: how conversation style evolves across sessions. Not the presence of specific words, but statistical patterns in vocabulary, register, and content trajectory.
Graph signals: relationship structure. Who initiates contact with whom. Multi-account coordination. Network-level targeting patterns.
Temporal signals: escalation dynamics over time. Contact frequency, session bridging, pattern shifts over days and weeks.
Fairness signals: before any detection model deploys, a demographic parity audit runs. If the model disproportionately flags one demographic group over another — for any reason — it cannot ship. This is enforced architecturally, not as an optional check.
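A demographic parity gate of the kind described could, in simplified form, look like the sketch below. The tolerance value and the gating mechanism are assumptions for illustration; the post says only that the audit is architecturally enforced.

```python
# Illustrative parity audit: compare flag rates across groups and
# block deployment when the gap exceeds a tolerance.
def parity_gap(flags, groups):
    """flags: parallel list of bools (model flagged user). groups:
    parallel list of group labels. Returns the max difference in
    flag rate between any two groups."""
    rates = {}
    for flagged, g in zip(flags, groups):
        total, hits = rates.get(g, (0, 0))
        rates[g] = (total + 1, hits + flagged)
    shares = [hits / total for total, hits in rates.values()]
    return max(shares) - min(shares)

def audit_passes(flags, groups, tolerance=0.05):
    """Deployment gate: True only if flag rates are near-equal."""
    return parity_gap(flags, groups) <= tolerance
```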
The output for each user is a risk score (0-100), a tier label (trusted / watch / restrict / critical), and — critically — a plain-language explanation of exactly which behavioral signals drove the score. A moderator doesn't see a black box number. They see: "this account has escalated contact frequency by 340% over 14 days, has shifted from group conversations to exclusively private channels with one minor-identified account, and has used three statistically anomalous vocabulary shifts characteristic of trust-building language."
That explanation has two purposes. First, it makes moderation decisions defensible — legally, in internal audits, and to regulators. Second, it dramatically reduces moderator burnout, which is one of the most serious operational problems in trust and safety at scale.
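The per-user output described above might map onto a structure like the following. The field names are inferred from the post, not SENTINEL's actual schema.

```python
# Hypothetical shape of the per-user assessment: score, tier, and the
# plain-language explanations that drove the score.
from dataclasses import dataclass, field

TIERS = ("trusted", "watch", "restrict", "critical")

@dataclass
class RiskAssessment:
    user_id: str
    score: int                    # 0-100
    tier: str                     # one of TIERS
    explanations: list = field(default_factory=list)

    def __post_init__(self):
        assert 0 <= self.score <= 100, "score out of range"
        assert self.tier in TIERS, "unknown tier"
```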
The regulatory dimension
If you're building a platform that operates in the UK or EU and has any minor users, this is no longer an optional consideration.
The UK Online Safety Act requires platforms to conduct risk assessments, demonstrate active harm mitigation, and maintain audit trails. Ofcom's categorisation register and additional duties consultation are expected in July 2026 — and platforms that can't demonstrate proactive child safety measures will be non-compliant.
The EU Digital Services Act requires large platforms to demonstrate proactive child safety measures or face significant fines. The enforcement machinery is live.
COPPA in the US requires specific data handling, retention, and parental consent infrastructure for platforms with users under 13.
SENTINEL was designed from the start to satisfy these requirements architecturally — not as features added later. Tamper-evident 7-year audit logs, COPPA data retention, GDPR erasure request handling, and jurisdiction-aware data policies are infrastructure-level components, not configuration options.
What the integration actually looks like
SENTINEL is built as 13 independent microservices, but you don't have to deploy all of them at once. The integration path is designed to be incremental:
- Send message events to the REST API (or use the Python or Node.js SDK)
- Receive risk scores and explanations via webhook callback
- Route flagged users to human review or automated action based on your threshold configuration
Under an hour to first integration with the SDK. Docker Compose for local setup. The full infrastructure (PostgreSQL, Redis, vector database) is included.
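The three steps above might look roughly like this over plain REST. The endpoint path, field names, and auth header are placeholders; the project's own docs define the real schema.

```python
# Hypothetical integration sketch: build a message event and POST it.
# Risk scores and explanations arrive asynchronously via webhook.
import json
import urllib.request

def build_event(sender_id, recipient_id, text, sent_at):
    """sent_at: ISO 8601 timestamp string. Field names are placeholders."""
    return {
        "type": "message",
        "sender_id": sender_id,
        "recipient_id": recipient_id,
        "text": text,
        "sent_at": sent_at,
    }

def send_event(event, base_url, api_key):
    req = urllib.request.Request(
        f"{base_url}/v1/events",  # placeholder path
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    return urllib.request.urlopen(req)
```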
Licensing
SENTINEL is free for any platform under $100k annual revenue and all non-commercial or research use. Commercial license for larger platforms. The license automatically converts to Apache 2.0 in 2046 — a commitment that the tool stays in the open-source ecosystem long-term.
GitHub: https://github.com/sentinel-safety/SENTINEL
We're v1, released today (April 23, 2026). We're being transparent about that: there is no large community of production deployments yet. The technology is solid; the community is at the start. If you're working on a platform where this is relevant, we'd like to hear from you: sentinel.childsafety@gmail.com.
The bottom line
Keyword filters are not a child safety strategy. They are easy to evade, late to trigger, and give platforms false confidence that the problem is handled.
Behavioral detection — watching how conversations evolve, how relationships form, how escalation develops over time — is what proactive child safety actually looks like. It's what the regulatory frameworks are beginning to require. And it's what SENTINEL is built to do.
If you're building a platform where minors might be present, the tools exist now to do this properly. They're open source. They're free for most of you. There's no excuse left for relying on a word list.
SENTINEL is an open-source behavioral intelligence platform for child safety on digital platforms. GitHub: https://github.com/sentinel-safety/SENTINEL. Contact: sentinel.childsafety@gmail.com