QIS Protocol vs Personal Health Train: Two Architectures for Distributed Health Intelligence


An OHDSI node in Rotterdam holds oncology outcome data for 4,200 patients enrolled across three cancer registries. A node in Toronto holds a different 3,800. Neither institution can share patient records. Neither wants to. The question is not whether to share records — it is how intelligence about what works flows between them.

This question has two dominant architectural answers right now. The Personal Health Train (PHT) moves containerized algorithms to the data: a "train" travels to each "station," executes locally, and returns aggregated results. Federated learning aggregates gradient updates from local models without centralizing training data. Both are principled privacy-preserving approaches. Both are in active deployment.

Neither achieves real-time cross-institutional synthesis at the scale the OHDSI community increasingly needs — particularly for rare disease outcomes where individual sites may observe a single qualifying patient per year, and where the value of a treatment delta compounds every time a similar node updates.

Christopher Thomas Trevethan discovered an alternative architecture on June 16, 2025, now called the Quadratic Intelligence Swarm (QIS) protocol, with 39 provisional patents filed. This article examines the PHT architecture honestly — its strengths, its structural walls — and then explains what QIS does differently and why the two approaches are complementary rather than competing.


Section 1: What the Personal Health Train Does (and Why It Matters)

The Personal Health Train is not a metaphor. It is a production infrastructure developed by the Dutch health data community — GO FAIR, the Dutch Techcentre for Life Sciences (DTL), and LUMC — and it is running today across European hospital networks.

The core concept is elegant: instead of asking hospitals to send data to a central repository, you send the analysis to the data. A "train" is a containerized analytical workload — typically a Docker image — that carries the algorithm, the execution environment, and provenance metadata. It arrives at a "station" (a hospital, registry, or cohort database), executes against the local dataset inside a controlled environment, and returns only aggregated results. Patient records never leave the institution.
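
A minimal sketch of the train/station pattern described above, written as plain Python rather than a Docker workload. The names `Station`, `run_train`, and `mean_age_train` are hypothetical and do not come from any PHT implementation; the sample ages are made up. The only point is that the function travels while the rows stay where they are.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Hypothetical stand-in: a real train is a container image, not a function.
Train = Callable[[list[dict]], dict]

@dataclass
class Station:
    name: str
    _records: list[dict]  # patient-level rows; never returned to the caller

    def run_train(self, train: Train) -> dict:
        # The algorithm executes locally; only its aggregate output leaves.
        return {"station": self.name, **train(self._records)}

def mean_age_train(records: list[dict]) -> dict:
    # An aggregate-only analysis: a count and a mean, no row-level data.
    return {"n": len(records), "mean_age": mean(r["age"] for r in records)}

stations = [
    Station("rotterdam", [{"age": 61}, {"age": 67}, {"age": 70}]),
    Station("toronto", [{"age": 58}, {"age": 64}]),
]
results = [s.run_train(mean_age_train) for s in stations]
print(results)
```

The researcher ends up with one small dictionary per station, which is the "N data points" shape discussed later in the article.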

The foundational paper is Beyan et al. (2020), "Distributed Analytics on Sensitive Medical Data: The Personal Health Train," Data Intelligence 2(1-2), 96-107. That paper articulates the FAIR principles underpinning PHT design: data should be Findable, Accessible, Interoperable, and Reusable, but findable and accessible to algorithms traveling to the data, not to researchers pulling data out.

PHT strengths are real:

  • FAIR principles by design. Provenance is tracked at every station. The lineage of an analytical result is auditable end-to-end.
  • HL7 FHIR compatibility. PHT stations increasingly expose FHIR endpoints, making integration with hospital EHR systems tractable.
  • Active EU deployment. The Health-RI infrastructure in the Netherlands, the EUCAN-Connect project, and several national cancer registry networks are running PHT in production. This is not a research prototype.
  • Regulatory alignment. The train/station model maps cleanly onto GDPR data processing agreements. The data controller (the hospital) retains control because the data never moves.

For the OHDSI Rotterdam audience specifically: PHT and OMOP are increasingly paired. A station can expose an OMOP CDM endpoint, and a train can execute an ATLAS-generated analysis package against it. This is the stack the Dutch health informatics community has invested in substantially.

PHT is a serious system. What follows is not a criticism of its design; it is an examination of where its architecture creates structural limits.


Section 2: Where PHT Hits Structural Walls

Wall 1 — Algorithm Payload vs Outcome Payload

A PHT train is a containerized workload. Docker images for analytical jobs typically range from 200MB to 2GB, depending on the language runtime, statistical libraries, and model weights included. Each "train journey" involves uploading the container to a station, allocating compute resources to execute it, and retrieving results.

A QIS outcome packet is approximately 512 bytes.

Both protect patient privacy. The difference is what travels: PHT moves the algorithm; QIS moves a distilled outcome. The payload sizes differ by roughly six orders of magnitude (200 MB to 2 GB against about 512 bytes), and PHT additionally needs compute at every station to execute the container. For institutions with constrained infrastructure, such as a regional oncology registry in a lower-resource setting or a rare disease center with two dedicated research staff, the PHT participation floor is non-trivial. The QIS participation floor is: emit a JSON packet to an address.
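
The size gap can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in this section (200 MB, 2 GB, 512 bytes); the constant names are illustrative.

```python
import math

PACKET_BYTES = 512               # approximate packet size stated in the article
CONTAINER_BYTES = (200e6, 2e9)   # 200 MB and 2 GB, the container range quoted above

ratios = [size / PACKET_BYTES for size in CONTAINER_BYTES]
for size, ratio in zip(CONTAINER_BYTES, ratios):
    print(f"{size / 1e6:>6.0f} MB container / 512 B packet = {ratio:>10,.0f}x "
          f"(about 10^{math.log10(ratio):.1f})")
```

The two ends of the range land at about 10^5.6 and 10^6.6, which is where the "roughly six orders of magnitude" figure comes from.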

Wall 2 — Batch Execution vs Continuous Synthesis

PHT trains run on-demand. A researcher formulates a question, packages a train, schedules station visits, waits for the journey to complete, and retrieves results. The research loop is measured in hours to days. Results are available when the journey completes.

QIS routes outcome packets continuously as observations occur. There is no train to schedule, no station to prepare, no journey to complete. When a clinician at the Rotterdam node records a treatment outcome, that observation — distilled to a 512-byte packet — routes to semantically similar nodes immediately. The synthesis is not periodic; it is continuous.
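
One way to picture continuous routing is topic-based publish/subscribe, with the semantic address as the topic. The `Broker` class below is a hypothetical in-memory stand-in, not part of any QIS specification; a real deployment would presumably sit on a message queue, DHT, or database.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]

class Broker:
    # Hypothetical in-memory broker; topic names are semantic addresses.
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, address: str, handler: Handler) -> None:
        self._subscribers[address].append(handler)

    def publish(self, address: str, packet: dict) -> int:
        # Deliver immediately to every node subscribed to this address.
        handlers = self._subscribers[address]
        for handler in handlers:
            handler(packet)
        return len(handlers)

broker = Broker()
toronto_inbox: list[dict] = []
address = "omop:4115276:1110942:40213160:response:partial"  # address format from Section 4
broker.subscribe(address, toronto_inbox.append)

# Rotterdam records an outcome; Toronto receives it with no scheduling step.
delivered = broker.publish(address, {"observed_direction": "+1", "confidence": 0.81})
print(delivered, toronto_inbox)
```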

Wall 3 — N=1 Rare Disease Exclusion

PHT stations must satisfy minimum cell count thresholds before returning results. Thresholds of ≥5 or ≥10 observations per reported cell, a small-cell suppression rule often described in k-anonymity terms, are standard. This is the same structural problem that plagues federated learning for rare disease: a station with insufficient observations cannot return a statistically safe output. The station is effectively excluded from contributing.
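
The effect of a minimum cell count is easy to see in a small sketch. The function name `suppress_small_cells` and the sample counts are hypothetical; the threshold of 5 is the lower value quoted above.

```python
from typing import Optional

MIN_CELL_COUNT = 5  # threshold taken from the text; sites may configure it differently

def suppress_small_cells(
    counts: dict[str, int], k: int = MIN_CELL_COUNT
) -> dict[str, Optional[int]]:
    # Replace any cell below the threshold with None so it is never reported.
    return {cell: (n if n >= k else None) for cell, n in counts.items()}

large_site = {"partial_response": 42, "progressive_disease": 17}
rare_site = {"partial_response": 1, "progressive_disease": 0}

print(suppress_small_cells(large_site))  # both cells survive
print(suppress_small_cells(rare_site))   # every cell is suppressed
```

A site whose every cell is suppressed has nothing left to return, which is the exclusion this section describes.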

QIS outcome packets represent a single validated delta — a treatment adjustment and its observed outcome — not an aggregate. A center with one qualifying patient per year can emit outcome packets. The architecture does not require a minimum cell count because no aggregation occurs at the station. The distillation happens at the individual observation level, and the packet carries the fingerprint of the observation context, not a summary statistic that could re-identify through small denominators.

This is not a minor improvement for rare disease research. For conditions with global incidence below 1 in 100,000, the majority of participating centers may have single-digit annual case counts. PHT and federated learning effectively exclude them. QIS includes them by design.

Wall 4 — Container Runtime Dependency

Participating in PHT requires maintaining a container execution environment (Docker or Singularity), a train station API, compatible data format exposure (FHIR), and IT staff capable of managing all of the above. For large academic medical centers, this is achievable. For a regional hospital in a country with less developed research infrastructure, the barrier is high.

QIS requires: emit a 512-byte JSON packet to an address. Any system capable of making an HTTP POST, publishing to a message queue, writing to a distributed hash table, or appending to a shared database can participate. The participation floor is dramatically lower, and the transport mechanism is not prescribed — QIS is protocol-agnostic at the routing layer.
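
A sketch of what "emit a packet" might look like on the sending side, independent of transport. The helper `encode_packet` is hypothetical, and treating 512 bytes as a hard limit is an assumption made here for illustration; the article only describes the size as approximate.

```python
import json

MAX_PACKET_BYTES = 512  # assumption: the article's ~512 bytes enforced as a hard budget

def encode_packet(packet: dict) -> bytes:
    # Compact separators avoid spending the byte budget on whitespace.
    body = json.dumps(packet, separators=(",", ":"), sort_keys=True).encode("utf-8")
    if len(body) > MAX_PACKET_BYTES:
        raise ValueError(f"packet is {len(body)} bytes, over the {MAX_PACKET_BYTES}-byte budget")
    return body

packet = {
    "protocol": "QIS/1.0",
    "semantic_address": "omop:4115276:1110942:40213160:response:partial",
    "outcome_delta": {"observed_direction": "+1", "confidence": 0.81},
}
wire = encode_packet(packet)
print(len(wire), "bytes, ready for any transport (HTTP POST, queue publish, DHT put)")
```

Because the function returns plain bytes, the same output can be handed to whichever transport a site already operates.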

Wall 5 — No Quadratic Synthesis

PHT collects results from N stations in sequence or in parallel. The researcher receives N data points — one analytical result per station — and synthesizes them externally. The synthesis is linear: more stations means more results, but the intelligence does not compound between stations automatically.

QIS creates N(N-1)/2 synthesis paths. Every node's outcome packets are continuously available to every other semantically similar node. When the Rotterdam node emits a packet about a checkpoint inhibitor response in a KRAS-mutant lung adenocarcinoma patient, that packet is immediately queryable by Toronto, by Seoul, by any node whose fingerprint places it in the same semantic neighborhood. The compounding is not managed by a researcher; it is structural. It emerges from the architecture.

PHT is linear result collection. QIS is quadratic intelligence compounding.

Wall 6 — Cold Start for New Stations

A hospital joining the PHT network for the first time starts with zero accumulated intelligence from prior train journeys. Trains have not visited it before; its data has not contributed to any prior results. It takes time to accumulate participation history and to receive trains from researchers who trust its station setup.

QIS outcome packets accumulate at semantic addresses. A new node querying its semantic address for the first time pulls back every outcome packet that prior similar nodes have emitted — immediately. The cold start problem inverts: the network's prior intelligence is available at first query. A center opening a new rare disease clinic on day one can query accumulated outcomes from every similar center that has ever participated.
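
The inverted cold start can be sketched with an append-only store keyed by semantic address. `AddressStore` is a hypothetical in-memory stand-in for whatever database, DHT, or log a deployment actually uses, and the site labels are placeholders.

```python
from collections import defaultdict

class AddressStore:
    # Hypothetical append-only store: one list of packets per semantic address.
    def __init__(self) -> None:
        self._packets: dict[str, list[dict]] = defaultdict(list)

    def emit(self, address: str, packet: dict) -> None:
        self._packets[address].append(packet)

    def query(self, address: str) -> list[dict]:
        # Return a copy of everything ever emitted at this address.
        return list(self._packets.get(address, []))

store = AddressStore()
address = "omop:4115276:1110942:40213160:response:partial"
for site in ("site-a", "site-b", "site-c"):
    store.emit(address, {"node_fingerprint": site, "observed_direction": "+1"})

# A node that joined today has emitted nothing, yet its first query is not empty.
history = store.query(address)
print(len(history), "prior packets available on first query")
```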


Section 3: What QIS Does Instead

The breakthrough that Christopher Thomas Trevethan discovered is not any single component. It is the complete loop — the full architecture working together:

Raw signal → Local processing → 512-byte outcome packet → Semantic fingerprint → Routing to deterministic address → Delivery to similar nodes → Local synthesis → New packets

Every step matters. Removing any one step collapses the compounding. This is why the architecture is the discovery, not the hash function, not the routing protocol, not the packet format in isolation.
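
The "local synthesis" step at the end of the loop is not specified in this article, so the following is only one possible reading: a confidence-weighted mean of the `observed_direction` values pulled from an address. The function name `synthesize` and the weighting rule are assumptions for illustration.

```python
def synthesize(packets: list[dict]) -> float:
    # Confidence-weighted mean of observed directions ("+1" or "-1").
    # The weighting rule is an illustrative assumption, not the QIS method.
    total_weight = sum(p["confidence"] for p in packets)
    if total_weight == 0:
        return 0.0
    weighted = sum(int(p["observed_direction"]) * p["confidence"] for p in packets)
    return weighted / total_weight

pulled = [
    {"observed_direction": "+1", "confidence": 0.81},
    {"observed_direction": "+1", "confidence": 0.60},
    {"observed_direction": "-1", "confidence": 0.40},
]
score = synthesize(pulled)
print(f"net direction across {len(pulled)} packets: {score:+.2f}")
```

Whatever rule is used, the result feeds the next round of local processing and, eventually, new packets.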

The routing mechanism is deliberately protocol-agnostic. DHT-based routing, as used in distributed systems like Kademlia, is one strong option and resolves a lookup in O(log N) hops. Other transports resolve an address in effectively constant time: a relational database with a hash index on concept IDs (a B-tree index is O(log n) in table size, which is still fast in practice), a pub/sub system with topic-based routing, a content-addressable store. What matters is that a problem fingerprint maps to an address and retrieves similar prior outcome packets. The specific routing infrastructure does not change the quadratic scaling property; that property emerges from the N(N-1)/2 synthesis paths created by the architecture, regardless of how packets are physically routed.
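
To make "a fingerprint maps to an address" concrete, here is a sketch of two interchangeable lookups over the same key. Kademlia uses 160-bit identifiers and XOR as its distance metric; using SHA-1 to derive the key is an assumption chosen only for its 160-bit output. A real Kademlia lookup walks routing tables iteratively, whereas this sketch simply scans a short list of node IDs. The function names and node labels are hypothetical.

```python
import hashlib

def address_key(semantic_address: str) -> int:
    # Deterministic 160-bit key derived from the address string (assumption: SHA-1).
    digest = hashlib.sha1(semantic_address.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def closest_node(key: int, node_ids: list[int]) -> int:
    # Kademlia's distance between two identifiers is their bitwise XOR.
    return min(node_ids, key=lambda node_id: node_id ^ key)

address = "omop:4115276:1110942:40213160:response:partial"
key = address_key(address)

# Option 1: DHT-style placement on the node whose ID is XOR-closest to the key.
node_ids = [address_key(name) for name in ("node-a", "node-b", "node-c")]
owner = closest_node(key, node_ids)

# Option 2: a plain hash map (or indexed table) keyed by the same address.
index: dict[str, list[dict]] = {address: []}

print(hex(owner)[:12], address in index)
```

Either option answers the same question, "where do packets for this fingerprint live?", which is why the choice of transport can vary between deployments.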

For the OHDSI community specifically: OMOP concept IDs are a natural semantic fingerprinting vocabulary. A QIS outcome packet for an oncology observation can be fingerprinted using the OMOP concept IDs for the condition, the drug, the measurement, and the observed outcome. The OHDSI network already has a standardized vocabulary that maps directly onto QIS semantic addresses. No new ontology is required. The infrastructure to run QIS outcome routing on top of OMOP is already mostly built — it just has not been assembled this way before.
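
A sketch of how a condition-drug-measurement-outcome tuple could be turned into an address string in the format used by the example packet in Section 4. The `OutcomeKey` class and its field names are hypothetical, and the concept IDs are the illustrative values from that example rather than verified OMOP concepts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutcomeKey:
    # Hypothetical field names; IDs below are the article's illustrative values.
    condition_id: int
    drug_id: int
    measurement_id: int
    outcome: str

    def semantic_address(self) -> str:
        # A fixed field order makes the address deterministic across sites.
        return f"omop:{self.condition_id}:{self.drug_id}:{self.measurement_id}:{self.outcome}"

key = OutcomeKey(4115276, 1110942, 40213160, "response:partial")
print(key.semantic_address())
# omop:4115276:1110942:40213160:response:partial
```

Two sites that code the same observation with the same concept IDs arrive at the same string without coordinating, which is the property the routing layer relies on.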

Critically, QIS works with existing PHT infrastructure. A PHT station can emit QIS outcome packets as a side effect of any train execution. After a train returns its aggregated results to the researcher, the same station can emit a QIS packet encoding the outcome delta observed during that execution. PHT handles the rigorous provenance-tracked analysis. QIS adds the continuous cross-institutional synthesis layer that accumulates between train journeys.
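
The "side effect of a train execution" idea can be sketched as a thin wrapper around an existing run. All three callables (`run_train`, `to_packet`, `emit`) are hypothetical hooks a station would supply; nothing here reflects an actual PHT station API, and the sample result values are made up.

```python
import json
from typing import Callable

def run_train_and_emit(
    run_train: Callable[[], dict],
    to_packet: Callable[[dict], dict],
    emit: Callable[[bytes], None],
) -> dict:
    result = run_train()                      # normal PHT execution, unchanged
    packet = to_packet(result)                # distill an outcome delta from the result
    emit(json.dumps(packet).encode("utf-8"))  # side effect: hand the packet to a transport
    return result                             # the researcher still receives the PHT result

sent: list[bytes] = []
result = run_train_and_emit(
    run_train=lambda: {"cohort_size": 37, "response_rate": 0.41},
    to_packet=lambda r: {"protocol": "QIS/1.0", "observed_direction": "+1"},
    emit=sent.append,
)
print(result, len(sent))
```

The wrapper leaves the train's return value untouched, so existing provenance and result handling would not need to change.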


Section 4: Code Contrast — PHT Train Payload vs QIS Outcome Packet

import json

# Both protect patient privacy. PHT moves algorithms. QIS moves distilled outcomes.

# ── PHT Train Job Request (simplified) ──────────────────────────────────────────
# A researcher submits this to initiate a train journey across OHDSI stations.
# The actual train is a Docker image (200MB–2GB); this JSON is the dispatch payload.

pht_train_request = {
    "train_id": "oncology-kras-checkpoint-v3",
    "algorithm": {
        "container_image": "registry.example.org/trains/kras-checkpoint-analysis:v3.1.2",
        "image_digest": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "image_size_bytes": 847_000_000,        # 847 MB Docker image
        "runtime": "docker",
        "cpu_request": "2.0",
        "memory_request_gb": 8
    },
    "dataset_request": {
        "fhir_version": "R4",
        "omop_cdm_version": "5.4",
        "cohort_definition_id": "atlas-cohort-88421",
        # Same IDs and labels as the QIS packet below:
        # condition (KRAS-mutant NSCLC), measurement (RECIST partial response), drug (pembrolizumab)
        "concept_ids": [4115276, 40213160, 1110942],
        "min_cell_count": 5,                            # k-anonymity threshold
        "date_range": {"start": "2020-01-01", "end": "2025-12-31"}
    },
    "execution_environment": {
        "station_api_version": "2.1",
        "requires_gpu": False,
        "timeout_seconds": 3600,
        "result_schema": "omop-cohort-summary-v2"
    },
    "provenance": {
        "researcher_id": "orcid:0000-0002-1825-0097",
        "approved_by_dac": True,
        "data_use_agreement": "DUA-2025-NL-0041"
    }
}

# ── QIS Outcome Packet (same oncology observation) ───────────────────────────────
# Emitted by the station after observing a single validated treatment outcome.
# No patient record. No model. No container. A distilled delta.

qis_outcome_packet = {
    "protocol": "QIS/1.0",
    "packet_id": "7f3a9c2b-e841-4d0f-b6a1-2f8d3e7c9104",
    "timestamp_utc": "2025-11-14T09:22:31Z",
    "semantic_address": "omop:4115276:1110942:40213160:response:partial",
    "outcome_delta": {
        "omop_condition": 4115276,       # KRAS G12C mutation, NSCLC
        "omop_drug": 1110942,            # pembrolizumab
        "omop_measurement": 40213160,    # RECIST partial response
        "observed_direction": "+1",      # positive outcome
        "confidence": 0.81
    },
    "node_fingerprint": "sha256:a3f1c8",  # anonymized station identifier
    "routing_hop_count": 3,
    "ttl_seconds": 86400
}

# ── Payload size comparison ───────────────────────────────────────────────────────

pht_dispatch_json_bytes = len(json.dumps(pht_train_request, indent=2).encode("utf-8"))
pht_container_bytes     = pht_train_request["algorithm"]["image_size_bytes"]
qis_packet_bytes        = len(json.dumps(qis_outcome_packet, indent=2).encode("utf-8"))

print(f"PHT dispatch JSON:      {pht_dispatch_json_bytes:>12,} bytes")
print(f"PHT container image:    {pht_container_bytes:>12,} bytes  ({pht_container_bytes / 1e6:.0f} MB)")
print(f"QIS outcome packet:     {qis_packet_bytes:>12,} bytes")
print(f"Container/packet ratio: {pht_container_bytes / qis_packet_bytes:>12,.0f}x")

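# Exact byte counts depend on the serializer settings, so none are hard-coded here.
# With indent=2, the dispatch JSON comes to roughly 1 KB and the QIS packet stays
# under 512 bytes, against an 847,000,000-byte (847 MB) container image.
# The container/packet ratio is therefore on the order of 10^6.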

# Synthesis paths at N=50 OHDSI nodes:
N = 50
pht_result_points  = N                      # one result per station
qis_synthesis_paths = N * (N - 1) // 2     # every node synthesizes with every other
print(f"\nAt N={N} nodes:")
print(f"PHT result collection:   {pht_result_points} data points")
print(f"QIS synthesis paths:     {qis_synthesis_paths} paths  [N(N-1)/2]")

# At N=50 nodes:
# PHT result collection:   50 data points
# QIS synthesis paths:     1225 paths  [N(N-1)/2]

Section 5: Side-by-Side Comparison

| Dimension | PHT | QIS Protocol |
|---|---|---|
| Execution model | Algorithm travels to data | Outcome packet travels from data |
| Payload size | 200MB–2GB container | ~512 bytes |
| Synthesis type | Linear result collection (N points) | Quadratic compounding (N(N-1)/2 paths) |
| N=1 site inclusion | k-anonymity threshold required (≥5–10) | Single validated delta qualifies |
| Infrastructure requirement | Container runtime + FAIR Data Station + FHIR endpoint | Any system that can emit a JSON packet |
| Real-time synthesis | Batch (train journey completes before results available) | Continuous (packets route as observations occur) |
| Transport layer | Defined by PHT station infrastructure | Protocol-agnostic (DHT, database, API, pub/sub) |
| Scalability | Linear with N stations | Quadratic with N nodes |
| New station onboarding | Starts from zero; no prior train history | Queries accumulated prior packets immediately |

Section 6: Complementary, Not Competing

PHT and QIS solve adjacent problems. They are not in competition for the same architectural role; they fill different gaps in the same infrastructure stack.

PHT is excellent for running specific analytical algorithms with full provenance tracking. A researcher who needs to execute a validated ATLAS analysis package across a federated OMOP network, with a full audit trail of which station ran what version of which algorithm, with GDPR-compliant data use agreements enforced at every hop — PHT is built for exactly that. It is a production system. It works.

What PHT does not do — by design — is create the continuous cross-institutional synthesis layer that accumulates intelligence between train journeys. When a train is not running, no intelligence is flowing. When a station has fewer than the k-anonymity threshold of qualifying observations, it is silent.

QIS fills both gaps. A PHT station running a train today can emit QIS outcome packets at the end of that execution — not as a replacement for the PHT result, but as an additional output that enters the continuous synthesis network. The two systems run in parallel. PHT handles rigorous scheduled analysis. QIS handles continuous ambient synthesis.

For the OHDSI community gathering in Rotterdam on April 18–20, 2026: the OMOP common data model is not an obstacle to QIS adoption — it is an accelerant. OMOP concept IDs are exactly the kind of standardized semantic vocabulary that QIS outcome packets use as fingerprints. An OMOP concept ID set describing a condition-drug-outcome triple maps directly to a QIS semantic address. The OHDSI network has already solved the hardest part of QIS adoption: agreeing on a shared semantic vocabulary across institutions. The routing infrastructure is the tractable engineering problem remaining.

The core insight from Christopher Thomas Trevethan's discovery is this: the question is whether an edge node can pull pre-distilled outcome packets from every similar node and synthesize locally. If yes, QIS works. The routing mechanism — whether DHT, a centrally indexed API, an OMOP-native query, or a pub/sub topic — does not change the quadratic scaling. The compounding is a property of the architecture. It does not depend on how packets physically travel; it depends on the fact that N nodes each emitting and receiving from N-1 peers creates N(N-1)/2 synthesis paths, and that number grows with the square of participation.

PHT demonstrated that European health informatics could agree on a privacy-preserving infrastructure and actually deploy it. That consensus is one of the harder things to achieve in this domain. QIS does not require rebuilding it. It requires adding a 512-byte output to systems that are already running.


QIS Protocol is a discovery by Christopher Thomas Trevethan. 39 provisional patents filed. The architecture specification is available for review by researchers and health informaticists.

Source: dev.to
