Building a Pet Insurance Comparison Engine: Handling Variable Premiums Across 15 French Insurers

The French pet insurance market has grown 34% since 2022, driven by rising vet costs and increased pet ownership post-COVID. But comparing products programmatically is a nightmare: there are 15 major insurers, each with its own pricing grid based on species, breed, age, region, and deductible. Here is how I built a comparison engine that handles this complexity.

The Data Model Challenge

Each insurer publishes premiums differently:

Santévet: JSON API (unofficial, scraped from their quote widget)
Assurimo: PDF tariff grids updated quarterly
Groupama: Static tables by risk category
Dalma: Dynamic pricing engine (quote request required)

The core problem: how do you normalize wildly different data structures into a comparable output?

A Flexible Pricing Schema

{
  "insurer_id": "santevet",
  "product_id": "sv-excellence",
  "species": "cat",
  "breed_risk_group": 2,
  "age_min_months": 12,
  "age_max_months": 84,
  "region": "IDF",
  "deductible_pct": 20,
  "ceiling_annual_eur": 3000,
  "monthly_premium_eur": 38.50,
  "reimbursement_basis": "actual_costs",
  "waiting_period_days": 30
}

This schema handles ~80% of cases. The remaining 20% (hereditary conditions, breed exclusions, complementary modules) are covered by two extra fields: an exclusions array and an add_ons object.
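As a rough illustration, the record above could be loaded into a typed structure like the one below. This is a minimal sketch, not the engine's actual code: the class name PremiumRecord, the applies_to helper, the inclusive age bounds, and the shapes chosen for exclusions and add_ons are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PremiumRecord:
    insurer_id: str
    product_id: str
    species: str
    breed_risk_group: int
    age_min_months: int
    age_max_months: int
    region: str
    deductible_pct: int
    ceiling_annual_eur: int
    monthly_premium_eur: float
    reimbursement_basis: str
    waiting_period_days: int
    # Assumed shapes for the two extension fields; the real ones may differ.
    exclusions: list = field(default_factory=list)
    add_ons: dict = field(default_factory=dict)

    def applies_to(self, species: str, risk_group: int,
                   age_months: int, region: str) -> bool:
        """True if this pricing row covers the given animal and region.

        Age bounds are treated as inclusive (an assumption).
        """
        return (
            self.species == species
            and self.breed_risk_group == risk_group
            and self.age_min_months <= age_months <= self.age_max_months
            and self.region == region
        )


SAMPLE = (
    '{"insurer_id":"santevet","product_id":"sv-excellence","species":"cat",'
    '"breed_risk_group":2,"age_min_months":12,"age_max_months":84,'
    '"region":"IDF","deductible_pct":20,"ceiling_annual_eur":3000,'
    '"monthly_premium_eur":38.50,"reimbursement_basis":"actual_costs",'
    '"waiting_period_days":30}'
)

record = PremiumRecord(**json.loads(SAMPLE))
```

Keeping exclusions and add_ons optional means the 80% of simple rows load without any extra keys, while the harder cases can carry their details in the same structure.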

Breed Risk Classification

The biggest normalization challenge is breed-to-risk mapping. French insurers use different classification systems:

Santévet: 4 risk groups (1=low, 4=high)
Assurimo: 7 categories by morphology
Allianz: breed whitelist / blacklist

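I built a crosswalk table mapping 380 dog breeds and 90 cat breeds to a normalized 5-level risk scale, using FCI (Fédération Cynologique Internationale) breed standards as the anchor. Note that the FCI nomenclature covers dogs only, so the cat breeds need a separate reference list.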
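A crosswalk like this can be sketched as a plain lookup table keyed by species and breed name. The rows below are invented placeholders, not real insurer classifications, and the names BreedEntry, CROSSWALK, normalized_risk, and native_class are hypothetical.

```python
from typing import NamedTuple, Optional


class BreedEntry(NamedTuple):
    normalized_risk: int   # 1 = low ... 5 = high (the normalized scale)
    native_classes: dict   # insurer_id -> that insurer's own class


# Placeholder rows: every value here is made up for illustration.
CROSSWALK = {
    ("dog", "labrador retriever"): BreedEntry(3, {"santevet": 2, "assurimo": 4}),
    ("dog", "french bulldog"): BreedEntry(5, {"santevet": 4, "assurimo": 7}),
    ("cat", "european shorthair"): BreedEntry(1, {"santevet": 1, "assurimo": 1}),
}


def _key(species: str, breed: str) -> tuple:
    """Normalize case and whitespace so user input matches table keys."""
    return species.strip().casefold(), " ".join(breed.split()).casefold()


def normalized_risk(species: str, breed: str) -> Optional[int]:
    """Return the 1-5 risk level, or None for a breed not in the table."""
    entry = CROSSWALK.get(_key(species, breed))
    return entry.normalized_risk if entry is not None else None


def native_class(species: str, breed: str, insurer_id: str):
    """Return the insurer's own class for this breed, or None if unknown."""
    entry = CROSSWALK.get(_key(species, breed))
    return entry.native_classes.get(insurer_id) if entry is not None else None
```

Returning None for unknown breeds, rather than a default level, lets the caller decide whether to ask the user for clarification or skip the insurer.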

Regional Pricing

Vet costs vary significantly by region: a consultation costs €28 in rural Creuse vs €68 in the 16th arrondissement of Paris. Some insurers adjust premiums by department, others by postal code prefix, and others only by an urban/rural flag.

Solution: a geolocation lookup table mapping INSEE commune codes to risk tiers, updated annually from the DREES veterinary care cost survey.
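One way to sketch such a lookup is a commune-level table with a department-level fallback. The tier values, table names, and default tier below are placeholders, not the real data. The department extraction relies on INSEE codes being five characters long, with Corsica using 2A/2B and overseas territories using three-character prefixes.

```python
# Placeholder tables: tier values are invented for illustration.
COMMUNE_TIERS = {"75116": 5}            # commune-level overrides
DEPARTMENT_TIERS = {"75": 4, "23": 1}   # department-level defaults
DEFAULT_TIER = 3                        # assumed fallback for unmapped areas


def department_of(insee_code: str) -> str:
    """Extract the department code from a 5-character INSEE commune code."""
    code = insee_code.strip().upper()
    if len(code) != 5:
        raise ValueError(f"expected a 5-character INSEE code, got {insee_code!r}")
    # Overseas codes (97x, 98x) use a three-character prefix;
    # Corsica (2A, 2B) is covered by the two-character case.
    return code[:3] if code[:2] in ("97", "98") else code[:2]


def risk_tier(insee_code: str) -> int:
    """Most specific match wins: commune, then department, then default."""
    code = insee_code.strip().upper()
    if code in COMMUNE_TIERS:
        return COMMUNE_TIERS[code]
    return DEPARTMENT_TIERS.get(department_of(code), DEFAULT_TIER)
```

Keying on INSEE codes rather than postal codes avoids the ambiguity of postal codes shared by several communes, at the cost of one extra conversion step for insurers that price by postal code prefix.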

Real-Time Quote Aggregation

For insurers with quote APIs, I use a queue-based system: user input triggers parallel quote requests across all insurers, with a 3-second timeout per request. Quotes that fail or time out fall back to cached tariff data (at most 30 days old), which the UI flags visually.
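The timeout-and-fallback behaviour can be illustrated with asyncio. Note that this sketch swaps the queue for a simple asyncio.gather fan-out, and that the function names, the cache layout, and the result dictionaries are assumptions made for the example. Each fetcher is any coroutine function returning a monthly premium.

```python
import asyncio
from datetime import date, timedelta

QUOTE_TIMEOUT_S = 3.0
MAX_CACHE_AGE = timedelta(days=30)


async def quote_with_fallback(insurer_id, fetch, cache, today,
                              timeout=QUOTE_TIMEOUT_S):
    """Try a live quote; on timeout or network error, use a fresh cache entry."""
    try:
        premium = await asyncio.wait_for(fetch(), timeout=timeout)
        return {"insurer_id": insurer_id,
                "monthly_premium_eur": premium,
                "source": "live"}
    except (asyncio.TimeoutError, OSError):
        cached = cache.get(insurer_id)
        if cached is not None and today - cached["updated"] <= MAX_CACHE_AGE:
            # "source" lets the UI flag this row as cached.
            return {"insurer_id": insurer_id,
                    "monthly_premium_eur": cached["monthly_premium_eur"],
                    "source": "cached",
                    "cached_on": cached["updated"]}
        return None  # no live quote and no usable cache entry


async def aggregate(fetchers, cache, today, timeout=QUOTE_TIMEOUT_S):
    """Run all insurer requests concurrently and drop insurers with no data."""
    tasks = [quote_with_fallback(insurer_id, fetch, cache, today, timeout)
             for insurer_id, fetch in fetchers.items()]
    results = await asyncio.gather(*tasks)
    return [r for r in results if r is not None]
```

Because every request runs concurrently, the total wait is bounded by the timeout rather than by the number of insurers, so one slow insurer does not delay the whole comparison.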

The result is a side-by-side comparison that actually reflects real prices. You can test the live engine at monassuranceanimal.fr, which covers 12 insurers with real-time quotes and 3 with cached grids.

Handling Annual Premium Updates

French law requires insurers to notify policyholders of premium changes 15 days before renewal. For comparison sites, this creates a "freshness" problem: prices quoted in November may differ by January.

My solution: a confidence score per premium record, calculated as 1 - (days_since_update / 90), which decays linearly from 1 on the day of the update to 0 at 90 days. Records older than 90 days are excluded from comparisons and flagged for manual refresh.
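The formula is small enough to sketch directly. The function name and the choice to return None for excluded records are assumptions; the 90-day window and the linear decay come from the formula above.

```python
from datetime import date
from typing import Optional

MAX_AGE_DAYS = 90


def confidence(last_updated: date, today: date) -> Optional[float]:
    """Linear freshness score: 1.0 when just updated, 0.0 at 90 days.

    Returns None for records older than 90 days, which should be
    excluded from comparisons and queued for manual refresh.
    """
    days_since_update = (today - last_updated).days
    if days_since_update > MAX_AGE_DAYS:
        return None
    # Clamp negative ages (clock skew, future-dated rows) to zero.
    return 1 - max(days_since_update, 0) / MAX_AGE_DAYS
```

For example, a record updated 45 days ago gets a score of 0.5, and one updated 91 days ago is dropped from the comparison.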

What I Would Do Differently

Start with the PDF parser: most insurers still distribute tariffs as PDFs, and building a reliable extractor took 3x longer than expected.
Document the exclusions schema early: adding support for hereditary conditions retroactively broke 4 normalizers.
Build an insurer change-detection webhook: instead of polling tariff pages, subscribe to changes in insurer sitemaps.

Have you built comparison engines in regulated industries? The insurance sector has unique challenges around accuracy obligations (ACPR regulations) that add compliance overhead. Happy to discuss in the comments.