TL;DR — I built Watch Arbitrage Tracker (Apify Store, GitHub): a Crawlee + Camoufox actor that scrapes 6 luxury-watch marketplaces in parallel, computes the cross-platform median price for any Patek/Rolex/AP reference, and pings Telegram the moment a listing drops more than X% below market. Sub-$1/month for typical dealer usage. Doubles as an MCP server so Claude Desktop / Cursor / ChatGPT can query the live feed in plain English.
The interesting part isn't the build — it's the 5 bugs I had to debug in production after pushing it public, and the cross-platform median math that turns "scraped data" into a real arbitrage signal.
The problem (real, validated, painful)
Pro watch dealers — the people who flip pre-owned Patek 5711, Rolex Daytona, AP Royal Oak — spend 3+ hours a day refreshing 6 dealer marketplaces looking for mispriced inventory. The job is mechanical: open Chrono24, search 10 reference numbers, compare prices, switch to WatchBox, repeat, switch to Bobs Watches, repeat...
There are existing tools (Watchcharts $79/mo, ChronoPulse $500/mo, Bezel Club) but they all have the same flaw: single-platform anchoring. They tell you the median price on Chrono24, not across the market. That's useless for arbitrage — the whole point is finding spreads between platforms.
The math that actually matters:
spread = cross_platform_median(refX) - listing_price(refX, platformY)
If a Submariner 124060 is listed at $10,050 on WatchBox but the true market median (computed across Chrono24 + WatchBox + Bobs) is $13,988 — that's a 28.2% spread. That's the alert worth waking a dealer at 3am for.
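The spread math is small enough to sketch in a few lines of TypeScript (the function name is mine, not the actor's actual API):

```typescript
// Percentage a listing sits below the cross-platform median.
// Positive result = listing is cheaper than the market (a potential flip).
function spreadPct(medianPrice: number, listingPrice: number): number {
  return ((medianPrice - listingPrice) / medianPrice) * 100;
}

// The $10,050 WatchBox listing vs. the $13,988 cross-platform median:
const spread = spreadPct(13_988, 10_050);
console.log(spread.toFixed(1)); // → "28.2"
```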
No tool I could find computes a TRUE cross-platform median. So I built one.
The stack
Standard Apify stack with one custom twist:
- Crawlee + Camoufox (stealthy Firefox fork) for anti-bot resilience. Chrono24 + Bobs Watches sit behind Cloudflare; Camoufox + Apify proxy rotation handles them reliably.
- TypeScript everywhere (strict mode, Node 24).
- Per-platform crawler files (`src/crawlers/{chrono24,watchbox,bobs,...}.ts`) — each ~100 lines, all conforming to the same `Listing` shape so the aggregator doesn't care which platform a listing came from.
- Aggregator (`src/aggregator.ts`) — groups listings by extracted sub-reference (more on this in Bug #5 below), computes a trimmed median, detects spreads.
- Alert dispatcher (`src/alerts.ts`) — Telegram per-opportunity with 24h dedup.
- Dual mode — same codebase runs as a batch crawler (scheduled cron) AND as an MCP server in Apify Standby mode, exposing 3 HTTP tools for AI agents.
Total: ~2000 LOC across 25 files. Repo: github.com/DataKazKN/watch-arbitrage-mcp.
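For context, the shared listing shape might look something like this — a minimal sketch of the idea, with field names I made up rather than the repo's exact interface:

```typescript
// Minimal normalized shape every crawler emits, regardless of platform.
interface Listing {
  platform: string;       // e.g. 'chrono24' | 'watchbox' | 'bobs'
  brand: string;          // e.g. "rolex"
  ref: string;            // user search term, e.g. "124060"
  subRef: string | null;  // extracted sub-reference (see Bug #5)
  title: string;
  priceUsd: number;
  url: string;
  scrapedAt: string;      // ISO timestamp
}

const example: Listing = {
  platform: 'watchbox',
  brand: 'rolex',
  ref: '124060',
  subRef: '124060',
  title: 'Rolex Submariner No Date 124060',
  priceUsd: 10_050,
  url: 'https://example.com/listing/1',
  scrapedAt: new Date().toISOString(),
};
```

Because the aggregator only sees this shape, adding a seventh marketplace means writing one more ~100-line crawler and nothing else.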
The 5 bugs I had to fix LIVE in production
I shipped the actor as a paid public Pay-Per-Event Apify Actor after my last test run looked clean. Then I ran the actor with my own real Telegram bot token + 3 references the day after launch, and immediately found 5 bugs that would have made the actor look broken to first-time users.
Bug #1 — WatchBox redirected every search to a splash page (0 listings extracted)
The crawler URL was:
```
https://www.the1916company.com/search/pre-owned/?q=rolex+116500LN
```
In our 2026-05-04 verification, this returned a tile grid with 8 listings. Two days later: zero. Why?
Live DOM inspection via Playwright revealed: WatchBox now redirects any query containing a brand keyword (rolex, patek, audemars) to a brand-suggest splash page that has NO product tiles. The previous URL pattern broke silently.
The fix was tiny but only findable by going hands-on:
```ts
// BEFORE — included brand prefix → redirect to splash → 0 tiles
return `https://www.the1916company.com/search/pre-owned/?q=${encodeURIComponent(`${brand}${ref}`)}`;

// AFTER — bare ref → lands on real /search/?q= results page
return `https://www.the1916company.com/search/?q=${encodeURIComponent(ref)}`;
```
Lesson: never trust documented URL patterns past 30 days for sites you don't control. Schedule monthly DOM verification runs, even on stable platforms.
Bug #2 — Bobs Watches wrong search endpoint (0 products rendered)
Same pattern, different cause. Bobs Watches' homepage form:
```html
<form action="/shop" method="get">
  <input name="query" type="text">
</form>
```
The actual search endpoint is /shop?query=124060 — but I had been routing through their old /{brand}-{model}-{page}.html catalog URLs (which only covered top 7 collections AND inflated sample size beyond the user's exact ref).
```ts
// BEFORE — stale catalog URL routing
return `https://www.bobswatches.com/rolex-submariner-1.html`;

// AFTER — actual search endpoint
return `https://www.bobswatches.com/shop?query=${encodeURIComponent(ref)}`;
```
Bonus: Cloudflare's "Un instant..." interstitial takes ~8s to clear with Camoufox. The original 30s waitForSelector timeout was occasionally too tight; bumped to 45s. Fewer false-zero runs.
Bug #3 — A previous defensive filter was now stripping 100% of legitimate data
This one was sneaky. I had added a strict ref-matching filter in every crawler to defend against an earlier bug where WatchBox returned Calatrava listings tagged with the wrong reference. The filter was:
// "5711/1A-010" → normalized → "57111a010" → required substring in title+href
const refCore = refLower.replace(/[^\w]/g, '');
const haystack = `${title}${href}`.toLowerCase().replace(/[^\w]/g, '');
if (!haystack.includes(refCore)) continue;
Worked on chrono24 (which lists titles with full sub-variants like 5711/1A-010). Destructive on European Watch Co, where titles use base refs only (e.g. 5711/1A):
```
INFO europeanwatch: 312 raw cards pre-dedupe for ref="116500LN"
INFO europeanwatch: extracted 0 listings for ref="116500LN"
```
312 cards found, 0 extracted. The filter was working as designed but the design was wrong for brand-grid platforms. Fix:
```ts
// Match BASE prefix instead of full sub-variant.
// "5711/1A-010" → match "57111a"; aggregator's extractSubRef() then
// groups detected sub-variants for accurate median.
const baseMatch = refLower.replace(/[^\w]/g, '').match(/^(\d{4,6}[a-z]{0,3})/);
const basePrefix = baseMatch ? baseMatch[1] : refLower.replace(/[^\w]/g, '');
```
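Pulling the relaxed matcher out into a standalone sketch makes the behavior easy to check (helper names here are mine):

```typescript
// Derive the base reference prefix used for loose matching.
function basePrefix(ref: string): string {
  const stripped = ref.toLowerCase().replace(/[^\w]/g, '');
  const m = stripped.match(/^(\d{4,6}[a-z]{0,3})/);
  return m ? m[1] : stripped;
}

// Does a listing's title/href plausibly belong to this ref?
function looselyMatches(ref: string, title: string, href: string): boolean {
  const haystack = `${title}${href}`.toLowerCase().replace(/[^\w]/g, '');
  return haystack.includes(basePrefix(ref));
}

console.log(basePrefix('5711/1A-010')); // → "57111a"
console.log(looselyMatches(
  '5711/1A-010',
  'Patek Philippe Nautilus 5711/1A', // base-ref-only title, as on European Watch Co
  '/patek-5711-1a',
)); // → true
```

The old strict filter would have rejected that title because it lacks the `-010` suffix; the base-prefix version keeps it and lets the aggregator sort out sub-variants.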
Lesson: defensive code added to fix bug N can cause bug N+M months later. Keep filters per-platform when the platforms have meaningfully different data shapes.
Bug #4 — Actor.call('apify/send-mail') fails silently for public actors
I had wired up email digests for users who didn't want Telegram. Worked perfectly when I tested as the developer. Failed for every public-actor user with:
```
ApifyApiError: Insufficient permissions for the Actor.
Make sure you're passing a correct API token and that it has the required permissions.
```
After research: Apify injects a sandboxed runtime token for public Actor runs. That token doesn't have actor:write scope, so Actor.call('apify/send-mail') returns 403. There's no warning at build time — the failure happens at runtime, per-user, silently.
Worse: the dispatcher was catching the error in a try/catch and reporting email_sent: true anyway. So users would think their emails were sent when they weren't.
I made two fixes:
- Honest reporting — return a boolean from `sendEmailDigest()` and propagate it upstream:
```ts
async function sendEmailDigest(...): Promise<boolean> {
  try {
    await Actor.call('apify/send-mail', {...});
    return true;
  } catch (err) {
    log.warning(`Email send failed`, { err: String(err) });
    return false;
  }
}
```
- Drop email from the MVP — better to ship a smaller working feature set than a bigger one that lies. v0.2 will integrate Resend HTTP API directly (no actor-to-actor call needed).
Lesson: catch/log/return-true is the worst possible error handling pattern. If you can't recover, surface the failure.
Bug #5 — Sub-reference grouping (the one that actually mattered)
This isn't a "bug fixed in production" — it's the architectural decision that made the whole tool work.
For broad reference searches like Nautilus, the actor returns listings across multiple sub-models: 5711, 5810, 5990, 7118, 7011, 4700. All of these are technically "Nautilus", but their median prices differ by 5-10x:
- 5711/1A-010 (men's stainless steel): $130K
- 7118/1A (women's): $50K
- 4700/1 (vintage): $25K
Aggregating one median across all sub-models would produce a misleading $80K median that triggers false arbitrage alerts every time a women's 7118 is listed.
The fix: extract sub-references from each listing's title using brand-specific regex:
```ts
export function extractSubRef(title: string, brand: string): string | null {
  if (brand === 'patek-philippe') {
    // Patek: 5711/1A-010, 5990/1A, 5810G-001, 7118/1200R-010, 5167A
    const m = title.match(/\b([56]\d{3}\/?\d{0,4}[A-Z]?[-\s]?\d{0,3})\b/);
    if (m) return m[1].replace(/\s+/g, '').toUpperCase();
  }
  if (brand === 'rolex') {
    // Rolex: 116500LN, 124060, 126710BLNR
    // 5+ digits min to skip year matches (2024, 2026) in titles
    const m6 = title.match(/\b(\d{6}[A-Z]{0,5})\b/);
    if (m6) return m6[1].toUpperCase();
    const m5 = title.match(/\b(\d{5}[A-Z]{0,5})\b/);
    if (m5) return m5[1].toUpperCase();
  }
  if (brand === 'audemars-piguet') {
    // AP: 15500ST, 67600ST.OO.1210ST.01, 26331ST.OO.1220ST.01
    const m = title.match(/\b(\d{5}(?:ST|OR|BC|SP|CE)(?:\.[A-Z0-9.]+)?)\b/);
    if (m) return m[1].toUpperCase();
  }
  return null;
}
```
Then group listings by extracted sub-ref, NOT by user search term. Listings without a detectable sub-ref are kept in the dataset but excluded from median computation. Median is now per-sub-model, which is the only price that's actually comparable cross-platform.
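The grouping step can be sketched like this (a simplified stand-in that assumes the sub-ref is already extracted; the real aggregator also trims outliers, which I've left out):

```typescript
interface Priced { subRef: string | null; priceUsd: number; }

// Plain median of a non-empty list of prices.
function median(prices: number[]): number {
  const s = [...prices].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Group by detected sub-ref; listings without one are skipped
// for median purposes (but kept in the dataset).
function medianBySubRef(listings: Priced[]): Map<string, number> {
  const groups = new Map<string, number[]>();
  for (const l of listings) {
    if (!l.subRef) continue;
    const prices = groups.get(l.subRef) ?? [];
    prices.push(l.priceUsd);
    groups.set(l.subRef, prices);
  }
  const out = new Map<string, number>();
  for (const [subRef, prices] of groups) out.set(subRef, median(prices));
  return out;
}

const medians = medianBySubRef([
  { subRef: '5711/1A-010', priceUsd: 128_000 },
  { subRef: '5711/1A-010', priceUsd: 132_000 },
  { subRef: '7118/1A', priceUsd: 50_000 },
  { subRef: null, priceUsd: 999 }, // no detectable sub-ref → excluded
]);
console.log(medians.get('5711/1A-010')); // → 130000
```

Note the two sub-refs never mix: the women's 7118 can no longer drag down the 5711 median and fake an arbitrage signal.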
This single change eliminated ~80% of the false-positive arbitrage signals in earlier builds.
Cross-platform median: the actual value prop
Once the bugs were fixed and 4 platforms were delivering listings reliably, the core math could finally do its job. From a real cloud run on 2026-05-06:
```
Sub-ref: 124060 (Rolex Submariner No Date)
Listings: 15 total
  Chrono24:     12 listings, range $11,872 – $15,729
  WatchBox:      1 listing, $10,050
  Bobs Watches:  2 listings, $14,995 + $14,995
Median: $13,988 (computed across all 15)

Spread alerts (>5% below median):
  1. WatchBox $10,050 → 28.2% below median ✅ ALERT
  2. Chrono24 $11,872 → 15.1% below median ✅ ALERT
  3. Chrono24 $12,200 → 12.8% below median ✅ ALERT
```
The 28.2% WatchBox spread is what the dealer flips for $3,938 profit. That single alert pays for ~12 months of the actor's runtime cost.
MCP integration: query your arbitrage feed from Claude Desktop
After getting the batch crawler stable, I wired the same data into a Model Context Protocol server using Apify Standby mode. Same Docker image; when the run's `metaOrigin` is `'STANDBY'` (set by Apify for Standby runs), the entry point switches from `runBatch()` to a small Express server with three HTTP tools:
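The dual-mode dispatch is just a branch at startup. A minimal sketch (the function names are mine; in the actor this would read `metaOrigin` from the Apify SDK's `Actor.getEnv()`):

```typescript
// Pure mode selection, kept separate so the branch is unit-testable.
type Mode = 'mcp-server' | 'batch';

function selectMode(metaOrigin: string | undefined): Mode {
  return metaOrigin === 'STANDBY' ? 'mcp-server' : 'batch';
}

// In the real entry point this would look roughly like:
//   await Actor.init();
//   if (selectMode(Actor.getEnv().metaOrigin) === 'mcp-server') {
//     await runMcpServer();   // Express app exposing the 3 HTTP tools
//   } else {
//     await runBatch();       // scheduled crawl → aggregate → alert
//   }
//   await Actor.exit();

console.log(selectMode('STANDBY')); // → "mcp-server"
```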
| Tool | Purpose |
|---|---|
| `get_arbitrage_snapshot` | Top N current arbitrage opportunities, optionally filtered by ref + min spread % |
| `get_market_stats` | Per-ref median, min, max, count across platforms |
| `get_listings_by_ref` | Raw listings for a ref, filterable by condition + box/papers, paginated |
Add this to your Claude Desktop config:
{"mcpServers":{"watch-arbitrage":{"url":"https://kazkn--watch-arbitrage-mcp.apify.actor/mcp?token=apify_api_YOUR_TOKEN","transport":"streamable-http"}}}
…and Claude can answer questions like:
- "Show me the biggest Patek Nautilus spreads from the last 24h."
- "What's the median price of a Rolex Daytona 116500LN this week?"
- "Find me listings for the AP Royal Oak 15500ST under $35K."
The MCP server reads the latest dataset populated by the batch crawler. Same compute, different access pattern. Costs the same as the batch alerts: $0.50 per arbitrage query (you only pay for the value-extracting view).
Pricing as a Pay-Per-Event Apify Actor
I went with PPE rather than per-runtime because dealers care about per-alert ROI, not per-CPU-second cost:
| Event | Charge | When |
|---|---|---|
| `actor-start` | $0.05 | Once per scheduled run |
| `reference-monitored` | $0.01 | Per ref scanned across all platforms |
| `apify-default-dataset-item` | $0.001 | Per listing scraped |
| `spread-alert-triggered` | $0.50 | Primary event — only when a real arbitrage opportunity is dispatched |
Typical dealer profile (15 refs, hourly schedule):
- Light usage (1-2 alerts/day): ~$15-30/month
- Heavy usage (10+ alerts/day, 24/7 monitoring): ~$300-500/month
Compare to ChronoPulse at $500/mo flat regardless of signal volume. PPE means you pay for value extracted, not compute consumed. And the spending limit is respected — every charge call honors ACTOR_MAX_TOTAL_CHARGE_USD so a runaway alert spike never blows past the cap.
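To make the cap concrete, here's a sketch of the guard logic around paid events (the class and prices are illustrative; in production the Apify platform itself enforces `ACTOR_MAX_TOTAL_CHARGE_USD` on each charge call):

```typescript
// Track cumulative charges and refuse any event that would exceed the cap.
class ChargeGuard {
  private chargedUsd = 0;
  constructor(private readonly maxTotalUsd: number) {}

  // Returns true and records the charge if it fits under the cap.
  tryCharge(eventPriceUsd: number): boolean {
    if (this.chargedUsd + eventPriceUsd > this.maxTotalUsd) return false;
    this.chargedUsd += eventPriceUsd;
    return true;
  }
}

const guard = new ChargeGuard(1.00);  // e.g. a user cap of $1
console.log(guard.tryCharge(0.50));   // → true  (alert #1)
console.log(guard.tryCharge(0.50));   // → true  (alert #2)
console.log(guard.tryCharge(0.50));   // → false (would exceed the cap)
```

So a runaway alert spike degrades into silently dropped charges rather than a surprise bill.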
What I learned
- Ship public, then debug in production. I had test coverage. I had verified DOMs. Bugs surfaced anyway. The only way to find them was to run the actor with real user inputs at production scale.
- Defensive code rots. Bug #3 was a defense added to fix Bug #1 four weeks earlier. Today's defensive filter is tomorrow's data destroyer.
- `try/catch/log` is not error handling. It's burying the failure. Always propagate up to a layer that can act on the error.
- Per-platform crawler logic > universal crawler abstraction. Different sites have different DOM shapes, different anti-bot postures, different title formats. Pretending they're the same loses signal.
- Sub-reference grouping is the difference between a useful median and a misleading one. Generic "average price" tools don't do this. It's the entire moat.
Try it
- Apify Store: apify.com/kazkn/watch-arbitrage-mcp — the free Apify tier includes ~$5/month of credits, enough for daily monitoring of 10-15 refs.
- GitHub source: github.com/DataKazKN/watch-arbitrage-mcp — MIT licensed; PRs welcome.
- MCP integration for Claude Desktop / Cursor / ChatGPT: see the README "Use as MCP server" section.
Built by kazkn. If this approach (multi-source price arbitrage + Telegram alerts + MCP for AI agents) is useful in your domain, send me a note via Apify message — I'm building a portfolio of arbitrage actors across other verticals (sneakers, art, fine wine) using the same architecture.