TL;DR — I built Watch Arbitrage Tracker (Apify Store, GitHub): a Crawlee + Camoufox actor that scrapes 6 luxury-watch marketplaces in parallel, computes the cross-platform median price for any Patek/Rolex/AP reference, and pings Telegram the moment a listing drops more than X% below market. Sub-$1/month for typical dealer usage. Doubles as an MCP server so Claude Desktop / Cursor / ChatGPT can query the live feed in plain English.
The interesting part isn't the build — it's the 5 bugs I had to debug in production after pushing it public, and the cross-platform median math that turns "scraped data" into a real arbitrage signal.
The problem (real, validated, painful)
Pro watch dealers — the people who flip pre-owned Patek 5711, Rolex Daytona, AP Royal Oak — spend 3+ hours a day refreshing 6 dealer marketplaces looking for mispriced inventory. The job is mechanical: open Chrono24, search 10 reference numbers, compare prices, switch to WatchBox, repeat, switch to Bobs Watches, repeat...
There are existing tools (Watchcharts $79/mo, ChronoPulse $500/mo, Bezel Club) but they all have the same flaw: single-platform anchoring. They tell you the median price on Chrono24, not across the market. That's useless for arbitrage — the whole point is finding spreads between platforms.
The math that actually matters:
spread = cross_platform_median(refX) - listing_price(refX, platformY)
If a Submariner 124060 is listed at $10,050 on WatchBox but the true market median (computed across Chrono24 + WatchBox + Bobs) is $13,988 — that's a 28.2% spread. That's the alert worth waking a dealer at 3am for.
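The spread math is small enough to sketch in a few lines of TypeScript (the function name is mine, not the actor's actual API):

```typescript
// Percentage a listing sits below the cross-platform median.
// Positive result = listing is cheaper than the market (a potential flip).
function spreadPct(medianPrice: number, listingPrice: number): number {
  return ((medianPrice - listingPrice) / medianPrice) * 100;
}

// The $10,050 WatchBox listing vs. the $13,988 cross-platform median:
const spread = spreadPct(13_988, 10_050);
console.log(spread.toFixed(1)); // → "28.2"
```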
No tool I could find computes a TRUE cross-platform median. So I built one.
The stack
Standard Apify stack with one custom twist:
- Crawlee + Camoufox (stealthy Firefox fork) for anti-bot resilience. Chrono24 + Bobs Watches sit behind Cloudflare; Camoufox + Apify proxy rotation handles them reliably.
- TypeScript everywhere (strict mode, Node 24).
- Per-platform crawler files (`src/crawlers/{chrono24,watchbox,bobs,...}.ts`) — each ~100 lines, all conforming to the same `Listing` shape so the aggregator doesn't care which platform a listing came from.
- Aggregator (`src/aggregator.ts`) — groups listings by extracted sub-reference (more on this in Bug #5 below), computes a trimmed median, detects spreads.
- Alert dispatcher (`src/alerts.ts`) — Telegram per-opportunity with 24h dedup.
- Dual mode — same codebase runs as a batch crawler (scheduled cron) AND as an MCP server in Apify Standby mode, exposing 3 HTTP tools for AI agents.
Total: ~2000 LOC across 25 files. Repo: github.com/DataKazKN/watch-arbitrage-mcp.
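For context, the shared listing shape might look something like this — a minimal sketch of the idea, with field names I made up rather than the repo's exact interface:

```typescript
// Minimal normalized shape every crawler emits, regardless of platform.
interface Listing {
  platform: string;       // e.g. 'chrono24' | 'watchbox' | 'bobs'
  brand: string;          // e.g. "rolex"
  ref: string;            // user search term, e.g. "124060"
  subRef: string | null;  // extracted sub-reference (see Bug #5)
  title: string;
  priceUsd: number;
  url: string;
  scrapedAt: string;      // ISO timestamp
}

const example: Listing = {
  platform: 'watchbox',
  brand: 'rolex',
  ref: '124060',
  subRef: '124060',
  title: 'Rolex Submariner No Date 124060',
  priceUsd: 10_050,
  url: 'https://example.com/listing/1',
  scrapedAt: new Date().toISOString(),
};
```

Because the aggregator only sees this shape, adding a seventh marketplace means writing one more ~100-line crawler and nothing else.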
The 5 bugs I had to fix LIVE in production
I shipped the actor as a paid public Pay-Per-Event Apify Actor after my last test run looked clean. Then I ran the actor with my own real Telegram bot token + 3 references the day after launch, and immediately found 5 bugs that would have made the actor look broken to first-time users.
Bug #1 — WatchBox redirected every search to a splash page (0 listings extracted)
The crawler URL was:
```
https://www.the1916company.com/search/pre-owned/?q=rolex+116500LN
```
In our 2026-05-04 verification, this returned a tile grid with 8 listings. Two days later: zero. Why?
Live DOM inspection via Playwright revealed: WatchBox now redirects any query containing a brand keyword (rolex, patek, audemars) to a brand-suggest splash page that has NO product tiles. The previous URL pattern broke silently.
The fix was tiny but only findable by going hands-on:
```ts
// BEFORE — included brand prefix → redirect to splash → 0 tiles
return `https://www.the1916company.com/search/pre-owned/?q=${encodeURIComponent(`${brand}${ref}`)}`;

// AFTER — bare ref → lands on real /search/?q= results page
return `https://www.the1916company.com/search/?q=${encodeURIComponent(ref)}`;
```
Lesson: never trust documented URL patterns past 30 days for sites you don't control. Schedule monthly DOM verification runs, even on stable platforms.
Bug #2 — Bobs Watches wrong search endpoint (0 products rendered)
Same pattern, different cause. Bobs Watches' homepage form:
```html
<form action="/shop" method="get">
  <input name="query" type="text">
</form>
```
The actual search endpoint is /shop?query=124060 — but I had been routing through their old /{brand}-{model}-{page}.html catalog URLs (which only covered top 7 collections AND inflated sample size beyond the user's exact ref).
```ts
// BEFORE — stale catalog URL routing
return `https://www.bobswatches.com/rolex-submariner-1.html`;

// AFTER — actual search endpoint
return `https://www.bobswatches.com/shop?query=${encodeURIComponent(ref)}`;
```
Bonus: Cloudflare's "Un instant..." interstitial takes ~8s to clear with Camoufox. The original 30s waitForSelector timeout was occasionally too tight; bumped to 45s. Fewer false-zero runs.
Bug #3 — A previous defensive filter was now stripping 100% of legitimate data
This one was sneaky. I had added a strict ref-matching filter in every crawler to defend against an earlier bug where WatchBox returned Calatrava listings tagged with the wrong reference. The filter was:
// "5711/1A-010" → normalized → "57111a010" → required substring in title+href
const refCore = refLower.replace(/[^\w]/g, '');
const haystack = `${title}${href}`.toLowerCase().replace(/[^\w]/g, '');
if (!haystack.includes(refCore)) continue;
Worked on chrono24 (which lists titles with full sub-variants like 5711/1A-010). Destructive on European Watch Co, where titles use base refs only (e.g. 5711/1A):
```
INFO europeanwatch: 312 raw cards pre-dedupe for ref="116500LN"
INFO europeanwatch: extracted 0 listings for ref="116500LN"
```
312 cards found, 0 extracted. The filter was working as designed but the design was wrong for brand-grid platforms. Fix:
```ts
// Match BASE prefix instead of full sub-variant.
// "5711/1A-010" → match "57111a"; aggregator's extractSubRef() then
// groups detected sub-variants for accurate median.
const baseMatch = refLower.replace(/[^\w]/g, '').match(/^(\d{4,6}[a-z]{0,3})/);
const basePrefix = baseMatch ? baseMatch[1] : refLower.replace(/[^\w]/g, '');
```
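Pulling the relaxed matcher out into a standalone sketch makes the behavior easy to check (helper names here are mine):

```typescript
// Derive the base reference prefix used for loose matching.
function basePrefix(ref: string): string {
  const stripped = ref.toLowerCase().replace(/[^\w]/g, '');
  const m = stripped.match(/^(\d{4,6}[a-z]{0,3})/);
  return m ? m[1] : stripped;
}

// Does a listing's title/href plausibly belong to this ref?
function looselyMatches(ref: string, title: string, href: string): boolean {
  const haystack = `${title}${href}`.toLowerCase().replace(/[^\w]/g, '');
  return haystack.includes(basePrefix(ref));
}

console.log(basePrefix('5711/1A-010')); // → "57111a"
console.log(looselyMatches(
  '5711/1A-010',
  'Patek Philippe Nautilus 5711/1A', // base-ref-only title, as on European Watch Co
  '/patek-5711-1a',
)); // → true
```

The old strict filter would have rejected that title because it lacks the `-010` suffix; the base-prefix version keeps it and lets the aggregator sort out sub-variants.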
Lesson: defensive code added to fix bug N can cause bug N+M months later. Keep filters per-platform when the platforms have meaningfully different data shapes.
Bug #4 — Actor.call('apify/send-mail') fails silently for public actors
I had wired up email digests for users who didn't want Telegram. Worked perfectly when I tested as the developer. Failed for every public-actor user with:
```
ApifyApiError: Insufficient permissions for the Actor.
Make sure you're passing a correct API token and that it has the required permissions.
```
After research: Apify injects a sandboxed runtime token for public Actor runs. That token doesn't have actor:write scope, so Actor.call('apify/send-mail') returns 403. There's no warning at build time — the failure happens at runtime, per-user, silently.
Worse: the dispatcher was catching the error in a try/catch and reporting email_sent: true anyway. So users would think their emails were sent when they weren't.
I made two fixes:
- Honest reporting — return a boolean from `sendEmailDigest()` and propagate it upstream:
```ts
async function sendEmailDigest(...): Promise<boolean> {
  try {
    await Actor.call('apify/send-mail', {...});
    return true;
  } catch (err) {
    log.warning(`Email send failed`, { err: String(err) });
    return false;
  }
}
```
- Drop email from the MVP — better to ship a smaller working feature set than a bigger one that lies. v0.2 will integrate Resend HTTP API directly (no actor-to-actor call needed).
Lesson: catch/log/return-true is the worst possible error handling pattern. If you can't recover, surface the failure.
Bug #5 — Sub-reference grouping (the one that actually mattered)
This isn't a "bug fixed in production" — it's the architectural decision that made the whole tool work.
For broad reference searches like Nautilus, the actor returns listings across multiple sub-models: 5711, 5810, 5990, 7118, 7011, 4700. All of these are technically "Nautilus", but their median prices differ by 5-10x:
- 5711/1A-010 (men's stainless steel): $130K
- 7118/1A (women's): $50K
- 4700/1 (vintage): $25K
Aggregating one median across all sub-models would produce a misleading $80K median that triggers false arbitrage alerts every time a women's 7118 is listed.
The fix: extract sub-references from each listing's title using brand-specific regex:
```ts
export function extractSubRef(title: string, brand: string): string | null {
  if (brand === 'patek-philippe') {
    // Patek: 5711/1A-010, 5990/1A, 5810G-001, 7118/1200R-010, 5167A
    const m = title.match(/\b([56]\d{3}\/?\d{0,4}[A-Z]?[-\s]?\d{0,3})\b/);
    if (m) return m[1].replace(/\s+/g, '').toUpperCase();
  }
  if (brand === 'rolex') {
    // Rolex: 116500LN, 124060, 126710BLNR
    // 5+ digits min to skip year matches (2024, 2026) in titles
    const m6 = title.match(/\b(\d{6}[A-Z]{0,5})\b/);
    if (m6) return m6[1].toUpperCase();
    const m5 = title.match(/\b(\d{5}[A-Z]{0,5})\b/);
    if (m5) return m5[1].toUpperCase();
  }
  if (brand === 'audemars-piguet') {
    // AP: 15500ST, 67600ST.OO.1210ST.01, 26331ST.OO.1220ST.01
    const m = title.match(/\b(\d{5}(?:ST|OR|BC|SP|CE)(?:\.[A-Z0-9.]+)?)\b/);
    if (m) return m[1].toUpperCase();
  }
  return null;
}
```
Then group listings by extracted sub-ref, NOT by user search term. Listings without a detectable sub-ref are kept in the dataset but excluded from median computation. Median is now per-sub-model, which is the only price that's actually comparable cross-platform.
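The grouping step can be sketched like this (a simplified stand-in that assumes the sub-ref is already extracted; the real aggregator also trims outliers, which I've left out):

```typescript
interface Priced { subRef: string | null; priceUsd: number; }

// Plain median of a non-empty list of prices.
function median(prices: number[]): number {
  const s = [...prices].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Group by detected sub-ref; listings without one are skipped
// for median purposes (but kept in the dataset).
function medianBySubRef(listings: Priced[]): Map<string, number> {
  const groups = new Map<string, number[]>();
  for (const l of listings) {
    if (!l.subRef) continue;
    const prices = groups.get(l.subRef) ?? [];
    prices.push(l.priceUsd);
    groups.set(l.subRef, prices);
  }
  const out = new Map<string, number>();
  for (const [subRef, prices] of groups) out.set(subRef, median(prices));
  return out;
}

const medians = medianBySubRef([
  { subRef: '5711/1A-010', priceUsd: 128_000 },
  { subRef: '5711/1A-010', priceUsd: 132_000 },
  { subRef: '7118/1A', priceUsd: 50_000 },
  { subRef: null, priceUsd: 999 }, // no detectable sub-ref → excluded
]);
console.log(medians.get('5711/1A-010')); // → 130000
```

Note the two sub-refs never mix: the women's 7118 can no longer drag down the 5711 median and fake an arbitrage signal.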
This single change eliminated ~80% of the false-positive arbitrage signals in earlier builds.
Cross-platform median: the actual value prop
Once the bugs were fixed and 4 platforms were delivering listings reliably, the core math could finally do its job. From a real cloud run on 2026-05-06:
```
Sub-ref: 124060 (Rolex Submariner No Date)
Listings: 15 total
  Chrono24:     12 listings, range $11,872 – $15,729
  WatchBox:      1 listing, $10,050
  Bobs Watches:  2 listings, $14,995 + $14,995
Median: $13,988 (computed across all 15)

Spread alerts (>5% below median):
  1. WatchBox $10,050 → 28.2% below median ✅ ALERT
  2. Chrono24 $11,872 → 15.1% below median ✅ ALERT
  3. Chrono24 $12,200 → 12.8% below median ✅ ALERT
```
The 28.2% WatchBox spread is what the dealer flips for $3,938 profit. That single alert pays for ~12 months of the actor's runtime cost.
MCP integration: query your arbitrage feed from Claude Desktop
After getting the batch crawler stable, I wired the same data into a Model Context Protocol server using Apify Standby mode. Same Docker image; when the run's `metaOrigin` is `'STANDBY'` (set by Apify for Standby runs), the entry point switches from `runBatch()` to a small Express server with three HTTP tools:
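The dual-mode dispatch is just a branch at startup. A minimal sketch (the function names are mine; in the actor this would read `metaOrigin` from the Apify SDK's `Actor.getEnv()`):

```typescript
// Pure mode selection, kept separate so the branch is unit-testable.
type Mode = 'mcp-server' | 'batch';

function selectMode(metaOrigin: string | undefined): Mode {
  return metaOrigin === 'STANDBY' ? 'mcp-server' : 'batch';
}

// In the real entry point this would look roughly like:
//   await Actor.init();
//   if (selectMode(Actor.getEnv().metaOrigin) === 'mcp-server') {
//     await runMcpServer();   // Express app exposing the 3 HTTP tools
//   } else {
//     await runBatch();       // scheduled crawl → aggregate → alert
//   }
//   await Actor.exit();

console.log(selectMode('STANDBY')); // → "mcp-server"
```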
| Tool | Purpose |
|---|---|
| `get_arbitrage_snapshot` | Top N current arbitrage opportunities, optionally filtered by ref + min spread % |
| `get_market_stats` | Per-ref median, min, max, count across platforms |
| `get_listings_by_ref` | Raw listings for a ref, filterable by condition + box/papers, paginated |
Add this to your Claude Desktop config:
{"mcpServers":{"watch-arbitrage":{"url":"https://kazkn--watch-arbitrage-mcp.apify.actor/mcp?token=apify_api_YOUR_TOKEN","transport":"streamable-http"}}}
…and Claude can answer questions like:
- "Show me the biggest Patek Nautilus spreads from the last 24h."
- "What's the median price of a Rolex Daytona 116500LN this week?"
- "Find me listings for the AP Royal Oak 15500ST under $35K."
The MCP server reads the latest dataset populated by the batch crawler. Same compute, different access pattern. Costs the same as the batch alerts: $0.50 per arbitrage query (you only pay for the value-extracting view).
Pricing as a Pay-Per-Event Apify Actor
I went with PPE rather than per-runtime because dealers care about per-alert ROI, not per-CPU-second cost:
| Event | Charge | When |
|---|---|---|
| `actor-start` | $0.05 | Once per scheduled run |
| `reference-monitored` | $0.01 | Per ref scanned across all platforms |
| `apify-default-dataset-item` | $0.001 | Per listing scraped |
| `spread-alert-triggered` | $0.50 | Primary event — only when a real arbitrage opportunity is dispatched |
Typical dealer profile (15 refs, hourly schedule):
- Light usage (1-2 alerts/day): ~$15-30/month
- Heavy usage (10+ alerts/day, 24/7 monitoring): ~$300-500/month
Compare to ChronoPulse at $500/mo flat regardless of signal volume. PPE means you pay for value extracted, not compute consumed. And the spending limit is respected — every charge call honors ACTOR_MAX_TOTAL_CHARGE_USD so a runaway alert spike never blows past the cap.
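To make the cap concrete, here's a sketch of the guard logic around paid events (the class and prices are illustrative; in production the Apify platform itself enforces `ACTOR_MAX_TOTAL_CHARGE_USD` on each charge call):

```typescript
// Track cumulative charges and refuse any event that would exceed the cap.
class ChargeGuard {
  private chargedUsd = 0;
  constructor(private readonly maxTotalUsd: number) {}

  // Returns true and records the charge if it fits under the cap.
  tryCharge(eventPriceUsd: number): boolean {
    if (this.chargedUsd + eventPriceUsd > this.maxTotalUsd) return false;
    this.chargedUsd += eventPriceUsd;
    return true;
  }
}

const guard = new ChargeGuard(1.00);  // e.g. a user cap of $1
console.log(guard.tryCharge(0.50));   // → true  (alert #1)
console.log(guard.tryCharge(0.50));   // → true  (alert #2)
console.log(guard.tryCharge(0.50));   // → false (would exceed the cap)
```

So a runaway alert spike degrades into silently dropped charges rather than a surprise bill.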
What I learned
- Ship public, then debug in production. I had test coverage. I had verified DOMs. Bugs surfaced anyway. The only way to find them was to run the actor with real user inputs at production scale.
- Defensive code rots. Bug #3 was a defense added to fix Bug #1 four weeks earlier. Today's defensive filter is tomorrow's data destroyer.
- `try/catch/log` is not error handling. It's burying the failure. Always propagate up to a layer that can act on the error.
- Per-platform crawler logic > universal crawler abstraction. Different sites have different DOM shapes, different anti-bot postures, different title formats. Pretending they're the same loses signal.
- Sub-reference grouping is the difference between a useful median and a misleading one. Generic "average price" tools don't do this. It's the entire moat.
Try it
- Apify Store: apify.com/kazkn/watch-arbitrage-mcp — the free Apify tier includes ~$5/month of credits, enough for daily monitoring of 10-15 refs.
- GitHub source: github.com/DataKazKN/watch-arbitrage-mcp — MIT licensed; PRs welcome.
- MCP integration for Claude Desktop / Cursor / ChatGPT: see the README "Use as MCP server" section.
Built by kazkn. If this approach (multi-source price arbitrage + Telegram alerts + MCP for AI agents) is useful in your domain, send me a note via Apify message — I'm building a portfolio of arbitrage actors across other verticals (sneakers, art, fine wine) using the same architecture.