Four GitHub Actions patterns that schedule ETL across a three-site monorepo


Running three sites from one monorepo means three separate ETL jobs, three content rebuilds, and three deployment pipelines — all scheduled, all needing to avoid colliding with each other. After six weeks of iteration starting from launch on April 23, here are the four patterns that stabilized the scheduling.

Staggered cron offsets

The naive approach is putting all three ETL jobs on 0 2 * * *. They fire simultaneously, race for the same API rate-limit windows, and one of them reliably fails when HuggingFace or the GitHub API returns a 429 mid-run.

The fix is 30-minute offsets. GitHub Actions cron syntax follows the standard five-field POSIX format (minute, hour, day of month, month, day of week), and schedules are evaluated in UTC:

on:
  schedule:
    - cron: '0 2 * * *'    # ai-tools ETL
    - cron: '30 2 * * *'   # oss-alternatives ETL
    - cron: '0 3 * * *'    # indie-games ETL

Thirty minutes is enough to avoid competing for the same rate-limit window. I use 30 rather than 15 because the HuggingFace pull for the AI tools site can take up to 22 minutes when the model metadata hasn't been cached in Turso. A 15-minute offset would let the second job start before the first finishes, which triggers the same collision.
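
The offset choice above is simple arithmetic: the gap between two start times has to be longer than the earlier job's worst-case runtime. Here is a minimal bash sketch of that check, assuming the 22-minute worst case quoted above. The helper name `offset_is_safe` is hypothetical and not part of the actual workflows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the gap between two cron starts
# is longer than the worst-case runtime of the earlier job.
# Both arguments are whole minutes.
offset_is_safe() {
  local offset_min=$1 worst_case_min=$2
  (( offset_min > worst_case_min ))
}

worst_case=22   # slowest HuggingFace pull mentioned above, in minutes
for offset in 15 30; do
  if offset_is_safe "$offset" "$worst_case"; then
    echo "offset ${offset}m: safe"
  else
    echo "offset ${offset}m: overlaps"
  fi
done
```

With these numbers the 15-minute offset is reported as overlapping and the 30-minute offset as safe, which is the same conclusion reached in the text.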

The Steam/itch.io ETL for indie games has its own rate throttle built into the ETL script — 100ms between requests — so it's less sensitive to start time. But I still offset it to the next hour to keep the logs readable without two jobs' output interleaved.

Skip flags in commit messages

Every ETL job ends with a content rebuild and a Vercel deployment trigger. But the article-publish workflow also triggers a rebuild (adding a file to content/articles/ causes Vercel to detect a push to main). Without coordination, every article commit causes all three sites to rebuild — wasted Vercel build minutes and unnecessary queue time.

I adopted a [skip publish-articles] convention on all ETL commits:

- name: Check skip flag
  id: skip
  run: |
    MSG=$(git log -1 --format='%s')
    if echo "$MSG" | grep -q '\[skip publish-articles\]'; then
      echo "skip=true" >> "$GITHUB_OUTPUT"
    fi

- name: Build and deploy
  if: steps.skip.outputs.skip != 'true'
  run: pnpm build

ETL commits always include [skip publish-articles]. Article generation commits don't. This keeps the two pipeline types isolated without adding conditional logic to the triggers.

The convention is visible in the actual git history: chore(ai-tools): refresh content 2026-06-03 [skip publish-articles]. It's slightly ceremonial but explicit enough that I haven't accidentally broken it.
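
To make the convention concrete, here is a small bash sketch of both sides: building an ETL commit subject that carries the flag, and detecting the flag afterwards. The helper names `etl_commit_subject` and `has_skip_flag` are hypothetical. The message format is copied from the example commit above.

```shell
#!/usr/bin/env bash
SKIP_FLAG='[skip publish-articles]'

# Hypothetical helper: builds an ETL commit subject in the style shown above.
etl_commit_subject() {
  local site=$1 day=$2
  printf 'chore(%s): refresh content %s %s\n' "$site" "$day" "$SKIP_FLAG"
}

# Succeeds when a commit subject carries the skip flag.
# grep -F treats the flag as a fixed string, so the brackets need no escaping.
has_skip_flag() {
  printf '%s\n' "$1" | grep -qF -- "$SKIP_FLAG"
}

subject=$(etl_commit_subject ai-tools 2026-06-03)
echo "$subject"
if has_skip_flag "$subject"; then
  echo "publish-articles would be skipped"
fi
```

Matching with a fixed string instead of a regex is a design choice here: it avoids the bracket escaping that the `grep -q` pattern in the workflow step needs.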

Path-filtered workflow triggers

Skip flags handle the rebuild problem. Path filters handle a different problem: preventing unrelated changes from triggering the wrong site's deploy.

Each site's deploy workflow watches only its own apps/ subtree plus the shared packages/ directory:

on:
  push:
    paths:
      - 'apps/ai-tools/**'
      - 'packages/shared/**'
    branches:
      - main

When the OSS alternatives ETL updates files under apps/oss-alternatives/data/, only the OSS alternatives deploy fires. The AI tools deploy stays idle. This matters during ETL runs that touch large data files — without path filtering, a single ETL run would trigger three deploys when only one site's content changed.

One predictable gotcha: packages/shared/ changes trigger all three deploys, because shared code is genuinely shared. Updating the Turso DB client or the shared Tailwind config causes a triple rebuild. I've accepted this tradeoff — it's rare (happens maybe once a week) and always intentional. If I'm touching shared code I expect all three sites to redeploy.
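
For reasoning about which deploys a given change will trigger, a rough local approximation can help. The sketch below is not GitHub's path matcher. It only mirrors the two rules described above (a site's own `apps/<site>/` subtree, plus `packages/shared/` for every site). The function name `sites_to_deploy` and the example file names are hypothetical.

```shell
#!/usr/bin/env bash
# Rough local approximation of the path filters above, not GitHub's matcher.
SITES=(ai-tools oss-alternatives indie-games)

# Prints, one per line, the sites whose deploy would fire
# for the changed paths passed as arguments.
sites_to_deploy() {
  local site path
  for site in "${SITES[@]}"; do
    for path in "$@"; do
      case $path in
        "apps/$site/"*|packages/shared/*)
          echo "$site"
          break ;;   # one match is enough for this site
      esac
    done
  done
}

# An ETL data refresh touches a single site:
sites_to_deploy apps/oss-alternatives/data/tools.json

# A shared-code change fans out to all three:
sites_to_deploy packages/shared/db.ts
```

Feeding it the output of `git diff --name-only` for a commit is one way to predict a triple rebuild before pushing.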

Manual dispatch with a site selector

Cron handles the steady state. Manual dispatch handles everything else: re-running a failed ETL, forcing a content refresh after adding a new data source, or debugging why one site's pages look wrong without triggering the other two.

on:
  workflow_dispatch:
    inputs:
      site:
        description: 'Target site'
        required: true
        type: choice
        options:
          - ai-tools
          - oss-alternatives
          - indie-games
          - all
      dry_run:
        description: 'Dry run only (no write to DB)'
        type: boolean
        default: false

The job uses ${{ inputs.site }} to select which ETL script runs. all loops through all three with a bash array. The dry_run flag runs the fetch and parse steps but skips the Turso upsert — useful for checking whether a new data source is returning the expected shape before committing it to the DB.
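
The selection logic described above could look roughly like the following bash sketch. It assumes the workflow passes the two inputs in as the environment variables `SITE` and `DRY_RUN`. Those variable names, the function names, and the `run_etl` placeholder are all hypothetical. The real ETL scripts are not shown in this article.

```shell
#!/usr/bin/env bash
# Assumed to be fed from the workflow step, for example:
#   env:
#     SITE: ${{ inputs.site }}
#     DRY_RUN: ${{ inputs.dry_run }}
ALL_SITES=(ai-tools oss-alternatives indie-games)

# Expands the dropdown value into the list of sites to process.
resolve_sites() {
  case $1 in
    all) printf '%s\n' "${ALL_SITES[@]}" ;;
    ai-tools|oss-alternatives|indie-games) printf '%s\n' "$1" ;;
    *) echo "unknown site: $1" >&2; return 1 ;;
  esac
}

# Placeholder for the real ETL: only prints which stages would run.
run_etl() {
  local site=$1 dry_run=$2
  echo "[$site] fetch + parse"
  if [[ $dry_run == true ]]; then
    echo "[$site] dry run: skipping Turso upsert"
  else
    echo "[$site] upsert to Turso"
  fi
}

main() {
  local site sites
  sites=$(resolve_sites "$1") || return 1
  for site in $sites; do
    run_etl "$site" "$2"
  done
}

main "${SITE:-all}" "${DRY_RUN:-false}"
```

Rejecting unknown values in `resolve_sites` is redundant when the input comes from the dropdown, but it keeps the script safe to run by hand outside of Actions.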

The GitHub Actions UI renders the choice input as a dropdown. That's a minor quality-of-life detail but it prevents typos when manually triggering at odd hours.

When patterns conflict

These four patterns work together but require some discipline to keep aligned.

The interaction most likely to cause confusion: path filters and skip flags can both prevent a deploy. If an ETL commit modifies files outside the path filter's scope and includes a skip flag, you get a double block — the deploy won't fire regardless. Usually correct, occasionally confusing if you expected a deploy and it didn't happen.

I keep a short markdown file at .github/WORKFLOWS.md that maps each workflow to its trigger type, skip convention, and path filter. Without it, adding a new workflow usually breaks one of the existing patterns. The doc is four bullet points and a table — not elaborate, just enough to check before editing a workflow file.

Six weeks in, the scheduling is stable enough that I don't think about it most days. The patterns aren't clever. They're just explicit enough that I can read any workflow file and reconstruct exactly when and why it will run.

Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.

Source: dev.to
