Deleting the toolchain: linting a polyglot monorepo with one Rust binary


A repository that mixes languages pays a tax most single-language repos never see. Python wants ruff, and ruff wants a Python. JavaScript and TypeScript want oxlint or prettier, and those want a Node. Shell wants shfmt and shellcheck. Then taplo for TOML, rumdl for Markdown, sqruff for SQL, hadolint for Dockerfiles, and one more for whatever you added last month. Each tool is installed separately, pinned separately, and wired into .pre-commit-config.yaml separately.

The result is slow setup, version drift, CI images fat with language runtimes you only need for linting, and a hook stack that breaks the day a system dependency goes missing or bumps a version. I hit this hard while building xberg, a document-intelligence engine with a Rust core and a dozen language SDKs in one repo, which is where the itch to fix it came from.

polylint is the fix I landed on. It ships one self-contained Rust binary, poly, with poly lint and poly fmt as subcommands, driven by one poly.toml. No language runtime is required on the default path. The goal was not to add one more tool beside the stack. It was to delete the stack.

What using it looks like

$ poly fmt --check
would format crates/example/src/main.rs

$ poly fmt --fix
formatted 1 file

$ poly lint --format toon
path: crates/example/src/main.rs
diagnostics[0]: engine=ruff, code=F401, severity=warning, title="`os` imported but unused"

$ poly hooks install
installed git hooks: pre-commit, commit-msg

poly fmt is a dry run by default, which is what you want in CI. Add --fix to write changes. poly hooks install wires the git hooks once, and lint, format, and commit checks then run on every git commit.

How it avoids the runtimes

There are three tiers, and a file falls through them until something can handle it.

Curated Rust backends, in process. The Rust ecosystem has grown real, high-quality linters and formatters that are libraries, not just CLIs. polylint links them directly: oxc for JS/TS, ruff internals for Python, taplo for TOML, rumdl for Markdown, sqruff for SQL, malva and markup_fmt for CSS and HTML, mago for PHP. Every backend implements one trait and returns one diagnostic shape, so reporting, caching, and the MCP server stay uniform no matter which tool did the work.

pub trait Engine: Send + Sync {
    fn name(&self) -> &'static str;
    fn languages(&self) -> &[Language];
    fn capabilities(&self) -> Capabilities;   // lint / format / fix
    fn version(&self) -> &str;                 // folded into the cache key
    fn lint(&self, src: &SourceFile, cfg: &EngineConfig) -> Result<Vec<Diagnostic>>;
    fn format(&self, src: &SourceFile, cfg: &EngineConfig) -> Result<FormatOutput>;
}
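
To make the "one trait, one diagnostic shape" idea concrete, here is a minimal sketch of a toy backend and a dispatcher. It is not polylint's code. The trait is trimmed to lint only, and Language, SourceFile, Diagnostic, TrailingWhitespace, and lint_with_first_match are hypothetical stand-ins, since the real definitions are not shown here.

```rust
// Hypothetical stand-ins for types the article names but does not define.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Language {
    Python,
    Toml,
    Sql,
}

pub struct SourceFile {
    pub path: String,
    pub language: Language,
    pub text: String,
}

/// One diagnostic shape shared by every backend.
#[derive(Debug, PartialEq, Eq)]
pub struct Diagnostic {
    pub engine: &'static str,
    pub code: String,
    pub line: usize,
    pub title: String,
}

/// Trimmed-down version of the trait above: lint only, no config, no cache.
pub trait Engine: Send + Sync {
    fn name(&self) -> &'static str;
    fn languages(&self) -> &[Language];
    fn lint(&self, src: &SourceFile) -> Vec<Diagnostic>;
}

/// Toy backend (an assumption for illustration): flags lines ending in spaces or tabs.
pub struct TrailingWhitespace;

impl Engine for TrailingWhitespace {
    fn name(&self) -> &'static str {
        "trailing-ws"
    }

    fn languages(&self) -> &[Language] {
        &[Language::Python, Language::Toml]
    }

    fn lint(&self, src: &SourceFile) -> Vec<Diagnostic> {
        src.text
            .lines()
            .enumerate()
            .filter(|(_, line)| line.ends_with(' ') || line.ends_with('\t'))
            .map(|(index, _)| Diagnostic {
                engine: self.name(),
                code: "TW001".to_string(), // made-up rule code
                line: index + 1,           // 1-based line number
                title: "trailing whitespace".to_string(),
            })
            .collect()
    }
}

/// Run the first registered engine that claims the file's language.
/// `None` means no engine handled it and the caller should fall through.
pub fn lint_with_first_match(
    engines: &[Box<dyn Engine>],
    src: &SourceFile,
) -> Option<Vec<Diagnostic>> {
    engines
        .iter()
        .find(|engine| engine.languages().contains(&src.language))
        .map(|engine| engine.lint(src))
}
```

Because every engine returns the same Diagnostic, the reporting code never needs to know which backend produced a result, and a None from the dispatcher is the natural hook for the next tier.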

A tree-sitter fallback for everything else. For languages without a dedicated backend, polylint uses tree-sitter-language-pack, parses the source, and re-emits it with structural reindentation and whitespace normalization. This covers hundreds of grammars, including Java, using each language's indents.scm queries. So you get consistent indentation and clean whitespace on a Java file with no JDK on the machine. Grammars where whitespace carries meaning are left untouched on purpose, since a reindent there would corrupt the file rather than tidy it.
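
Here is a rough sketch of the whitespace-normalization half of that tier, together with the skip rule. It deliberately leaves out the tree-sitter parse and the indents.scm reindent, so treat it as an illustration of the contract rather than the real pass. The function names and the list of whitespace-sensitive languages are my assumptions, not polylint's.

```rust
/// Hypothetical list: which grammars the real tool skips is not stated here.
fn is_whitespace_sensitive(language: &str) -> bool {
    matches!(language, "python" | "yaml" | "makefile")
}

/// Strip trailing whitespace, normalize line endings to LF, and end the file
/// with exactly one newline. Returns `None` when the language is left alone.
pub fn normalize_whitespace(language: &str, source: &str) -> Option<String> {
    if is_whitespace_sensitive(language) {
        return None; // touching whitespace here could change meaning
    }

    let mut out = String::with_capacity(source.len());
    // `str::lines` splits on both "\n" and "\r\n", so CRLF input becomes LF.
    for line in source.lines() {
        out.push_str(line.trim_end());
        out.push('\n');
    }
    // Collapse trailing blank lines down to a single final newline.
    while out.ends_with("\n\n") {
        out.pop();
    }
    Some(out)
}
```

Returning None instead of the unchanged text keeps "skipped on purpose" distinguishable from "formatted, and nothing changed".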

An opt-in catalog for the rest. Some tools have no Rust equivalent yet. For those, polylint vendors a tool catalog derived from the mdsf project, which maps a tool to its binary, argument vector, and languages. It is data only, so polylint does not depend on or shell out to mdsf itself. Each catalog tool is probed on PATH and skipped when absent, so enabling one never makes a run fail on a machine that lacks it.
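
The probe-and-skip behavior can be sketched with nothing but the standard library. This is an assumption about the shape, not the vendored catalog format: CatalogTool, find_in, find_on_path, and available are hypothetical names, and the sketch ignores executable bits and Windows extensions.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical shape of one catalog entry: binary, argument vector, languages.
pub struct CatalogTool {
    pub binary: &'static str,
    pub args: &'static [&'static str],
    pub languages: &'static [&'static str],
}

/// Look for `binary` in each directory of a PATH-style value.
pub fn find_in(path_var: &OsStr, binary: &str) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(binary))
        .find(|candidate| candidate.is_file())
}

/// Look for `binary` on the process PATH; `None` if PATH is unset or no match.
pub fn find_on_path(binary: &str) -> Option<PathBuf> {
    let path_var = env::var_os("PATH")?;
    find_in(&path_var, binary)
}

/// Keep only the tools that are installed. A missing tool is skipped,
/// never reported as an error.
pub fn available(catalog: &[CatalogTool]) -> Vec<(&CatalogTool, PathBuf)> {
    catalog
        .iter()
        .filter_map(|tool| find_on_path(tool.binary).map(|path| (tool, path)))
        .collect()
}
```

The point of the filter_map is that absence is an ordinary outcome: the run continues with whatever subset of tools the machine happens to have.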

Configuration for all of this lives in one file:

[defaults]
line_length = 120
line_ending = "lf"

[lint.python.ruff]
select = ["E", "F", "W"]

[per-file-ignores]
"tests/**" = ["F401"]

[hooks.builtin]
polylint = true
polyfmt = true
commit = { stages = ["commit-msg"] }
file_safety = true

That [hooks] table replaces .pre-commit-config.yaml outright. polylint dogfoods it: the commit-msg hook runs Conventional Commit checks and strips AI-attribution trailers, and the file-safety builtin catches merge-conflict markers, large files, leaked private keys, and case collisions, all in pure Rust.
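
For a feel of what a Conventional Commit check involves, here is a small sketch that validates only the header line against the shape in the public spec, type(optional-scope)!: description. It is my own approximation; polylint's actual rules are not shown in this post, and is_conventional_header is a hypothetical name. I have left the trailer stripping out, because which trailers count is not something I can state from what is written here.

```rust
/// Check the first line of a commit message against the Conventional Commits
/// header shape: `type(optional-scope)!: description`.
pub fn is_conventional_header(header: &str) -> bool {
    // The type/scope prefix and the description are separated by ": ".
    let Some((prefix, description)) = header.split_once(": ") else {
        return false;
    };
    if description.trim().is_empty() {
        return false;
    }

    // A trailing `!` marks a breaking change and is optional.
    let prefix = prefix.strip_suffix('!').unwrap_or(prefix);

    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => match rest.strip_suffix(')') {
            Some(scope) => (kind, Some(scope)),
            None => return false, // a scope was opened but never closed
        },
        None => (prefix, None),
    };

    let kind_ok = !kind.is_empty() && kind.chars().all(|c| c.is_ascii_alphabetic());
    let scope_ok = scope.map_or(true, |s| {
        !s.is_empty() && !s.contains(|c: char| c == '(' || c == ')' || c.is_whitespace())
    });
    kind_ok && scope_ok
}
```

A commit-msg hook built on this would read the message file git passes as its first argument, check the first line, and exit non-zero on failure so git aborts the commit.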

Why it is fast enough to run on every commit

Two things do the work. Nothing spawns a subprocess per file on the default path, so there is no process-startup cost multiplied across a large tree, and the whole run is a rayon parallel loop over files.
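
The in-process parallel loop can be approximated with the standard library alone. This is a sketch under assumptions, not the real loop: the post says polylint uses rayon, where the same thing is a one-line parallel iterator, and I use std::thread::scope here only to keep the example dependency-free. lint_all and the usize result (a per-file diagnostic count) are hypothetical.

```rust
use std::thread;

/// Run `lint_one` over every file on up to `workers` threads and return the
/// per-file results in input order. No subprocess is spawned anywhere.
pub fn lint_all<F>(files: &[String], workers: usize, lint_one: F) -> Vec<usize>
where
    F: Fn(&str) -> usize + Sync,
{
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` panics.
    let chunk_len = ((files.len() + workers - 1) / workers).max(1);
    let lint_one = &lint_one;

    thread::scope(|scope| {
        // Spawn every worker first so the batches run concurrently.
        let handles: Vec<_> = files
            .chunks(chunk_len)
            .map(|batch| {
                scope.spawn(move || {
                    batch
                        .iter()
                        .map(|path| lint_one(path.as_str()))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();

        // Joining in spawn order keeps results aligned with `files`.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("lint worker panicked"))
            .collect()
    })
}
```

Static chunking like this is cruder than rayon's work stealing: one batch of large files can leave the other threads idle, which is one reason to reach for rayon in a real tool.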

On top of that sits a blake3 result cache. The cache key is not just the file contents. It folds in the engine name, the engine's own version, and the resolved config, alongside the file bytes:

key = blake3(namespace, engine, engine_version, resolved_config, file_bytes)
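
Here is a minimal sketch of a key built that way. It swaps blake3 for the standard library's DefaultHasher so the example needs no crates, which is a loud assumption: the result is a 64-bit value whose algorithm is not guaranteed stable across Rust releases, so it would not be suitable for a real on-disk cache. KeyParts and cache_key are hypothetical names, and the length prefix is my own choice for keeping field boundaries unambiguous.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Everything that can change a lint or format result for one file.
#[derive(Clone, Copy)]
pub struct KeyParts<'a> {
    pub namespace: &'a str,
    pub engine: &'a str,
    pub engine_version: &'a str,
    pub resolved_config: &'a str,
    pub file_bytes: &'a [u8],
}

/// Stand-in for the blake3 key: DefaultHasher replaces blake3 here.
pub fn cache_key(parts: &KeyParts<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    let fields = [
        parts.namespace.as_bytes(),
        parts.engine.as_bytes(),
        parts.engine_version.as_bytes(),
        parts.resolved_config.as_bytes(),
        parts.file_bytes,
    ];
    for field in fields {
        // The length prefix keeps ("ab", "c") distinct from ("a", "bc").
        hasher.write_u64(field.len() as u64);
        hasher.write(field);
    }
    hasher.finish()
}
```

With this shape, bumping engine_version or editing resolved_config yields a different key for the same file bytes, so the old entry is simply never looked up again.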

That detail matters more than it looks. A plain content hash would happily serve you a stale result after you upgraded a formatter or edited a rule, because the bytes did not change. Folding the tool version and config into the key means an upgrade or a config edit invalidates exactly the affected entries and nothing else.

In CI, the binary itself is cached by version, OS, and architecture, so a run downloads it once and reuses it until you bump the version.

name: Lint and Format
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Goldziher/polylint@v1
      - run: poly lint .
      - run: poly fmt --check .

The tradeoffs

I would rather name these than have you find them.

The binary is large, around 70 MB per platform, because it bundles several big parsers and linters. For a tool you install once and cache in CI, I think that is the right trade against carrying zero system dependencies, but if you expected a 3 MB download, that is not this.

The coverage is uneven by design. Some backends lint only, some format only, so it is not true that every language gets every capability. The tree-sitter tier is a formatter, not a linter, and it deliberately does nothing on whitespace-sensitive grammars rather than risk a destructive rewrite.

And it is not fully subprocess-free in every case. rustfmt and gofmt are used when present, because reimplementing them in process is not worth it, and a few other native formatters are opt-in. The default path stays runtime-free; those are the documented exceptions.

Try it

polylint is MIT licensed.

pip install polylint
# or
brew install Goldziher/tap/polylint

Source and docs are at github.com/Goldziher/polylint. If you run a polyglot repo and your pre-commit config has quietly become its own maintenance project, point poly lint at it and see what falls out.

Source: dev.to
