Archaeologist — A Codebase Intelligence Toolkit Built on Tree-Sitter and Git History


Every codebase accumulates debt over time. Dead functions nobody calls. Files so complex they're impossible to change safely. Copy-paste code scattered across 6 different files. Dependencies so tangled that touching one file breaks three others.

I built Archaeologist to address all of this in one install.

pip3 install archaeologist

Here's how each of the 5 tools works.


⬡ Codebase Graph

archaeologist-graph ./my-project

Opens an interactive browser map of your architecture. Files are rendered as cards, colored by health: green (clean), gold (has dead code), red (fully unused), purple (entry point). Cards are grouped by folder using a Dagre.js layout. Click any card to see every file that depends on it.


◎ Change Impact Analyzer

archaeologist-impact ./my-project --all --html

Before touching any file, see its blast radius. Builds a reverse call graph — every caller, every importer, test coverage — and produces a 0–100 risk score. The --all flag scans every file and ranks them by blast radius.
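To make "blast radius" concrete, here is a minimal sketch of a reverse dependency walk. It is not Archaeologist's code: the `imports` map, the file names, and both function names are hypothetical, and the post does not give the formula behind the 0–100 risk score, so the sketch stops at the set of affected files.

```python
from collections import defaultdict, deque

def reverse_graph(imports):
    """Invert {file: [files it imports]} into {file: {files that import it}}."""
    dependents = defaultdict(set)
    for src, targets in imports.items():
        for target in targets:
            dependents[target].add(src)
    return dependents

def blast_radius(imports, changed):
    """Return every file that directly or transitively depends on `changed`."""
    dependents = reverse_graph(imports)
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, ()):
            if dep not in seen and dep != changed:
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical project: app imports api, api imports db, a test imports api.
imports = {
    "app.py": ["api.py"],
    "api.py": ["db.py"],
    "test_api.py": ["api.py"],
    "db.py": [],
}
print(sorted(blast_radius(imports, "db.py")))
# prints ['api.py', 'app.py', 'test_api.py']
```

Changing `db.py` reaches three files, while changing `app.py` reaches none, which is the kind of ranking the --all flag is described as producing.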


◈ Complexity Scorer

archaeologist-complexity ./my-project --html

Scores every function on complexity: cyclomatic complexity (decision branches), nesting depth, and line count. Results are ranked worst-first. Labels: Very Complex (30+), Complex (15–29), Moderate (8–14), Simple (0–7).
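As a rough illustration of the branch-counting part, the sketch below scores Python functions with the standard library `ast` module instead of tree-sitter. The set of node types counted as branches is an assumption, and nesting depth and line count are left out. Only the label thresholds come from the list above.

```python
import ast

# Node types treated as decision points (an assumption for this sketch).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.IfExp, ast.comprehension)

def cyclomatic(func):
    """Return 1 plus the number of decision points inside a function node."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths
            score += len(node.values) - 1
    return score

def label(score):
    """Map a score to the labels quoted in the post."""
    if score >= 30:
        return "Very Complex"
    if score >= 15:
        return "Complex"
    if score >= 8:
        return "Moderate"
    return "Simple"

def rank(source):
    """Return (function name, score) pairs, worst first."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    return sorted(((f.name, cyclomatic(f)) for f in funcs),
                  key=lambda pair: pair[1], reverse=True)

SAMPLE = '''
def simple(x):
    return x + 1

def branchy(x, y):
    if x and y:
        for i in range(x):
            if i % 2:
                return i
    return 0
'''

for name, score in rank(SAMPLE):
    print(name, score, label(score))
# prints:
# branchy 5 Simple
# simple 1 Simple
```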


⧉ Duplicate Detector

archaeologist-dupes ./my-project --html

Three signals: same name in different files, exact body match, near-identical bodies (85%+ similarity). Excludes common OOP names like toString/equals/hashCode to avoid false positives.
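The three signals can be sketched in a few lines. This is an illustration under assumptions, not the tool's implementation: it uses `difflib.SequenceMatcher` as the similarity measure (the post does not say which one Archaeologist uses), it applies the name exclusion only to the same-name signal, and the `(file, name, body)` tuples are a hypothetical input format.

```python
from difflib import SequenceMatcher
from itertools import combinations

IGNORED_NAMES = {"toString", "equals", "hashCode"}  # names the post says are excluded
SIMILARITY_THRESHOLD = 0.85                          # the post's "85%+ similarity"

def normalize(body):
    """Strip indentation and blank lines so formatting differences don't matter."""
    return "\n".join(line.strip() for line in body.splitlines() if line.strip())

def find_duplicates(functions):
    """functions: list of (file, name, body) tuples.

    Returns a list of (signal, (file_a, name_a), (file_b, name_b)).
    """
    findings = []
    for (f1, n1, b1), (f2, n2, b2) in combinations(functions, 2):
        if f1 == f2:
            continue  # only compare functions that live in different files
        left, right = (f1, n1), (f2, n2)
        if n1 == n2 and n1 not in IGNORED_NAMES:
            findings.append(("same-name", left, right))
        a, b = normalize(b1), normalize(b2)
        if a == b:
            findings.append(("exact-body", left, right))
        elif SequenceMatcher(None, a, b).ratio() >= SIMILARITY_THRESHOLD:
            findings.append(("near-identical", left, right))
    return findings

# Hypothetical input: two near-identical bodies and one shared name.
functions = [
    ("cart.py", "total",
     "total = 0\nfor item in items:\n    total += item.price\nreturn total"),
    ("invoice.py", "sum_costs",
     "total = 0\nfor item in items:\n    total += item.cost\nreturn total"),
    ("report.py", "total", "return sum(rows)"),
]
for finding in find_duplicates(functions):
    print(finding)
```

A pairwise comparison like this is quadratic in the number of functions, so a real tool would likely bucket candidates first (for example by body hash or length) before running the similarity check.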


☠ Dead Code Finder

deadcode ./my-project --explain
deadcode-clean ./my-project --dry-run --min-confidence 85
deadcode-clean ./my-project --min-confidence 85

Static analysis alone produces too many false positives, so I added git history as a second signal.

Each flagged function gets a 0–100 confidence score:

Signal          Points  How it works
Call graph      45      Zero inbound calls from non-test code
Git age         20      File untouched for 2+ years
Author count    15      Single author ever committed
Recursive dead  10      All callers are also flagged
Commit count    10      Only 1–2 total commits ever
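Read as a scoring rule, the table adds up to 100 points. The sketch below assumes each signal is all-or-nothing and simply summed. The post does not say whether partial credit exists, and the `Evidence` fields are hypothetical names, not the tool's data model.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """Facts gathered about one flagged function (hypothetical field names)."""
    inbound_non_test_calls: int
    years_since_last_change: float
    author_count: int
    all_callers_flagged: bool
    commit_count: int

def confidence(e):
    """Sum the points from the table; each signal either applies fully or not at all."""
    score = 0
    if e.inbound_non_test_calls == 0:
        score += 45  # call graph
    if e.years_since_last_change >= 2:
        score += 20  # git age
    if e.author_count == 1:
        score += 15  # author count
    if e.all_callers_flagged:
        score += 10  # recursive dead
    if 1 <= e.commit_count <= 2:
        score += 10  # commit count
    return score

# No callers, file untouched for three years, one author, but many commits.
stale = Evidence(inbound_non_test_calls=0, years_since_last_change=3.0,
                 author_count=1, all_callers_flagged=False, commit_count=7)
print(confidence(stale))  # prints 80
```

Under this reading, the call graph signal alone yields 45, so a function cannot reach an 85 threshold without help from several git history signals.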

The auto-delete pipeline: isolated git branch → AST byte-range deletion → tests run → PR opened if tests pass. Main branch never touched.
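The deletion step can be illustrated with a byte-range cut driven by syntax tree positions. This sketch uses Python's built-in `ast` module as a stand-in for tree-sitter, handles only top-level functions, and skips the branch, test, and PR stages. `delete_function` is a hypothetical helper, not part of Archaeologist.

```python
import ast

def delete_function(source, name):
    """Remove the top-level function `name` by cutting its byte range out of the file."""
    data = source.encode("utf-8")
    tree = ast.parse(source)

    # Byte offset at which each line starts, plus a final entry for end of file.
    line_starts = [0]
    for line in data.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            # Decorators sit above the `def` line, so include them in the cut.
            first_line = min([node.lineno] + [d.lineno for d in node.decorator_list])
            start = line_starts[first_line - 1]
            end = line_starts[node.end_lineno]  # start of the line after the function
            return (data[:start] + data[end:]).decode("utf-8")
    return source  # nothing to delete

SRC = (
    "def keep():\n    return 1\n\n"
    "def dead():\n    return 2\n\n"
    "def also_keep():\n    return 3\n"
)
cleaned = delete_function(SRC, "dead")
print([n.name for n in ast.parse(cleaned).body])
# prints ['keep', 'also_keep']
```

Cutting by position rather than by text search leaves everything outside the range byte-for-byte unchanged, which keeps the resulting diff small and easy to review.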


False positive rate

Tested on 44 open source projects — Flask, Django, FastAPI, Rails, Gin, Vue, Zod, Actix, JUnit5, Alamofire and more. Zero false positives at 80%+ confidence on all of them.


Try it

pip3 install archaeologist

Website: https://prathik-arun.github.io/archaeologist
GitHub: https://github.com/prathik-arun/archaeologist

I would love feedback on the false positive rate on your own codebases.

Source: dev.to
