I've been using GPT-5 and Claude via API for coding tasks — refactoring, code review, architecture questions, debugging. The bill was creeping past $150/month and I had no idea which calls were actually worth the money.
Provider dashboards show you totals. Tokens used, dollars spent, done. But they don't tell you which specific calls were unnecessary. Was that $2.80 request for "where is the auth middleware" really worth sending to GPT-4o?
So I built a tracker to find out.
The experiment
I wrote a small Python library called llm-costlog that wraps any LLM API call and records:
- Tokens used (prompt + completion)
- Cost in USD (built-in pricing for 40+ models)
- Route — did this go to the API, or was it handled locally?
- Intent — what kind of request was this? (code lookup, architecture question, debugging, etc.)
Five lines to integrate:
from llm_cost_tracker import CostTracker

tracker = CostTracker("./costs.db")
tracker.record(
    prompt_tokens=847,
    completion_tokens=234,
    model="gpt-4o-mini",
    provider="openai",
    intent="code_lookup",
)
After a week of tracking everything, I ran the waste analysis.
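The analysis itself is conceptually simple: classify each recorded call by intent, flag the intents that never needed an LLM, and total up what they cost. A self-contained sketch of the idea (illustrative names and numbers, not llm-costlog's actual internals):

```python
# Intents that a codebase search or parser could answer without an LLM.
AVOIDABLE_INTENTS = {"code_lookup", "config_check", "file_search", "symbol_lookup"}

# A few recorded calls, as a ledger query might return them.
calls = [
    {"intent": "code_lookup", "cost_usd": 0.0012},
    {"intent": "architecture", "cost_usd": 0.0310},
    {"intent": "file_search", "cost_usd": 0.0008},
    {"intent": "debugging", "cost_usd": 0.0150},
]

avoidable = [c for c in calls if c["intent"] in AVOIDABLE_INTENTS]
report = {
    "total_cost": round(sum(c["cost_usd"] for c in calls), 4),
    "avoidable_calls": len(avoidable),
    "avoidable_pct": round(100 * len(avoidable) / len(calls), 1),
    "wasted": round(sum(c["cost_usd"] for c in avoidable), 4),
}
print(report)
```

The whole trick is in the intent tag: once every call carries one, "how much did I spend on things grep could answer" is a one-line aggregation.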
The results
- Total cost: $0.2604
- Avoidable: 23 of 35 external calls (65.7%)
- Wasted: $0.0204
- Model downgrade savings: $0.2448
**65% of my external API calls were for things that didn't need an LLM at all.** Symbol lookups, config checks, "where is this function defined," file searches: all of it can be answered by searching the codebase directly.
This was from a small test run. The dollar amounts are tiny because the test used short prompts. But the ratio is what matters — at real-world usage with large contexts (2K-8K tokens per request, which is typical for code work), that 65% avoidable rate translates to serious money. If you're spending $150/month on LLM APIs and 65% of calls are avoidable, that's ~$100/month in waste.
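The projection is just the avoidable rate applied to a monthly bill. As a sanity check on the arithmetic (assumed numbers, not tracker output):

```python
monthly_bill = 150.00    # assumed monthly API spend in USD
avoidable_rate = 0.657   # avoidable share measured in the small test run

# If the avoidable share holds at real-world usage,
# that slice of the bill is pure waste.
avoidable_waste = monthly_bill * avoidable_rate
print(f"~${avoidable_waste:.0f}/month avoidable")  # ~$99/month avoidable
```

The ratio, not the absolute dollars, is the thing the test run actually measured, so the projection only holds if your mix of prompt types looks like mine.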
What I did about it
Knowing the waste exists is step one. Fixing it automatically is step two.
So I built promptrouter — a gateway that sits between your code and the LLM API. For every prompt, it decides:
Can this be answered locally? Symbol lookups, config checks, file searches → handled instantly, $0 cost. It has an AST parser that builds a call graph of your codebase, so "what calls this function" is answered from the parse tree, not by asking an LLM.
Does this actually need an LLM? Architecture questions, code review, complex debugging → sent to the API, but with compacted context. Instead of sending the whole repo, it packs only the 3-5 most relevant files into a token budget you control.
The result: the calls that stay local cost nothing. The calls that go external use 40-80% fewer input tokens.
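The compaction step is essentially a greedy pack: rank candidate files by relevance, then add them until the token budget runs out. A minimal sketch of that idea (illustrative, not promptrouter's actual code), using the rough ~4 characters/token heuristic in place of a real tokenizer:

```python
def pack_context(ranked_files, budget_tokens):
    """Greedily pack the highest-ranked files into a token budget.

    ranked_files: list of (path, text) pairs, best match first.
    Uses ~4 chars/token as a cheap estimate; real code would tokenize.
    """
    packed, used = [], 0
    for path, text in ranked_files:
        cost = len(text) // 4
        if used + cost > budget_tokens:
            continue  # doesn't fit; try the next (smaller) candidate
        packed.append(path)
        used += cost
    return packed, used

files = [
    ("auth/middleware.py", "x" * 4000),   # ~1000 tokens
    ("auth/session.py",    "x" * 2000),   # ~500 tokens
    ("config/settings.py", "x" * 8000),   # ~2000 tokens
]
paths, used = pack_context(files, budget_tokens=1600)
print(paths, used)  # ['auth/middleware.py', 'auth/session.py'] 1500
```

Skipping oversized files rather than stopping at the first miss is what lets a small-but-relevant file still make it into the prompt.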
Watching the waste score drop
The tracker now has a waste_score_trend feature that shows your efficiency improving over time:
trend = tracker.waste_score_trend(days=30)
print(trend["summary"])
Apr 12 waste=75.0% (12/16 avoidable)
Apr 14 waste=66.7% (8/12 avoidable)
Apr 16 waste=50.0% (4/8 avoidable)
Apr 18 waste=20.0% (1/5 avoidable)
Direction: improving ↓
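Under the hood, a trend like this is just the daily avoidable ratio computed over the ledger. A sketch of the grouping (illustrative rows, not the library's implementation):

```python
from collections import defaultdict

# (day, avoidable) pairs, as a ledger query might return them.
rows = [
    ("Apr 16", True), ("Apr 16", False), ("Apr 16", True), ("Apr 16", False),
    ("Apr 18", True), ("Apr 18", False), ("Apr 18", False), ("Apr 18", False),
    ("Apr 18", False),
]

by_day = defaultdict(lambda: [0, 0])  # day -> [avoidable_count, total]
for day, avoidable in rows:
    by_day[day][0] += avoidable
    by_day[day][1] += 1

for day, (a, n) in by_day.items():
    print(f"{day}  waste={100 * a / n:.1f}% ({a}/{n} avoidable)")
```

Sorting days and comparing the first bucket against the last gives the "improving ↓" direction flag.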
Watching that number drop from 75% to 20% over a week was the most satisfying part. Every prompt that gets rerouted locally is money that stays in your pocket.
The technical bits
For anyone curious about the internals:
- Routing: keyword classification + phrase detection. Not ML-based (yet), but 100% accurate on my test suite of 22 prompt types.
- Code search: BM25 text matching + optional semantic search (sentence-transformers, all-MiniLM-L6-v2). Blended scoring: 60% BM25 + 40% semantic similarity.
- AST analysis: Full call graph and import dependency tracing for Python and TypeScript/JavaScript. Regex-based for TS/JS, stdlib ast module for Python. Zero external dependencies for either.
- Git integration: Recent commits, blame, and diffs as context, so "who changed this and when" doesn't burn tokens.
- Cost tracking: SQLite-backed ledger with real token counts from the provider's usage block, priced against a built-in table of 40+ models.
- LLM client: Speaks OpenAI, Anthropic, Ollama, and any OpenAI-compatible endpoint over plain HTTP. No SDK dependency.
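To make the AST bullet concrete: Python's stdlib ast module gives you caller→callee edges in a few lines. A simplified sketch (direct same-file calls only; the real tracing also handles methods, imports, and TS/JS):

```python
import ast

source = """
def helper():
    pass

def main():
    helper()
    print("done")
"""

def call_edges(src):
    """Return (caller, callee) pairs for direct function calls."""
    tree = ast.parse(src)
    edges = []
    for fn in ast.walk(tree):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.append((fn.name, node.func.id))
    return edges

print(call_edges(source))  # [('main', 'helper'), ('main', 'print')]
```

Inverting those edges answers "what calls this function" from the parse tree, with zero tokens spent.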
Both tools are zero-dependency (stdlib only) for the core functionality. Embeddings and precise tokenization are optional extras.
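For a feel of how cheap the routing decision itself can be, here's a toy version of keyword-plus-phrase classification (the phrases are made up for illustration, not promptrouter's actual rule set):

```python
# Phrases that signal a lookup a code search can answer locally.
LOCAL_PATTERNS = [
    "where is", "what calls", "which file", "find the definition",
]

def route(prompt: str) -> str:
    """Return 'local' for lookup-style prompts, 'llm' for everything else."""
    p = prompt.lower()
    if any(pat in p for pat in LOCAL_PATTERNS):
        return "local"
    return "llm"

print(route("Where is the auth middleware defined?"))    # local
print(route("Review this function for race conditions"))  # llm
```

A rule table like this runs in microseconds, which is why the gateway can sit in front of every call without adding noticeable latency.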
Try it
Just want to see where your money goes?
pip install llm-costlog
GitHub: github.com/batish52/llm-cost-tracker
Want to fix the waste automatically?
pip install promptrouter
GitHub: github.com/batish52/codecontext
Both are MIT licensed. Feedback, issues, and stars welcome — these are my first open source releases and I'm iterating fast based on user feedback. A Reddit commenter asked for TypeScript support and a waste score trend feature — both shipped within 24 hours.