From N*M to N+M: A Zero-Dependency LLM Provider Layer


There are only 3 LLM API protocols, but an unlimited number of providers speaking them. Separate protocol from identity: protocol is code, provider is data, and complexity drops from N×M to N+M. 300 lines of TypeScript. Zero dependencies.


The problem isn't "it doesn't work." It's "it won't tell you it broke."

In May, I built a Claude Code skill called unblind. I use DeepSeek as my daily driver, but it can't see images. So unblind forwards images to Mimo and OpenAI's vision APIs.

The MVP had two providers. A few dozen lines of if-else. It worked.

Then I noticed something more unsettling: an expired API key — no warning. A network hiccup — no retry. A missing permission — silently skipped. This tool didn't fail. It quietly stopped working without telling you.

I added Phase 0 self-healing, circuit breakers, persistent caching, and a security sandbox. Now unblind wouldn't fail silently.

But then I noticed something else. The circuit breaker doesn't care if you're calling a vision API or a translation API. The cache doesn't care if the response is an image description or OCR text. The error normalization doesn't care whether the other end is Mimo or OpenAI.

A universal provider infrastructure, trapped inside a vision skill.


First attempt: follow the ecosystem, hit the ceiling

The largest similar project in the ecosystem is vision-support, with 19 providers. The pattern is standard—base class + subclasses, GoF Template Method. I followed it for v2.0.

class BaseProvider {
  async analyzeImage({ image, prompt, options }) {
    const { url, body, headers } = this._buildRequest(image, prompt, options);
    const res = await apiRequest(url, { body, headers });
    return { content: await this._parseResponse(res), model: this._model };
  }
}

class MimoProvider extends BaseProvider { ... }     // 54 lines
class OpenAIProvider extends BaseProvider { ... }    // 45 lines
class GeminiProvider extends BaseProvider { ... }   // ~50 lines

One subclass per provider. I expanded unblind from 3 to 7 — adding Groq, Together, Fireworks, and Ollama.

Then Groq, Together, and Fireworks tripped me up. They use the exact same Chat Completions API as OpenAI. The only difference from OpenAIProvider is the baseUrl and model name.

Write three new classes? Almost entirely duplicated. Reuse OpenAIProvider? The differences are buried in build functions — you can't read the code and see that Groq and OpenAI are the same protocol.

And later, I discovered Mimo supports both the OpenAI and Anthropic protocols. Same vendor, two protocol endpoints. In the subclass approach, that's double the complexity.


Stop and think

I stuffed two things that don't belong together into one class.

| Concept | What it is | How often it changes | Should be |
|---|---|---|---|
| Protocol | "The format of the request": Anthropic Messages / OpenAI Chat / Google Gen AI | Extremely rarely (3 years to converge on 3) | Code |
| Provider | "Where to connect": which vendor, which key, which baseUrl | Constantly (new ones keep being added) | Data |

Five providers already share one protocol, OpenAI Chat Completions: OpenAI, Groq, Together, Fireworks, and Ollama. Anthropic Messages only has Mimo today, but some proxy gateway could wrap GPT-4o in Anthropic format tomorrow. Google's protocol only has Gemini, but that won't last.

One vendor × one protocol = one registry row. Two protocols = two rows. Five vendors = five rows. In the subclass approach, that's an N×M class explosion. In the protocol approach, it's N data rows + M protocol objects.

This insight comes from 30 years of database experience. MySQL and PostgreSQL need different SQL dialects, but 100 MySQL instances are just different connection strings. Groq is a MySQL instance. OpenAI is another. I was writing a Dialect for every instance.


Protocol is code. Provider is data.

// Protocol — code. The whole project has 3 protocol objects.
const PROTOCOLS = {
  'openai-chat-completions': {
    endpoint: () => '/chat/completions',  // a function of the model, matching the Protocol interface
    auth: (key) => ({ Authorization: `Bearer ${key}` }),
    buildContent: (inputs, prompt) => {
      const content = [];
      for (const inp of inputs) {
        if (inp.type === 'image') content.push({ type: 'image_url', image_url: { url: inp.data } });
        if (inp.type === 'text')  content.push({ type: 'text', text: inp.data });
      }
      content.push({ type: 'text', text: prompt });
      return content;
    },
    buildBody: (model, content, opts) => ({
      model, max_tokens: opts.maxTokens || 2048, messages: [{ role: 'user', content }]
    }),
    extractContent: (data) => data.choices?.[0]?.message?.content,
    parseError: (data, status) => { /* → 4 categories */ },
  },
  'anthropic-messages': { /* 6 functions, each different */ },
  'google-generative-ai': { /* 6 functions, each different */ },
};

// Provider — pure data. Adding one = adding one line. Model is a field value, not a registry entry.
const REGISTRY = [
  { name: 'openai',   protocol: 'openai-chat-completions', baseUrl: 'https://api.openai.com/v1',         model: 'gpt-4o' },
  { name: 'groq',     protocol: 'openai-chat-completions', baseUrl: 'https://api.groq.com/openai/v1',     model: 'llama-4-vision' },
  { name: 'mimo',     protocol: 'anthropic-messages',      baseUrl: 'https://api.xiaomimimo.com/anthropic', model: 'mimo-v2.5' },
  { name: 'gemini',   protocol: 'google-generative-ai',    baseUrl: 'https://generativelanguage.googleapis.com/v1beta', model: 'gemini-2.5-flash' },
];

// Mimo supports both Anthropic AND OpenAI? Add one row. No new code.
// { name: 'mimo-openai', protocol: 'openai-chat-completions', baseUrl: 'https://api.xiaomimimo.com/v1', model: 'mimo-v2-omni' }

// GenericProvider — the only class. Zero subclasses.
const provider = new GenericProvider({
  name: 'openai',
  protocol: PROTOCOLS['openai-chat-completions'],
  baseUrl: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY!,
  model: 'gpt-4o',
});

await provider.execute({ inputs, prompt });

| | v2.0 Template Method | v3.0 Protocol-Driven |
|---|---|---|
| Complexity | N × M (N subclasses × M build functions) | N + M (N data rows + M protocol objects) |
| Number of classes | 7 | 1 |
| Add OpenAI-compatible provider | Write a build function | Add 1 registry row |
| Add new protocol family | Write a Provider class | PROTOCOLS +1, Registry +1 |
| Same vendor, dual protocol | New subclass | Add 1 data row |
| Lines of code | ~347 | ~290 (-16%) |
| Protocol logic testable | ❌ Needs API key | Zero-dependency pure functions |
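
The snippet above calls GenericProvider without showing what it does. Here is a minimal sketch of the pure half of that job: turning one registry row plus one protocol object into an HTTP request description. The names `buildRequest`, `ProviderConfig`, and `HttpRequest` are hypothetical, the protocol is cut down to four of its six functions, and the real class presumably also sends the request and handles errors.

```typescript
// Sketch only: buildRequest, ProviderConfig and HttpRequest are illustrative names.
type Input = { type: "text" | "image"; data: string };
interface ExecuteOptions { maxTokens?: number }

interface Protocol {
  endpoint: (model: string) => string;
  auth: (apiKey: string) => Record<string, string>;
  buildContent: (inputs: Input[], prompt: string) => unknown[];
  buildBody: (model: string, content: unknown, opts: ExecuteOptions) => unknown;
}

interface ProviderConfig {
  name: string;
  protocol: Protocol;
  baseUrl: string;
  apiKey: string;
  model: string;
}

interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Pure step: provider data + protocol functions -> a description of the HTTP call.
function buildRequest(
  cfg: ProviderConfig,
  inputs: Input[],
  prompt: string,
  opts: ExecuteOptions = {},
): HttpRequest {
  const content = cfg.protocol.buildContent(inputs, prompt);
  return {
    url: cfg.baseUrl + cfg.protocol.endpoint(cfg.model),
    headers: { "Content-Type": "application/json", ...cfg.protocol.auth(cfg.apiKey) },
    body: JSON.stringify(cfg.protocol.buildBody(cfg.model, content, opts)),
  };
}

// A cut-down OpenAI-style protocol, just enough to exercise buildRequest.
const openaiChat: Protocol = {
  endpoint: () => "/chat/completions",
  auth: (key) => ({ Authorization: `Bearer ${key}` }),
  buildContent: (inputs, prompt) => [
    ...inputs.map((inp) =>
      inp.type === "image"
        ? { type: "image_url", image_url: { url: inp.data } }
        : { type: "text", text: inp.data }),
    { type: "text", text: prompt },
  ],
  buildBody: (model, content, opts) => ({
    model,
    max_tokens: opts.maxTokens ?? 2048,
    messages: [{ role: "user", content }],
  }),
};

// Same protocol object, different row: only the data changes.
const groqRequest = buildRequest(
  { name: "groq", protocol: openaiChat, baseUrl: "https://api.groq.com/openai/v1", apiKey: "test-key", model: "llama-4-vision" },
  [{ type: "image", data: "data:image/png;base64,AAAA" }],
  "Describe this image",
);
```

Keeping this step free of network calls means the request for any provider can be inspected in a test with a fake key.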

Four key design decisions

1. Pure functions, not classes — for testability

interface Protocol {
  readonly endpoint: (model: string) => string;
  readonly auth: (apiKey: string) => Record<string, string>;
  readonly buildContent: (inputs: Input[], prompt: string) => unknown[];
  readonly buildBody: (model: string, content: unknown, opts: ExecuteOptions) => unknown;
  readonly extractContent: (data: Record<string, unknown>) => string;
  readonly parseError: (data: Record<string, unknown>, status: number) => ParsedError;
}

No extends. No abstract. A protocol is just a plain object with 6 pure functions.

// Zero-dependency unit test — no mock fetch, no Provider instance needed
it('extractContent — OpenAI response', () => {
  assert.equal(
    PROTOCOLS['openai-chat-completions'].extractContent({
      choices: [{ message: { content: 'A cat on a table' } }]
    }),
    'A cat on a table'
  );
});

The subclass approach can't do this — you need a complete instance with a key, URL, and timeout. Under the protocol approach, 80% of tests have zero dependencies.
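
Because a protocol is a plain object, a second family is the same slots with different bodies. Below is a rough sketch of what an anthropic-messages object could look like. The field shapes (x-api-key and anthropic-version headers, base64 image source blocks, a content array in the response) follow Anthropic's public Messages API. How zeshim actually fills these slots is an assumption, as are the `/v1/messages` path being appended to the registry's baseUrl and image data arriving as raw base64. parseError is left out to keep the sketch short.

```typescript
// Sketch only: five of the six protocol slots, written against Anthropic's public API shapes.
type Input =
  | { type: "text"; data: string }
  | { type: "image"; data: string; mimeType: string }; // data assumed to be raw base64

interface ExecuteOptions { maxTokens?: number }

const anthropicMessages = {
  endpoint: (_model: string): string => "/v1/messages",

  auth: (apiKey: string): Record<string, string> => ({
    "x-api-key": apiKey,
    "anthropic-version": "2023-06-01",
  }),

  // Images become base64 source blocks instead of OpenAI's image_url blocks.
  buildContent: (inputs: Input[], prompt: string): unknown[] => [
    ...inputs.map((inp) =>
      inp.type === "image"
        ? { type: "image", source: { type: "base64", media_type: inp.mimeType, data: inp.data } }
        : { type: "text", text: inp.data }),
    { type: "text", text: prompt },
  ],

  buildBody: (model: string, content: unknown, opts: ExecuteOptions) => ({
    model,
    max_tokens: opts.maxTokens ?? 2048,
    messages: [{ role: "user", content }],
  }),

  // The response carries an array of content blocks; keep and join the text ones.
  extractContent: (data: Record<string, unknown>): string => {
    const blocks = Array.isArray(data.content) ? data.content : [];
    return blocks
      .filter((b) => b?.type === "text")
      .map((b) => String(b.text))
      .join("");
  },
};
```

Each function can be checked in isolation with a literal input, exactly like the OpenAI test above.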

2. Overrides, not subclasses — differences are data

Groq caps max_tokens at 4096. The OpenAI protocol has no such limit. The difference is too small to justify a class:

{ name: 'groq', protocol: 'openai-chat-completions',
  overrides: {
    buildBody(proto, model, content, opts) {
      const body = proto.buildBody(model, content, opts);
      body.max_tokens = Math.min(body.max_tokens, 4096);
      return body;
    },
  },
}

Inspired by Kubernetes strategic merge patch — defaults in spec, overrides in patch.
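
For the row above to work, something has to merge the override into the base protocol. A minimal sketch of that merge, assuming a hypothetical helper named `applyOverrides` and showing only buildBody (the other five functions would be wrapped the same way). The override receives the base protocol as its first argument, as in the Groq row, so it can call through to the default.

```typescript
// Sketch only: applyOverrides and ChatBody are illustrative names.
interface ExecuteOptions { maxTokens?: number }
interface ChatBody { model: string; max_tokens: number; messages: unknown[] }

interface Protocol {
  buildBody: (model: string, content: unknown, opts: ExecuteOptions) => ChatBody;
  // ...the other five functions are omitted in this sketch
}

interface Overrides {
  buildBody?: (proto: Protocol, model: string, content: unknown, opts: ExecuteOptions) => ChatBody;
}

// Returns a new protocol object; the shared base is never mutated.
function applyOverrides(base: Protocol, overrides: Overrides = {}): Protocol {
  const override = overrides.buildBody;
  return {
    ...base,
    buildBody: override
      ? (model, content, opts) => override(base, model, content, opts)
      : base.buildBody,
  };
}

const openaiChat: Protocol = {
  buildBody: (model, content, opts) => ({
    model,
    max_tokens: opts.maxTokens ?? 2048,
    messages: [{ role: "user", content }],
  }),
};

// The Groq patch from the registry row, applied on top of the shared protocol.
const groqProtocol = applyOverrides(openaiChat, {
  buildBody(proto, model, content, opts) {
    const body = proto.buildBody(model, content, opts);
    return { ...body, max_tokens: Math.min(body.max_tokens, 4096) };
  },
});
```

Returning a fresh object matters here: five providers share one protocol object, so a patch for Groq must not leak into OpenAI.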

3. Three error formats → four categories

Anthropic returns { type: "error", error: { type: "invalid_request_error" } }. OpenAI returns { error: { type: "..." } }. Gemini returns { error: { code: 400, status: "INVALID_ARGUMENT" } }.

Completely incompatible. Each protocol provides parseError(data, status), normalized to a single type:

type ParsedError =
  | { category: "auth" }       // Don't retry
  | { category: "rate_limit" }  // Back off
  | { category: "server" }      // Fall through to next
  | { category: "client" };     // Report

The upstream circuit breaker only reads category — it never needs to know which API is on the other end.
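
A minimal sketch of both sides of that boundary. The status-code mapping (401/403 as auth, 429 as rate limit, 5xx as server) is my assumption about a sensible default, not necessarily what zeshim does. The Gemini-style parser additionally reads the body's status string. `byStatus`, `googleParseError`, and `nextAction` are hypothetical names.

```typescript
type Category = "auth" | "rate_limit" | "server" | "client";
type ParsedError = { category: Category };

// Fallback shared by every protocol in this sketch: classify by HTTP status alone.
function byStatus(httpStatus: number): ParsedError {
  if (httpStatus === 401 || httpStatus === 403) return { category: "auth" };
  if (httpStatus === 429) return { category: "rate_limit" };
  if (httpStatus >= 500) return { category: "server" };
  return { category: "client" };
}

// Protocol side: a Gemini-style parser prefers the body's status string when it has one.
function googleParseError(data: Record<string, unknown>, httpStatus: number): ParsedError {
  const err = data.error as { status?: string } | undefined;
  switch (err?.status) {
    case "UNAUTHENTICATED":
    case "PERMISSION_DENIED":
      return { category: "auth" };
    case "RESOURCE_EXHAUSTED":
      return { category: "rate_limit" };
    default:
      return byStatus(httpStatus);
  }
}

// Consumer side: the breaker sees only the category, never the vendor's payload.
type Action = "stop" | "backoff" | "next_provider" | "report";

function nextAction(err: ParsedError): Action {
  switch (err.category) {
    case "auth": return "stop";            // don't retry
    case "rate_limit": return "backoff";   // back off
    case "server": return "next_provider"; // fall through to next
    case "client": return "report";        // report
  }
}
```

Adding a fourth protocol later means writing one more parser; `nextAction` stays untouched.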

4. TypeScript turns runtime errors into compile errors

export type Input = TextInput | ImageInput | AudioInput | DocumentInput;

export interface ImageInput {
  readonly type: "image"; readonly data: string; readonly mimeType: string;
}
export interface TextInput {
  readonly type: "text";  readonly data: string;
}

Before, mimeType was an optional string — forget it, crash at runtime. Now, type determines which fields are required — forget it, fail to compile.
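
A small sketch of what that buys in practice. Only the text and image members are defined here; the audio and document members are left out, and `summarize` is a hypothetical helper. The `@ts-expect-error` comment asserts that the next line fails type checking, and the `never` assignment makes the switch fail to compile if a new member is added without a matching case.

```typescript
interface TextInput { readonly type: "text"; readonly data: string }
interface ImageInput { readonly type: "image"; readonly data: string; readonly mimeType: string }
type Input = TextInput | ImageInput; // audio and document members omitted in this sketch

const okImage: ImageInput = { type: "image", data: "AAAA", mimeType: "image/png" };

// @ts-expect-error mimeType is required once type is "image"
const missingMime: ImageInput = { type: "image", data: "AAAA" };

function summarize(inp: Input): string {
  switch (inp.type) {
    case "text":
      return `text (${inp.data.length} chars)`;
    case "image":
      // inp is narrowed to ImageInput here, so mimeType is known to exist.
      return `image (${inp.mimeType})`;
    default: {
      // Compile-time exhaustiveness check: unreachable while every member has a case.
      const unreachable: never = inp;
      return unreachable;
    }
  }
}
```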


N×M → N+M: quantified

Protocol-driven doesn't optimize "switching models" — v2.0 already did that with zero code. What it eliminates is the coupling between Provider and Protocol:

| Operation | v2.0 | v3.0 |
|---|---|---|
| Switch models | Change a field (zero code) | Change a field (zero code) |
| Add same-protocol vendor (Groq) | Write a build function | Add 1 data row |
| Add new protocol family (Google) | Write a Provider subclass | Add 1 protocol object |
| Same vendor, dual protocol (Mimo → Anthropic + OpenAI) | New subclass + build function | Add 1 data row |

What's truly eliminated is cross-dimensional coupling. Protocol evolution doesn't touch Provider definitions. Provider additions don't touch Protocol code. N and M finally evolve independently.


Shipping it

After extracting the provider layer, unblind's code shrank 16%. The extracted package — zeshim — can be reused by any agent tool:

npm install zeshim

Full design docs: unblind/display/provider-optimization.md.

But extracting the provider layer forced me to define zeshim's four principles. At the time they felt obvious. Looking back, they're what keeps the architecture stable.


Appendix: four principles I only understood after extraction

1. Zero dependencies is a category definition, not a preference

I surveyed 52 frameworks and coding agents. Not a single one had a zero-dependency multi-provider abstraction. Zero dependencies IS the category. Add the first dependency, and zeshim's difference from LangChain degrades from "different architecture" to "less code" — and less code is not a moat.

The cost: no tiktoken for token counting, no Zod for validation, no undici for HTTP optimization. All pushed to user-space.

2. The Protocol interface will never exceed 8 functions

LangChain's BaseChatModel bloated from 3 methods to a 15-layer call stack. Vercel AI SDK's LanguageModelV1 demands 10+ methods. zeshim has 6 today. Add stream? (+1), reserve countTokens? (+1), hard cap at 8.

Need a 9th capability? That's not a new function — that's a new Protocol family.

3. Core ≤ 500 LOC

300 LOC → 462 LOC → ... without a cap, "the core won't bloat" is empty words. 500 LOC means anyone can read the entire codebase in 30 minutes. Exceed it, and you split into an independent package.

4. The Scheduler does not live in Core

Provider health is exposed as status(): "healthy"|"degraded"|"down". Scheduling logic — latency tracking, cost accumulation, dynamic routing — lives in a separate package that doesn't import Protocol and knows nothing about API internals. The Provider doesn't know the Scheduler exists. The Scheduler doesn't know the Protocol details.
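
A minimal sketch of that boundary, under assumptions of my own: the scheduler depends on nothing but a name and a status() method, and the selection policy (prefer healthy over degraded, skip anything down, keep registry order among equals) is purely illustrative. `Schedulable` and `pickProvider` are hypothetical names, and the real scheduler would also track latency and cost.

```typescript
type Health = "healthy" | "degraded" | "down";

// Everything the scheduler is allowed to know about a provider.
interface Schedulable {
  readonly name: string;
  status(): Health;
}

const RANK: Record<Health, number> = { healthy: 0, degraded: 1, down: 2 };

// Picks the first provider with the best health; returns undefined if all are down.
function pickProvider<T extends Schedulable>(providers: readonly T[]): T | undefined {
  let best: T | undefined;
  for (const p of providers) {
    const health = p.status();
    if (health === "down") continue;
    if (!best || RANK[health] < RANK[best.status()]) best = p;
  }
  return best;
}

// Stand-ins for real providers: the scheduler never sees a Protocol or a baseUrl.
const fleet: Schedulable[] = [
  { name: "openai", status: () => "down" },
  { name: "groq", status: () => "degraded" },
  { name: "mimo", status: () => "healthy" },
];
```

Because `Schedulable` mentions no protocol types, this file compiles without importing anything from the core.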


Should LLM API provider layers live on the client or the server? Would love to hear your take.

Source: dev.to
