TL;DR — I built a personal knowledge system where the act of reading continuously reshapes the tools you read with. Six agents on Claude Code, MCP, Neo4j, $0/day runtime. Today I simulated 27 software luminaries reviewing it, shipped four response packs, and reversed my own repository-strategy ADR from two weeks ago. This post is the honest tour.
The frustration that started it
Most mornings I read arXiv. By Friday, I can't remember what Tuesday's paper argued. I have Notion pages, highlighted PDFs, bookmarked threads — and yet when someone asks me "so, what did you learn this month?", I hesitate.
The ritual scales. The accumulation doesn't.
comad-world began at that asymmetry. What if reading a paper mutated the system I use to read the next paper? Not as a prompt I remember to invoke, but as a trajectory the graph silently integrates every day.
That one idea is the whole product.
```
ear (listen) → brain (think) → eye (predict)
                    ↑
photo (edit) · sleep (remember) · voice (automate)
```
Six agents, one config file (`comad.config.yaml`). Swap the config and the whole system reconfigures for ai-ml or finance or biotech. I shipped v0.2.0 two weeks ago; it picked up 15 GitHub stars off an HN post and has a decent 1,336-test CI suite.
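The real schema isn't shown in this post; as a rough illustration, a domain swap could look something like this (every key name here is my assumption, not the actual file):

```yaml
# comad.config.yaml — hypothetical sketch; the real keys may differ
domain: ai-ml            # swap to "finance" or "biotech" to retarget the system
agents:
  ear:   { role: listen,   input: rss }
  brain: { role: think,    store: neo4j }
  eye:   { role: predict }
  photo: { role: edit }
  sleep: { role: remember }
  voice: { role: automate }
```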
It was fine. It needed a mirror.
The 27-angle luminary review
I couldn't afford to wait for real user feedback to catch structural problems — at 15 stars, the feedback loop is too thin. So I simulated reviewers. Not cosplay; disciplined role-prompting across 27 distinct angles, each with its own decision rubric.
Here's the short version of how they scored it:
| Angle | Score | What they flagged |
|---|---|---|
| Ousterhout (design philosophy) | 9/10 | Deep modules, shallow interfaces — ADRs pay off. |
| Stallman (freedom, local-first) | 9/10 | Local Ollama, Claude Max OAuth, no telemetry. |
| Karpathy (simplicity) | 8.5/10 | 6 modules + 4 MCP + 2 Neo4j is a lot for a solo project. |
| Kleppmann (reliability) | 8/10 | Trust boundaries clear, but no graph backup drill. |
| Schneier (security) | 8/10 | 28 MCP tools = attack surface; threat model missing. |
| LeCun (world models) | 7.5/10 | "Prediction accuracy" tracked but not calibrated. |
| Norman (first-run UX) | 7/10 | `./install.sh` then... now what? |
| Dijkstra (rigor) | 7.5/10 | 1,336 tests, but mostly example-based. |
| Popper (falsifiability) | 6.5/10 | How do wrong predictions decay a lens? |
| O'Neil (algorithmic bias) | 6/10 | 31 RSS feeds = big-tech echo chamber risk. |
| Pearl (causal inference) | 6.5/10 | Graph edges are associative; where's "intervention"? |
| Moore (crossing the chasm) | 6.5/10 | Beachhead too wide. Pick one persona. |
| Harari (narrative) | 6/10 | Engineering 9, storytelling 6. README leads with features. |
| Thiel (moat) | 6/10 | What's the unreplicable secret, honestly? |
| Wilson (ecosystems) | 6.5/10 | Pipeline is a food chain, not mutualism. |
| Jepsen (chaos) | 6/10 | No partial-failure playbook. |
| ... | ... | ... |
Average: 7.7/10. Strong in the bones, weak in the places that decide whether it grows.
The real insight wasn't any single score. It was the pattern:
Engineering maturity 9/10. Narrative maturity 6/10. Observability maturity 6-7/10. Epistemic hygiene 6-7/10.
A system that's measured by its code looks healthy. A system that's measured by whether it would survive a bias audit, a chaos day, or a new visitor's first 90 seconds looks much less healthy.
The four response packs
I grouped the gaps into four independent packs and shipped them in v0.3.0. Small enough that each is a single commit; independent enough to parallelize.
Pack A — Narrative (Harari · Moore · Thiel · McLuhan)
The README's old hero: "A self-evolving personal knowledge system — what you read automatically improves your tools."
Features. Nouns. Forgettable.
New hero:
> You read arXiv every morning. By Friday, you can't remember what Tuesday's paper argued.
> Comad World turns each paper you read into a graph edge, a sharpened retrieval lens, a calibrated prediction.
> Your reading stops evaporating — it compounds into a system that thinks alongside you.
Then two new documents:
- `STORY.md` — origin, why six modules, two real failure stories (the Neo4j single-instance collapse, where p95 only came down from 20.7s to 13.8s after I split into two instances; the 17K-line cleanup that had to land before v0.2.0 because over-design had been accumulating silently).
- `docs/moat.md` — a Thiel-shaped answer. The moat isn't any single thing. It's a multiplicative combination: self-evolving loop × Claude Max $0/day × local-first. With one axis missing, the moat collapses. With all three, time widens the gap.
Pack B — Observability (Jepsen · Dean · Vogels)
Upgrade/rollback/lock DX was 8/10. SLO/SLI was 0/10 because it didn't exist.
- `docs/slo.md`: three SLIs with targets (brain query p95 latency ≤15s; crawl success rate ≥95% per 24h; MCP server uptime ≥99% per month).
- `docs/chaos-drill.md`: partial-failure playbook for brain, eye, and each Neo4j instance. Predicted behavior, manual reproduction, recovery commands, verification.
- `brain/docs/query-plan.md`: how to capture Neo4j Cypher EXPLAIN/PROFILE output, with one worked example.
- `scripts/comad status` now prints an SLI summary — stubbed at first, then genuinely wired two commits later when I added p95 sample tracking to `brain/packages/core/src/perf.ts`:
```typescript
// perf.ts — sample ring buffer (cap 1000) → p95 calculation
import { promises as fs } from "node:fs";

const MAX_SAMPLES = 1000;
// ... (timings map and percentile() helper elided)

export async function writeSnapshot(path: string): Promise<void> {
  // Merge every operation's samples and sort ascending for percentile math
  const overallSamples = Object.values(timings)
    .flatMap(t => t.samples)
    .sort((a, b) => a - b);
  const snapshot = {
    ts: new Date().toISOString(),
    p95_ms: Math.round(percentile(overallSamples, 95)),
    // ... (per-operation breakdown elided)
  };
  await fs.writeFile(path, JSON.stringify(snapshot, null, 2));
}
```
The comad_brain_perf MCP tool now writes that snapshot after each call, so the shell status command reads real numbers without a live MCP roundtrip.
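The ring-buffer half of perf.ts is elided above; a minimal sketch of how capped sample recording could work (the `recordSample` name and `timings` shape are my assumptions, not the repo's actual API):

```typescript
const MAX_SAMPLES = 1000;

// One ring buffer of duration samples (ms) per operation name
const timings: Record<string, { samples: number[] }> = {};

// Record a duration sample, evicting the oldest once the cap is hit
function recordSample(op: string, ms: number): void {
  const entry = (timings[op] ??= { samples: [] });
  entry.samples.push(ms);
  if (entry.samples.length > MAX_SAMPLES) entry.samples.shift();
}
```

The shift-on-overflow keeps memory bounded no matter how chatty an agent gets, at the cost of forgetting the oldest samples first.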
Pack C — Epistemic hygiene (Popper · O'Neil · Gebru · Pearl · Korzybski)
This pack is the one I'm proudest of. A self-evolving loop that never asks "am I converging on truth or on myself?" is a bias amplifier with good PR.
- `eye/docs/falsification.md` (Popper). When an `eye` lens predicts wrong, its weight decays: `w_new = w_old × 0.9^n`. Predictions that can't even be falsified (no verifiable outcome) are excluded from the log entirely. A lens that can't be wrong can't earn trust.
- `ear/docs/source-diversity.md` (O'Neil). The 31 RSS feeds skew hard toward big tech and English-speaking academia. Three monitoring metrics: `BigTechRatio`, `RegionDiversity`, `PerspectiveSpread`. When any reaches "severe," the weekly digest gets a manual-supplement flag before it goes out.
- `brain/docs/model-cards/` (Gebru). Google-format model cards for `synth-classifier` and `eye-lens`: intended use, training data, known failure modes, ethical considerations.
- `brain/docs/causal-edges.md` (Pearl + Korzybski). Edge typology (assoc/corr/causal), an intervention-evidence requirement before promotion, temporal decay rules (Korzybski: the map is not the territory; old nodes should visibly fade).
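To make the falsification rule concrete, here's a sketch of the `w_new = w_old × 0.9^n` decay in TypeScript (the `Lens` shape and `decayLens` name are hypothetical, not the repo's actual code):

```typescript
// Popper-style lens decay: each wrong-but-falsifiable prediction
// multiplies the lens weight by 0.9; n wrong predictions compound to 0.9^n.
interface Lens {
  id: string;
  weight: number;
}

function decayLens(lens: Lens, wrongCount: number): Lens {
  return { ...lens, weight: lens.weight * Math.pow(0.9, wrongCount) };
}
```

Multiplicative decay never zeroes a lens outright; it just makes repeatedly wrong lenses progressively irrelevant while leaving them recoverable.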
Pack D — Ecosystem (Wilson · Wolfram)
The pipeline ear → brain → eye is a food chain, not an ecosystem. Wilson's mutualism is missing; Wolfram's emergent explainability is missing.
- `docs/feedback-loops.md`: reverse edges. `eye → ear` (high-accuracy lenses boost source priority). `brain → ear` (hub topics nominate new RSS feeds). `sleep → brain` (session patterns warm the query cache). `photo → voice` (processing events trigger workflows).
- `brain/scripts/graph-archaeology.ts` (432 lines, `tsc --noEmit` clean): `whyHub(nodeId)` and `timeline(nodeId)`. When a node becomes a surprise hub, the script replays how it got there — degree over time, first three inbound edges, peak week. Post-hoc forensics as a first-class tool.
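As an illustration of what `whyHub` could do, here's a self-contained sketch that replays inbound edges in timestamp order (the `Edge` shape and the return structure are my assumptions, not the actual 432-line script):

```typescript
// Replay a node's inbound edges in timestamp order to reconstruct
// how its degree grew and which edges arrived first.
interface Edge {
  from: string;
  to: string;
  ts: number; // epoch millis (assumed)
}

function whyHub(nodeId: string, edges: Edge[]) {
  const inbound = edges
    .filter(e => e.to === nodeId)
    .sort((a, b) => a.ts - b.ts);
  return {
    degreeOverTime: inbound.map((e, i) => ({ ts: e.ts, degree: i + 1 })),
    firstThree: inbound.slice(0, 3),
  };
}
```

Because the whole history lives in the graph, "why is this a hub?" becomes a replay rather than a guess.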
The parallelization trick
I didn't write Packs B, C, and D alone. Once I'd chosen what needed to exist, I ran a pumasi-style workflow (pumasi: Korean reciprocal communal labor), with Codex CLI as a parallel sub-developer. Four workers, ~15 minutes, 24/24 gates passed. Pack A I wrote myself; it required the review context Codex didn't have.
That boundary — delegate structure; don't delegate judgment — turned out to be the most valuable hour of the whole review.
The reversal (ADR 0001 → ADR 0011)
Two weeks ago I wrote ADR 0001: "umbrella repo + six nested .git repos, one per module." Clean separation. Each module ships independently someday.
Auditing v0.3.0, I found three facts I hadn't wanted to see:
- The umbrella was already tracking every module's source code. A user cloning `comad-world` got a working system; the nested `.git` was a dev-only artifact nobody but me ever interacted with.
- The nested `.git` remotes pointed at `github.com/kinkos1234/comad-{brain,ear,eye,...}.git` — all 404. The per-module pull logic in `scripts/upgrade.sh` had literally never worked.
- In today's session, I accidentally committed the same file to both the nested git and the umbrella, because the dual tracking made it unclear which git I was in.
So I ran the 6-luminary adoption check again, specifically on the repo strategy. This time I asked: at 15 stars and 1 maintainer, what decision maximizes adoption?
| Axis | A. Status quo (7 repos) | B. Mono-repo | C. Submodules |
|---|---|---|---|
| Norman (first-run) | 3 | 9 | 5 |
| Linus (pragmatic) | 4 | 9 | 4 |
| DHH (convention) | 2 | 10 | 3 |
| Collison (DX) | 3 | 9 | 6 |
| Moore (chasm) | 2 | 9 | 5 |
| Evan You (OSS) | 3 | 8 | 4 |
Unanimous: B. Nobody picked submodules (C) because submodules are a known war crime against Dependabot, changesets, release-please, and first-time contributors.
I moved the six .git directories to /tmp/comad-nested-git-archive/ (recoverable for 7 days), absorbed the module source directly into the umbrella, and wrote ADR 0011 — Mono-repo Reversal, Supersedes ADR 0001.
The sentence at the top of the new ADR is the one I want to remember:
> YAGNI. The case for module-level independent release doesn't exist yet. When it does, ADR 0012 can re-split.
Writing an ADR whose entire content is "I was wrong" was the most honest thing I shipped this sprint.
What I learned
1. Self-review compounds faster than real reviews for early-stage projects. At 15 stars, user feedback is too thin to expose structural issues in under a month. Role-prompting 27 distinct perspectives — each with a rubric, not just a voice — surfaces in two hours what would take two quarters of community growth. The value isn't "AI reviews your code"; it's forcing yourself to argue from a perspective you haven't chosen.
2. Engineering scores lie about product health. ADRs, CI, deep module boundaries, cleanup commits — all great. None of them would've moved my star count from 15 to 50. What moves it is: a hero sentence a stranger can parse in 3 seconds, a comad hello that works in one terminal, a README that passes the "would I clone this on my phone while waiting for coffee?" test.
3. The most dangerous architecture is the one you wrote while smart. Two weeks ago I was deeply thoughtful about repo strategy. I wrote an ADR. I earned the right to revisit by writing a superseding ADR out loud, not by deleting the old one. That asymmetry — decisions are cheap; reversals must be expensive — is the thing that keeps future-me honest.
4. YAGNI scales better than premature pluralism. Nested repos for independent release. Submodules for "flexibility." Multi-tier caching before traffic. All of these felt smart; all of them cost me adoption. At every level I'm learning to ask: which user, right now, will thank me for this complexity? If the answer is "a theoretical future user," delete it.
Try it
```shell
git clone https://github.com/kinkos1234/comad-world
cd comad-world
./install.sh
comad hello   # 5-minute quickstart
```
v0.3.0 is live: https://github.com/kinkos1234/comad-world/releases/tag/v0.3.0
GitHub repo: https://github.com/kinkos1234/comad-world
If you use it, I want to hear what breaks. Not the polite feedback — the "this felt wrong and here's why" feedback. That's what the review pattern above will happily automate for you on whatever you're building next.
Comments and pull requests welcome. The falsification log is already counting my predictions wrong. Eventually, if the loop works, it'll count yours too.