Shortly after OpenClaw came out I started building my own personal agent. I picked Claude Code as the harness, partly out of habit and partly because I wanted to see what it could do outside of coding.
The agent lives in a single directory on my file system. Launching Claude in that folder launches the agent. Nothing about it is Claude-specific, though. It runs on skills, MCPs, and custom CLI commands, and stores everything in markdown or YAML. Any harness can work with these concepts.
Bootstrapping
Each session starts by loading AGENT.md (or CLAUDE.md in this case). It's deliberately compact, just enough to point the agent in the right direction:
- Agent identity and a link to SOUL.md, which holds the full description.
- Operating principles, like "auto-improve custom CLI commands when they fail."
- A short note on how skills are organized and how memory works.
- Rules for formatting bash commands so they pass permission lists cleanly.
Everything else loads on demand.
Skills, CLIs, and the file system
Three pieces do most of the work, so it's worth introducing them up front.
Skills are the agent's playbook. Almost every action it takes is driven by one. A skill contains instructions for calling CLIs, with examples and guardrails, and procedures for things like compiling memory prints, priming sessions, and writing or reviewing code. I use relatively few skills with larger instruction sets, rather than a long catalog of small ones. Sessions typically begin with a skill call appropriate to the task. If I'm working on a project, /project primes the session by reading the project wiki.
CLIs are the arms and hands. They're custom-built commands the agent executes via bash, mostly thin wrappers around APIs: calendar-cli, gmail-cli, drive-cli. reddit-cli reads posts as JSON. youtube-cli pulls video transcripts. Going through CLIs instead of MCP gives me clearer control. The Google CLIs, for example, support multiple accounts, and the right credentials get picked up in code rather than from an agent-generated bash script.
CLI calls are also cheaper than MCP calls, and I can bake error correction into the code itself. LLMs love to hallucinate imaginary args, and validating those inside the CLI is trivial. The validation error returns the correct usage, so the agent fixes the call on the next try instead of looping through trial and error.
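As a sketch of that pattern (the command, subcommand, and flag names here are invented for illustration, not my actual CLIs): with argparse, a missing or hallucinated argument fails fast, and the usage text argparse prints is the correction message the agent needs.

```python
import argparse

def main(argv):
    # Hypothetical wrapper CLI; a real one would call the underlying API.
    parser = argparse.ArgumentParser(prog="calendar-cli")
    sub = parser.add_subparsers(dest="command", required=True)
    events = sub.add_parser("events", help="list events for one day")
    events.add_argument("--date", required=True, help="ISO date, e.g. 2026-04-12")
    events.add_argument("--account", default="personal",
                        choices=["personal", "work"])  # credentials resolved in code
    try:
        args = parser.parse_args(argv)
    except SystemExit:
        # argparse has already printed the correct usage to stderr,
        # so a non-zero exit is enough for the agent to retry properly.
        return 2
    print(f"events for {args.date} ({args.account})")
    return 0
```

A hallucinated flag like `--when now` exits with the usage string instead of half-working, so the agent's next call is usually correct.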
The file system is the database. Markdown and YAML for everything. Folder structure is the schema.
Markdown as a data store
Journals and logs are the unstructured side of this. The structured side looks more like a row in a database: front matter defines the shape, the body holds the content.
```markdown
---
id: project-axiom
status: active
priority: high
related: [agent-system, langgraph-experiments]
---
# The Axiom
Notes and references for the agentic newsroom project...
```
These files reference each other like foreign keys, with links as plain paths. This works because LLMs are, for some reason, very good at reasoning over directory trees, and markdown is plain text the model reads natively. Binary database files are opaque by comparison.
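A minimal reader for these files, front matter as the row and body as the content, might look like this. It's a sketch: real code would use a YAML parser, but the flat shape above is easy to read directly.

```python
def parse_front_matter(text):
    """Split a data-markdown file into (fields, body)."""
    assert text.startswith("---\n"), "not a data markdown"
    header, _, body = text[4:].partition("\n---\n")
    fields = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        # Bracketed lists become Python lists (the foreign-key style links).
        if value.startswith("[") and value.endswith("]"):
            value = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        fields[key.strip()] = value
    return fields, body.strip()

doc = """---
id: project-axiom
status: active
related: [agent-system, langgraph-experiments]
---
# The Axiom
Notes and references...
"""
fields, body = parse_front_matter(doc)
# fields["related"] == ["agent-system", "langgraph-experiments"]
```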
The catch is hallucination. Despite clear instructions, the agent will occasionally write a markdown file that doesn't match the defined shape, which immediately breaks anything that reads it.
To handle that, I use a third-party tool called ALS. A hook fires whenever a data markdown is edited, ALS validates it against the shape, and on any error it returns a correction message to the agent. It's pure code, deterministic, free, and faster than any agentic validation loop.
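I won't reproduce ALS's actual interface here, but the core idea is plain deterministic code: compare parsed front matter against a declared shape and return correction messages instead of raising. A toy version, with an illustrative `project` shape:

```python
# Illustration of the idea, not ALS itself.
SHAPES = {
    "project": {
        "required": {"id", "status", "priority"},
        "allowed_status": {"active", "paused", "done"},
    }
}

def validate(fields, shape_name):
    """Return a list of correction messages; empty means the file passes."""
    shape = SHAPES[shape_name]
    errors = []
    missing = shape["required"] - fields.keys()
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    if fields.get("status") not in shape["allowed_status"]:
        errors.append(f"status must be one of {sorted(shape['allowed_status'])}")
    return errors

errs = validate({"id": "project-axiom", "status": "actve"}, "project")
# errs lists both problems — the missing 'priority' key and the bad
# status — ready to hand back to the agent as a correction message.
```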
Memory
The first thing I implemented was memory. Initially it was just a text dump into MEMORY.md, but I wanted memories from day one so I'd have a record of building the agent itself. Almost everything else can be re-created. Those chronological footprints can't.
The current system is a journal spread across markdown files, one per day:
```
memory/
└── journal/
    ├── 2026-03/
    └── 2026-04/
        ├── 01.md
        ├── 02.md
        └── ...
```
A /remember skill collects memories at the end of each session. Daily files are split into topics, and if I work on the same topic across sessions, the skill bakes new memories into the existing topic by rewriting the whole file. It's a heavy operation, both in tokens and time, which is why one day is the smallest unit. It's small enough not to choke the agent during a rewrite. The next step is probably per-topic daily files.
Memories are also indexed into a file-based SQLite-vec store in the same folder. A hook fires on every journal edit and re-chunks the file. Chunk size is dynamic, anchored on section headers, so each topic becomes one chunk, tagged with the header and filename as metadata.
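The chunking step can be sketched as follows. Embedding and the sqlite-vec write are omitted; this shows only the header-anchored split, one chunk per topic section with the header and filename kept as metadata.

```python
import re

def chunk_by_headers(markdown, filename):
    """Split a journal file into one chunk per section header."""
    chunks, current_header, lines = [], None, []

    def flush():
        if lines:
            chunks.append({"file": filename, "header": current_header,
                           "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()                       # close the previous topic chunk
            current_header, lines = m.group(2), []
        else:
            lines.append(line)
    flush()                               # close the final chunk
    return chunks
```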
A /recall skill handles retrieval. The topic structure already creates clean semantic boundaries, which makes search effective on its own. On top of that, the final retrieval is hybrid: vector similarity blended with BM25 keyword scores at 70/30.
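The blend itself is simple once both score sets are normalized to a common range. A sketch, assuming each retriever returns a `{chunk_id: score}` mapping:

```python
def normalize(scores):
    # Min-max normalize so vector and BM25 scores are comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def hybrid_rank(vector_scores, bm25_scores, alpha=0.7):
    """Blend vector similarity with BM25 at alpha/(1 - alpha), 70/30 by default."""
    v, b = normalize(vector_scores), normalize(bm25_scores)
    blended = {k: alpha * v.get(k, 0.0) + (1 - alpha) * b.get(k, 0.0)
               for k in set(v) | set(b)}
    return sorted(blended, key=blended.get, reverse=True)
```

A chunk that only one retriever surfaces simply scores zero on the other side, so strong keyword hits can still outrank weak semantic ones.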
Knowledge bases
This part borrows from Andrej Karpathy's idea of LLM knowledge bases. There's a raw/ folder in the project root where the agent dumps anything mid-session: notes, screenshots, snapshots. Each night a scheduled task compiles those into a wiki, blending new material into what's already there. The wiki is a continuously updated snapshot of what I've been thinking and working on.
Projects
Almost every session is tied to some project: code, research, or work on the agent itself. A project is the basic unit of work. PROJECT.md holds the instructions, and each project has its own knowledge base that compiles into a self-updating wiki.
The agent also has project management built in. A Trello-like board with lists and cards, all stored as markdown.
```markdown
---
id: card-042
list: in-progress
title: Implement /recall hybrid retrieval
created: 2026-04-12
---
Blend vector similarity with BM25 at 70/30. Validate on last week's journal entries...
```
When a project session primes, the board loads into context so the agent always has the full scope of what's planned. The board is also modeled as a data store shape, which lets a small custom backend read it directly. More on that next.
Core and heartbeat
The agent system has a custom daemon running in the background. It's a cron-like scheduler that fires fresh agent sessions in headless terminals on a schedule. Each session has a small custom system prompt, but otherwise it's the same agent I use interactively, just running on its own.
A heartbeat task runs every 30 minutes. It backs up the system and pushes it to a remote git repo. Nothing more dramatic than that, but it means the agent's accumulated state is never more than half an hour from being safe.
Web apps on top
Most of my interaction is through the Claude CLI, typing and talking. But some things are easier to see than to describe.
The kanban board is the clearest example. A small Trello-like web app reads the same raw markdown files the agent does, which is safe because ALS guarantees the format and referential integrity hold. The web layer doesn't own the data. It just renders it.
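Assuming one markdown file per card (a hypothetical layout, though the front matter matches the card shape shown earlier), the web layer's read path can be as small as this:

```python
from pathlib import Path

def load_board(board_dir):
    """Group card files by their `list` field for rendering as columns."""
    board = {}
    for path in sorted(Path(board_dir).glob("*.md")):
        fields = {}
        # Front matter sits between the first pair of --- fences;
        # ALS has already guaranteed the shape, so parsing stays naive.
        for line in path.read_text().split("---")[1].splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        board.setdefault(fields.get("list", "backlog"), []).append(
            fields.get("title", path.stem))
    return board
```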
Where this goes
A personal agent is a useful tool on its own, but it becomes something different once it has history. Memories, knowledge bases, project notes, access to my email and accounts and Spotify feeds. Every interaction is stored, and the agent's picture of me sharpens with each session.
It's not finished, though. Claude Code is a worker harness. It's exceptional at coding and code-adjacent tasks, and bad at being a person. There's no real personality, no conversational nuance, and any character I write into SOUL.md gets diluted past the first few turns by the harness's own system prompt.
The natural next step is splitting the agent in two: a conversational layer on top, a worker layer underneath. The conversational layer would run on a different harness, with LangChain Deep Agents as my current candidate, and exist purely as the interface I talk to, delegating real work downward. That separation is what would let a personality actually take hold and evolve through memory, instead of getting flattened on every prompt.
The other missing piece is autonomy. Right now the agent is almost entirely interactive. Claude Code's recent auto-mode is a step in the right direction. It skips manual permission prompts in favor of a dedicated permission model. It's not quite there yet, but for an agent to run long tasks unattended, something like it is essential.