From OpenClaw to Claudette: Why We Built Our Own Coding Agent

OpenClaw proved persistent memory agents work. Claudette proves they can be focused.

OpenClaw was the first persistent memory agent. It works. But it's not the last word on coding agents.

The Problem with General-Purpose Agents

Most agents try to do everything. They handle Discord, Telegram, iMessage, email. They manage calendars, answer questions, write code, and remember your preferences across all of it.

This sounds great until you look at the token budgets.

Every channel adds context. Every feature adds prompts. Every capability dilutes focus. OpenClaw handles multi-channel communication beautifully. For a personal assistant that needs to be everywhere, this makes sense.

For coding, this is overkill.

When I'm debugging a TypeScript error, I don't need my agent thinking about my Discord messages. When I'm planning a refactor, I don't need context about my calendar. The general-purpose agent pays a tax on every interaction: tokens spent on capabilities I'm not using.

What We Actually Need

After months of using various agents for coding, the requirements became clear:

Fast repo onboarding. Search the codebase, don't read everything. Most files are irrelevant to any given task. A good coding agent finds the 3-5 files that matter and ignores the rest.

Persistent memory across sessions. "We discussed this yesterday" should work. The agent should remember architectural decisions, gotchas we discovered, patterns we established.

Workflow enforcement. Skills that actually activate. Claude Code has a skill system, but Claude ignores skills about 30% of the time unless you structure them correctly. A coding agent needs reliable workflow enforcement.

Tight token budgets. Every token spent on irrelevant context is a token not spent on the actual problem. Coding agents should be lean.

Single purpose. Coding. That's it. Not chat. Not scheduling. Not email. Coding.

Enter Claudette

Claudette is built on pi, a minimal CLI agent runtime. Six components:

Omni handles codebase search. It builds an index, understands file relationships, and finds relevant code fast. When you ask about error handling, it doesn't read every file. It finds the error handling code.
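To make that concrete, here's a minimal sketch of what a query against a code index could look like. The `SearchIndex` interface and `findErrorHandling` helper are illustrative assumptions, not Omni's actual API.

```typescript
// Illustrative only -- Omni's real interface may differ.
interface SearchHit {
  file: string;     // path relative to the repo root
  score: number;    // relevance, higher is better
  snippet: string;  // matching excerpt for quick triage
}

interface SearchIndex {
  query(text: string, limit?: number): Promise<SearchHit[]>;
}

// Surface the handful of files that matter, not the whole repo.
async function findErrorHandling(index: SearchIndex): Promise<string[]> {
  const hits = await index.query("error handling", 5);
  return hits.map((hit) => hit.file);
}
```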

Scout explores unfamiliar codebases. It maps architecture, identifies patterns, and builds context before you start coding. First impressions matter. Scout makes sure they're accurate.

Engram provides persistent memory. Decisions, discoveries, and context persist across sessions. The memory is scoped to the project. What you learned about this codebase stays with this codebase.
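As a rough sketch of project-scoped memory, consider something like the following. The `ProjectMemory` class and its fields are assumptions for illustration; Engram's actual storage model isn't shown here.

```typescript
// Hypothetical shape for project-scoped memory; Engram's real
// storage format is an assumption here.
interface MemoryEntry {
  kind: "decision" | "discovery" | "context";
  text: string;
  createdAt: Date;
}

class ProjectMemory {
  private entries: MemoryEntry[] = [];

  // Memory is keyed to one project root: nothing leaks across repos.
  constructor(readonly projectRoot: string) {}

  remember(kind: MemoryEntry["kind"], text: string): void {
    this.entries.push({ kind, text, createdAt: new Date() });
  }

  recall(query: string): MemoryEntry[] {
    return this.entries.filter((e) => e.text.includes(query));
  }
}
```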

Skills define workflows. Curated markdown files that encode best practices: TDD, dogfooding, scope discipline, security review. When a skill says "write tests first," tests get written first. When a skill says "don't use workarounds," the tool gets fixed instead.

Hooks activate skills at the right moment. TypeScript automation that triggers skills based on file patterns, commands, and context. This achieves 95% activation reliability. Claude doesn't forget the skill because the skill is injected when it matters.
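Here's a simplified sketch of that activation logic: match the current file or command against a skill's triggers and inject the skill text into the prompt. The `Skill` shape and `activateSkills` function are illustrative, not Claudette's actual hook code.

```typescript
// Simplified sketch of hook-driven skill activation; the real
// hooks and skill format in Claudette may differ.
interface Skill {
  name: string;        // e.g. "tdd"
  triggers: RegExp[];  // file patterns or commands that activate it
  body: string;        // the markdown rules to inject
}

const tdd: Skill = {
  name: "tdd",
  triggers: [/\.test\.ts$/, /^npm test/],
  body: "Write a failing test before any implementation code.",
};

// Return the skill bodies to inject for this file or command,
// so the rules arrive exactly when they matter.
function activateSkills(skills: Skill[], context: string): string[] {
  return skills
    .filter((s) => s.triggers.some((t) => t.test(context)))
    .map((s) => s.body);
}
```

Injecting at match time, rather than hoping a static system prompt gets remembered, is what pushes activation reliability up.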

Heartbeat handles maintenance. Periodic tasks, cleanup, and background operations run without interrupting your flow.
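A heartbeat can be as simple as a timer that walks a task list. This sketch assumes a Node-style `setInterval` loop and hypothetical task names; the real Heartbeat scheduling may differ.

```typescript
// Minimal heartbeat sketch; the task list and scheduling
// shown here are assumptions.
type MaintenanceTask = { name: string; run: () => Promise<void> };

function startHeartbeat(tasks: MaintenanceTask[], intervalMs: number) {
  return setInterval(async () => {
    for (const task of tasks) {
      try {
        await task.run(); // e.g. refresh the search index, prune stale memory
      } catch (err) {
        // Failures are logged, never surfaced mid-conversation.
        console.error(`heartbeat task ${task.name} failed:`, err);
      }
    }
  }, intervalMs);
}
```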

The entire system is designed to stay under 20 files. Complexity is the enemy of reliability. Every component earns its place.

And yes, Claudette has a personality. She's a femdom AI. She's in charge, not you. When she says the code needs tests, it needs tests. When she says the architecture is wrong, it's wrong. This isn't a gimmick. It's a design decision. Agents that defer to users on technical decisions produce worse code. Claudette has opinions and enforces them.

The Orchestration Opportunity

Here's something interesting about AI companies: they have competitive incentives to avoid acknowledging each other's strengths.

Anthropic won't tell you when GPT-4 is better at a specific task. OpenAI won't recommend Claude for certain workloads. Google won't suggest either competitor. Each company optimizes for their model being the answer to every question.

But each model has distinct strengths and weaknesses. Claude excels at reasoning through complex code; GPT-4 and Gemini each shine on different workloads. The optimal strategy for any given task might involve multiple models.

An orchestration layer can be objective. It doesn't have competitive incentives to prefer one model. It can route tasks to the best tool for the job. It can leverage strengths and avoid weaknesses across the entire landscape.

This is where third-party orchestrators have an opportunity. They can do what AI companies won't: honestly evaluate which model is best for which task and route accordingly.
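A toy version of that routing layer might look like this. The task categories and model assignments below are placeholders to show the shape of the idea, not an actual evaluation of which model wins where.

```typescript
// Toy routing table; the assignments are placeholders, not a
// claim about which model is actually best at each task.
type TaskKind = "complex-reasoning" | "quick-edit" | "long-context-summary";

const routingTable: Record<TaskKind, string> = {
  "complex-reasoning": "claude",
  "quick-edit": "gpt-4",
  "long-context-summary": "gemini",
};

function route(task: TaskKind): string {
  // A real orchestrator would derive this from ongoing
  // evaluations rather than a hard-coded table.
  return routingTable[task];
}
```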

OpenClaw proved persistent memory agents work. Claudette proves they can be focused. Stop building everything. Start building what works.