Cascade Today: An All-in-One Layer for Claude Code

A month after the first scripts, Cascade has grown into a real product: an all-in-one layer for Claude Code that combines multi-account fleet routing, token optimization, a persistent daemon, and a hook system. It is the open-source version of ideas that go much further inside nSelf, nClaw, and ClawDE, but the core patterns hold up on their own.

The first thing that matured was multi-model routing. A classifier looks at each request and picks a model tier based on cost and quality signals. Cheap local models handle the easy eighty percent of requests. Frontier models handle the hard twenty percent. The win is not any single model. It is the ladder, and the discipline of sending each task to the cheapest model that can still do it well.
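The post does not say how Cascade's classifier works, so the sketch below is only a hedged illustration of the ladder idea in Python. The tier names, costs, thresholds, and the keyword-based `estimate_difficulty` heuristic are all hypothetical stand-ins for whatever cost and quality signals the real classifier uses. What it does show is the routing rule: walk the tiers from cheapest to most expensive and stop at the first one trusted with the request's estimated difficulty.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    name: str              # hypothetical tier label
    cost_per_mtok: float   # illustrative cost per million tokens, not real pricing
    max_difficulty: float  # hardest request (0..1) this tier is trusted with


# Hypothetical ladder; names, costs, and thresholds are made up for illustration.
LADDER: List[Tier] = [
    Tier("local-small", 0.0, 0.3),
    Tier("mid-hosted", 1.0, 0.6),
    Tier("frontier", 15.0, 1.0),
]

# Hypothetical keywords treated as signals of a hard request.
HARD_MARKERS = ("refactor", "architecture", "debug", "race condition")


def estimate_difficulty(prompt: str) -> float:
    """Toy stand-in for a trained classifier: long prompts and 'hard' keywords score higher."""
    score = min(len(prompt) / 4000, 0.5)
    if any(marker in prompt.lower() for marker in HARD_MARKERS):
        score += 0.4
    return min(score, 1.0)


def route(prompt: str, ladder: List[Tier] = LADDER) -> Tier:
    """Return the cheapest tier whose difficulty ceiling covers the estimate."""
    difficulty = estimate_difficulty(prompt)
    for tier in sorted(ladder, key=lambda t: t.cost_per_mtok):
        if difficulty <= tier.max_difficulty:
            return tier
    # Nothing qualified: fall back to the most capable tier.
    return max(ladder, key=lambda t: t.max_difficulty)
```

Under these made-up thresholds, a short rename request lands on `local-small`, a short debugging request lands on `mid-hosted`, and only a long request with a hard marker reaches `frontier`. Keeping the ladder as data rather than code means retuning thresholds or adding a tier does not touch the routing loop.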

Retrieval is where the real quality lives, and it took the longest to get right. Cascade leans on hybrid retrieval. Lexical search through Postgres tsvector catches exact names and identifiers. Dense vectors through pgvector catch meaning. Reciprocal Rank Fusion merges the two ranked lists so that both methods vote on the final order, and a cross-encoder reranker built on BGE-M3 cleans up the top of the list. Lexical search alone misses intent. Dense search alone misses the literal symbol you typed. Fusing them, then reranking, is what makes retrieval feel like it understands the codebase.
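Reciprocal Rank Fusion itself is small enough to sketch. Here is a minimal Python version, assuming each retriever returns an ordered list of document IDs (in Cascade those would come from the tsvector and pgvector queries; here they are plain lists). Each list adds 1 / (k + rank) to every document it ranks, with k = 60 as the commonly used default. The `score_fn` parameter of `fuse_then_rerank` is a hypothetical placeholder for the cross-encoder call; this is an illustration of the technique, not Cascade's actual code.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def reciprocal_rank_fusion(ranked_lists: Iterable[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of doc IDs: each list adds 1 / (k + rank) to every doc it ranks."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


def fuse_then_rerank(
    query: str,
    ranked_lists: Iterable[List[str]],
    score_fn: Callable[[str, str], float],  # hypothetical cross-encoder: (query, doc_id) -> relevance
    top_n: int = 20,
    k: int = 60,
) -> List[str]:
    """Fuse with RRF, keep the top_n candidates, then reorder them by score_fn."""
    candidates = reciprocal_rank_fusion(ranked_lists, k=k)[:top_n]
    return sorted(candidates, key=lambda doc_id: score_fn(query, doc_id), reverse=True)
```

With `lexical = ["parse_config", "load_env", "Config"]` and `dense = ["Config", "parse_config", "settings_loader"]`, the fused order is `parse_config`, `Config`, `load_env`, `settings_loader`: the two symbols both retrievers found rise to the top. Because RRF only looks at ranks, the lexical and vector scores never have to be calibrated against each other, which is a large part of why it works well for hybrid search.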

Memory was the other big piece. Context windows are a budget, not a memory. Cascade keeps thread state outside the window, with rolled-up summaries at a few granularities, so a session can recall what happened last week without paying for the whole history every turn. Year-old context becomes a lookup instead of a reload.
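The post does not describe Cascade's memory schema, so here is a hypothetical Python sketch of the idea: raw turns are stored per day, a daily summary rolls up each day, and a weekly summary rolls up the days of an ISO week. `summarize` is a placeholder that joins and truncates text where a real system would call a model, and summaries are computed on demand here rather than persisted. `build_context` shows the payoff: today's turns go in verbatim, while last week costs one summary line instead of its full history.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date, timedelta


def summarize(texts: list[str], limit: int = 200) -> str:
    """Placeholder for a model-based summarizer: joins the texts and truncates."""
    return " | ".join(texts)[:limit]


@dataclass
class ThreadMemory:
    # Hypothetical store: day -> raw turns, kept outside the context window.
    turns: dict = field(default_factory=lambda: defaultdict(list))

    def add_turn(self, day: date, text: str) -> None:
        self.turns[day].append(text)

    def daily_summary(self, day: date) -> str:
        return summarize(self.turns.get(day, []))

    def weekly_summary(self, day: date) -> str:
        """Roll up the daily summaries of every stored day in day's ISO week."""
        year, week, _ = day.isocalendar()
        days = sorted(d for d in self.turns if d.isocalendar()[:2] == (year, week))
        return summarize([self.daily_summary(d) for d in days])

    def build_context(self, today: date) -> list[str]:
        """Today's raw turns, plus one rolled-up line for the previous week."""
        last_week = today - timedelta(weeks=1)
        return [f"[last week] {self.weekly_summary(last_week)}"] + self.turns.get(today, [])
```

Because each level only reads the level below it, adding a monthly or yearly tier would be one more method of the same shape, which is what turns year-old context into a lookup.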

The lesson under all of this is that the highest value is in improving a harness, not replacing it. Claude Code is already strong. The room to make it better is in the seams around it: routing, retrieval, memory, and automation. Cascade does not try to be a smarter model. It tries to give a good agent a better environment to work in.

Building Cascade reinforced something I already believed. AI products are mostly engineering. The model is one component. Orchestration, retrieval, reliability, and cost control are the actual work, and they are where twenty years of building systems matters more than any single trick.