How we make AI agents write production code: inside MCAF

The reason AI coding agents produce impressive demos and unshippable code is that nothing around them enforces a standard. The model generates something that looks right, no test proves it is right, no rule stops it from making the same mistake next week, and the context it needed lived in a chat window that is now gone. We hit this wall enough times that we built a framework to get past it, and we run that framework, MCAF, on every client project and every product we ship. This is how it works and why it changes the output.

I'm writing this from the engineering side, because the fix is not a prompt trick. It is architecture: the same discipline that makes human-written code maintainable, enforced on agent-written code, so the agent's speed stops costing you a rewrite.

Why AI agents need a framework at all

A coding agent on its own optimizes for one thing: code that runs right now. That is not the same as code that is correct, secure, or maintainable, which is why 88% of AI agent projects never reach production and most of the failures trace to the system around the model, not the model itself. We cataloged those failure modes in "21 ways AI agents fail in production"; MCAF is the answer to that list.

The core idea is narrow and it matters: an agent is only as reliable as the context it reads, the checks it must pass, and the rules it cannot ignore. Give it none of those and you get confident slop. Give it all three, enforced by the repo rather than by hope, and the same model produces code you can actually ship. MCAF is built on exactly three pillars that map to those three needs.

Pillar 1: context lives in the repo, not the chat

The first failure MCAF removes is context loss. In most agent workflows the source of truth is a conversation, so the intent behind the code evaporates when the session ends and the next agent starts blind. MCAF puts everything an agent needs to understand, change, and run the system into the repository itself: code, docs, AGENTS.md files, and skills, all versioned alongside the code they govern.

Concretely, a typical MCAF repo keeps durable documentation under docs/, an architecture map that defines module boundaries, feature specs with business rules and a definition of done, and ADRs that record why a decision was made. Anything that materially affects how the system is developed, verified, or operated belongs in the repo. That means the next agent, or the next human, reads the same authoritative context instead of reconstructing it from a lost conversation. Context stops being a chat history and becomes a repo artifact.
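
To make "context is a repo artifact" concrete, here is a minimal sketch of a check that reports which context files a repository is missing. The exact paths and the function name missing_context are assumptions for illustration, not a layout MCAF prescribes; only AGENTS.md and docs/ are named above.

```python
from pathlib import Path

# Hypothetical layout: these paths are illustrative assumptions,
# not a structure mandated by MCAF.
REQUIRED_CONTEXT = [
    "AGENTS.md",             # root rules for agents
    "docs/architecture.md",  # architecture map with module boundaries
    "docs/features",         # feature specs with a definition of done
    "docs/adr",              # decision records
]


def missing_context(repo_root: str) -> list[str]:
    """Return the required context paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_CONTEXT if not (root / rel).exists()]
```

A check like this could run as one of the repo's gates, so a missing architecture map fails a commit the same way a failing test does.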

Pillar 2: verification decides what ships, not opinions

The second and most important pillar is verification. In MCAF, integration tests and quality gates are the decision-makers, not opinions. An agent does not get to declare its work done because the output looks plausible. The work is done when it passes the gates, and the gates run on every commit: layered checks, integration tests that exercise real behavior, and maintainability limits that live in AGENTS.md rather than in someone's head.

This is the piece that turns generation into engineering. A model's confidence is meaningless as a quality signal, because a hallucinated function and a correct one look identical in the diff. A passing integration test is different: it is proof that the behavior it exercises works. By making tests and gates the arbiter, MCAF removes the single most dangerous thing about AI code: that it ships on the strength of looking right. For large drops the framework also plans a human review that traces the riskiest boundaries first, so code too large to read line by line still gets inspected where it matters.
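
The idea that gates, not confidence, decide completion can be sketched in a few lines. Gate and is_done are hypothetical names chosen for this example. In a real repo each check would likely wrap a test runner or linter command rather than a bare callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Gate:
    name: str
    check: Callable[[], bool]  # returns True when the gate passes


def is_done(gates: list[Gate]) -> tuple[bool, list[str]]:
    """Work counts as done only if no gate fails; also return the failed gate names."""
    failed = [gate.name for gate in gates if not gate.check()]
    return (not failed, failed)
```

Nothing in is_done asks how plausible the output looks. The only inputs are pass or fail results, which is the property the pillar is after.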

Pillar 3: instructions that make rules durable

The third pillar is what stops the same mistake from happening twice. Every MCAF repo has a root AGENTS.md, and multi-project solutions add local AGENTS.md files at project roots. Agents read root and local AGENTS.md before editing code, so the rules that govern the repo are loaded every time, not remembered sometimes. Rule precedence is explicit: local rules refine root rules.
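
One way to picture the precedence rule is a lookup that collects every AGENTS.md applying to a file, root first and most local last, so later entries refine earlier ones. This is a sketch under assumptions: the function name agents_files_for is ours, and walking every directory between the file and the repo root is a simplification of "local files at project roots".

```python
from pathlib import Path


def agents_files_for(repo_root: str, target: str) -> list[Path]:
    """List the AGENTS.md files that apply to target, root first, most local last."""
    root = Path(repo_root).resolve()
    directory = (root / target).resolve().parent
    if directory != root and root not in directory.parents:
        raise ValueError(f"{target} is outside the repository")

    # Walk upward from the target's directory until the repo root is reached.
    found = []
    while True:
        candidate = directory / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
        if directory == root:
            break
        directory = directory.parent

    # Reverse so root rules come first and local rules refine them.
    return list(reversed(found))
```

An agent harness could read the returned files in order before touching the target, which is what "loaded every time, not remembered sometimes" amounts to in practice.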

The part that compounds is self-learning, and MCAF treats it as a cornerstone, not an optional habit. If the same mistake happens twice, the framework expects the rule to be made durable, written into AGENTS.md so no agent repeats it. That is the difference between an agent that drifts and one that improves: the system captures each correction as a permanent rule. Over time the repo teaches every agent that touches it, which is the opposite of starting each session from zero.
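
The "twice means durable" rule can be modeled as a small counter. This is an illustrative sketch, not MCAF tooling: MistakeLog is a hypothetical class, and its rules list stands in for lines that would be written into AGENTS.md.

```python
from collections import Counter


class MistakeLog:
    """Promote a correction to a durable rule once the same mistake recurs."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.counts = Counter()
        self.rules = []  # stands in for lines appended to AGENTS.md

    def record(self, mistake: str, rule: str) -> bool:
        """Record a mistake; return True if its rule was just made durable."""
        self.counts[mistake] += 1
        if self.counts[mistake] >= self.threshold and rule not in self.rules:
            self.rules.append(rule)
            return True
        return False
```

The first occurrence is only counted. The second one adds the rule, and later occurrences do not duplicate it.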

The rules that keep agent code changeable

Underneath the three pillars sits a set of engineering rules that exist for one reason: to keep systems changeable and testable, the two properties agent-generated code usually lacks. In MCAF, vertical-slice architecture is mandatory unless an ADR says otherwise. Each feature lives in its own isolated folder tree with its code, tests, and supporting artifacts kept together, and agents are told to prefer the smallest relevant feature slice over scanning the whole repo.

That structure is not stylistic. It is what lets an agent change one feature without reasoning about the entire codebase, which is exactly where agents go wrong, and it is what keeps a change from rippling into five unrelated files. SOLID is mandatory, single responsibility and cohesion are mandatory, and the numeric maintainability limits live in AGENTS.md so they are enforced, not aspirational. These are ordinary senior-engineering principles. The move MCAF makes is enforcing them on the agent, at the structural level, so the output stays maintainable by construction rather than by luck.
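
A vertical-slice layout also makes containment checkable. The sketch below maps the files in a change to the feature slices they touch. The src/features/<name>/ convention and the name slices_touched are assumptions for the example; the article only says each feature lives in its own folder tree.

```python
from pathlib import PurePosixPath

FEATURES_DIR = ("src", "features")  # hypothetical slice root


def slices_touched(changed_files: list[str]) -> set[str]:
    """Map changed paths to feature-slice names; anything else counts as '<shared>'."""
    touched = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # A slice file looks like src/features/<slice>/...: at least four parts.
        if parts[:2] == FEATURES_DIR and len(parts) > 3:
            touched.add(parts[2])
        else:
            touched.add("<shared>")
    return touched
```

A change that returns one slice name stayed contained. A change that returns several names, or "<shared>", is the kind of ripple a gate could flag for closer review.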

How work actually flows through it

For non-trivial tasks, the flow is deliberate. The agent starts with a brainstorm that captures options and trade-offs, moves into a plan with ordered implementation steps, explicit test steps, and final validation commands, then executes against that plan. Simple or obvious tasks skip the ceremony and go straight to execution, so the process adds weight only where weight is warranted.
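
As a rough model of that flow, a plan can be treated as a small data structure with the three parts named above, and the choice of stages as a function of task size. Plan and stages_for are hypothetical names for this sketch, not artifacts MCAF defines.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    implementation_steps: list[str]  # ordered
    test_steps: list[str]            # explicit, not implied
    validation_commands: list[str]   # run at the end

    def is_complete(self) -> bool:
        """A plan is usable only when all three parts are filled in."""
        return all([self.implementation_steps, self.test_steps, self.validation_commands])


def stages_for(non_trivial: bool) -> list[str]:
    """Non-trivial tasks get the full flow; simple ones go straight to execution."""
    return ["brainstorm", "plan", "execute"] if non_trivial else ["execute"]
```

A plan with implementation steps but no test steps reports is_complete() as False, which mirrors the point that tests are part of the plan rather than an afterthought.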

For large tasks, a lead agent plans the work, identifies which workstreams can run in parallel, and spawns subagents for independent research, implementation, test, and review scopes. Each subagent gets concrete ownership and verification duties, while the lead agent stays responsible for integration, the quality gates, and final completion. It is the shape of a real engineering team, applied to agents: scoped ownership, parallel work where it is safe, and one accountable owner for the whole.
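
The lead and subagent split resembles a fan-out and join. In the sketch below each workstream is a plain callable that returns whether its own verification passed. Real subagents would be agent sessions, so the callables, run_workstreams, and lead_completes are all stand-ins assumed for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_workstreams(workstreams: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run independent workstreams in parallel and collect each verification result."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(work) for name, work in workstreams.items()}
        return {name: future.result() for name, future in futures.items()}


def lead_completes(results: dict[str, bool], gates_pass: Callable[[], bool]) -> bool:
    """The lead signs off only if every scope verified and the quality gates pass."""
    return all(results.values()) and gates_pass()
```

The parallelism lives in run_workstreams, while accountability stays in one place: lead_completes is the only function that can declare the whole task finished.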

Three ways the agent participates

MCAF does not assume the agent works alone. It supports three participation modes, and a repo can pick a different one per task. In the first, the agent executes scoped work under the current docs, skills, and AGENTS.md. In the second, the agent and engineer iterate together on design, code, tests, and docs. In the third, the agent reviews, critiques, or drafts options while a human keeps implementation control.

What stays constant across all three is the governance: the same verification and the same rules apply whether the agent is driving or assisting. That is the point. The reliability does not come from trusting the agent more or less in a given mode; it comes from the gates and context that hold regardless of mode.

Why this is the product, not the model

Step back and the pattern is the one under all of our AI work: the model produces a draft, and the system that verifies and governs it is what makes the draft shippable. MCAF is that system, written down and enforced by the repo. It is why we can use AI to move fast and still hand a client production code: the speed comes from the agent, the safety comes from the framework, and neither depends on which model is behind the wheel this quarter.

MCAF is open source and documented at mcaf.managed-code.com, and it runs on every project we deliver. If you are trying to take AI agents from an impressive demo to code you can actually ship and maintain, the gap is exactly the three pillars above, and closing it for your codebase is where our AI Dev Team work starts.

FAQ

What is MCAF? MCAF (Managed Code Agent Framework) is an open-source framework for building real software with AI coding agents. It makes AI-generated code predictable, safe, and repeatable by enforcing repo-native context, integration-test and quality-gate verification, and durable rules in AGENTS.md files.

How does MCAF stop AI agents from producing unreliable code? It makes integration tests and quality gates the decision-makers instead of the model's confidence. Work is done when it passes the gates on every commit, not when the output looks correct, which removes the biggest risk in AI code: shipping on the strength of looking right.

What is AGENTS.md? AGENTS.md is the file where an MCAF repo's rules live. A root AGENTS.md governs the solution, local AGENTS.md files refine rules per project, and agents read them before editing code. When a mistake recurs, the fix is written into AGENTS.md so it never repeats, which is how the framework learns.

Why does MCAF require vertical-slice architecture? So an agent can change one feature without reasoning about the whole codebase. Each feature lives in an isolated folder with its own code and tests, which keeps changes contained and the system testable, the two properties agent-generated code usually lacks.

Do I have to use a specific model with MCAF? No. MCAF's reliability comes from context, verification, and rules enforced by the repo, not from a particular model, so it works across coding agents and stays stable as models change.
