
Stop Writing Bigger Prompts. Start Designing Agent Skills.

How to reduce context size, improve reasoning, and build more predictable AI coding workflows


When an AI coding agent does something wrong, the natural reaction is to add more instructions.

Another rule. Another example. Another edge case paragraph.

The prompt grows from a few hundred tokens to thousands, then tens of thousands. And somehow the agent gets worse - it forgets rules from the top, applies the wrong convention, or just ignores half of what you wrote.

The problem isn't missing instructions. The problem is architectural.

The Monolithic Prompt Trap

Most people start with a single system prompt that tries to cover everything: code style, git conventions, testing standards, ticket management, deployment rules, error handling patterns.

It works fine at first. Then the project grows, the team adds more conventions, and suddenly you're maintaining a 10-page document that the model has to reason over for every single request.

This is the equivalent of putting your entire application logic in one file. It works until it doesn't.

LLMs have finite attention. The more context you load, the less reliably the model attends to any specific part of it. Research on this is clear - performance degrades as context grows, especially for instructions in the middle of long prompts.

You're not giving the agent more knowledge.

You're giving it more noise.

Skills as Modular Units

Instead of one massive prompt, break your instructions into self-contained modules - call them skills, rules, playbooks, whatever fits your setup.

Each one covers exactly one domain:

  • git workflow conventions

  • code style for your language

  • how to interact with your ticket system

  • testing standards

Each skill has two properties:

  • A trigger condition: when should this knowledge be active? A git workflow skill is irrelevant when the user asks about database migrations. A Java code style skill is irrelevant when the task is creating a Jira ticket.

  • A focused scope: it contains everything the agent needs for that domain, and nothing else.

The key insight: skills are loaded on demand, not all at once. When someone asks the agent to create a merge request, it loads the git workflow skill. When someone asks to implement a feature, it loads the code quality and language-specific skills.

The active context stays small and relevant.
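A skill can be modeled as a small unit with exactly those two properties: a trigger and a focused body. A minimal sketch in Python (the `Skill` class, trigger keywords, and content are illustrative, not a specific framework's API):

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """One self-contained instruction module: a trigger plus focused content."""
    name: str
    triggers: tuple[str, ...]  # keywords that activate this skill
    content: str               # everything the agent needs for this domain

    def matches(self, request: str) -> bool:
        """Active only when the request touches this skill's domain."""
        text = request.lower()
        return any(t in text for t in self.triggers)


git_skill = Skill(
    name="git-workflow",
    triggers=("merge request", "branch", "commit"),
    content="Branch names follow the team convention; commits reference a ticket.",
)

# The git skill activates for git tasks and stays inactive for unrelated ones.
print(git_skill.matches("create a merge request"))      # True
print(git_skill.matches("run the database migration"))  # False
```

The point of the shape: the trigger is cheap to evaluate, and the content is never in context unless the trigger fires.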

This isn't a new idea. It's the same principle behind modular software design - high cohesion, low coupling. Each module knows its domain deeply and doesn't leak into others.

The Router, Not the Brain

If skills are modules, you need something to route between them. This is a lightweight orchestrator - a small set of instructions that knows what skills exist and when to load them. It doesn't contain domain knowledge itself. It's a lookup table.

Think of it as an API gateway for your agent's knowledge. The gateway doesn't process business logic. It takes the incoming request and forwards it to the right service.

In practice, this means your base prompt is tiny. It contains:

  • A list of available skills with their trigger conditions

  • Rules for when to load which skill

  • Delegation logic for which agent handles which domain

Everything else lives in the skills themselves, loaded only when needed.
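The router itself can be little more than a lookup table over that list. A sketch, assuming skills live as files keyed by trigger keywords (the file paths and keywords are illustrative):

```python
# A lookup table: trigger keywords -> skill file. No domain knowledge lives here.
SKILL_INDEX = {
    ("merge request", "branch", "commit"): "skills/git-workflow.md",
    ("test", "coverage"): "skills/testing-standards.md",
    ("ticket", "jira"): "skills/ticket-system.md",
}


def route(request: str) -> list[str]:
    """Return only the skill files relevant to this request."""
    text = request.lower()
    return [
        path
        for triggers, path in SKILL_INDEX.items()
        if any(t in text for t in triggers)
    ]


print(route("open a merge request for the ticket"))
# -> ['skills/git-workflow.md', 'skills/ticket-system.md']
```

Only the matched files get loaded into context; everything else stays on disk.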

Split Agents by Domain

The same principle applies to agents themselves. A single agent with every tool and every piece of context is the prompt equivalent of a god object. It can do everything, which means it does nothing reliably.

Instead, split by domain. One agent handles ticket management. Another handles git operations. Another handles code implementation. Each gets only the tools and context relevant to its job.

This has a practical benefit beyond context size: agents can chain. The code agent writes the implementation, then delegates the commit to the git agent. The git agent doesn't need to know about code style. The code agent doesn't need to know about commit message conventions. Each operates within its domain.

Delegation rules should be explicit. URL patterns, keyword triggers, tool categories - whatever makes the routing unambiguous. When the boundaries are clear, the agent doesn't have to guess which hat to wear.
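Explicit delegation can be as simple as an ordered mapping from keyword categories to named agents. A sketch with hypothetical agent names:

```python
# Explicit delegation rules: keyword categories -> responsible agent.
# Order matters: the first matching rule wins, so boundaries stay unambiguous.
DELEGATION = [
    (("ticket", "jira", "backlog"), "ticket-agent"),
    (("commit", "branch", "merge request"), "git-agent"),
    (("implement", "refactor", "fix"), "code-agent"),
]


def delegate(task: str) -> str:
    """Pick the first agent whose domain keywords match the task."""
    text = task.lower()
    for keywords, agent in DELEGATION:
        if any(k in text for k in keywords):
            return agent
    return "code-agent"  # an explicit fallback, not an implicit guess


print(delegate("commit the changes"))    # git-agent
print(delegate("create a jira ticket"))  # ticket-agent
```

Real setups route on richer signals (tool categories, URL patterns), but the principle is the same: the rules are data you can read and test, not behavior buried in prose.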

What You Actually Get

Smaller active context. Instead of 50k tokens always loaded, you load 2-5k per skill, only when relevant. The model reasons over focused information instead of scanning a wall of text.

Predictability. Each skill is testable in isolation. You can verify that your git workflow skill produces correct branch names without worrying about Go conventions interfering. When something breaks, you know which skill to look at.

Composability. Skills can be shared across projects. A new repository gets the same quality standards by importing the same skill files. Teams standardize without forcing everyone into one monolithic prompt.

Maintainability. Adding a new convention means editing a skill file, not hunting through a 10-page document hoping you don't break something else. Skills can have clear ownership - the backend team maintains language-specific skills, the platform team maintains git and deployment skills.

How to Start

Audit your current prompt. Read through it and highlight where topics change. Every topic boundary is a candidate for extraction into a separate skill.

Extract each domain into its own file. One file per domain. Give each a clear trigger condition - a simple sentence describing when this skill should be active.

Build a minimal router. Your base prompt becomes a table: "when the task involves X, load skill Y". Nothing more.
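Since the base prompt is just a table, one option is to generate it from the same skill registry the router uses, so the two can't drift apart. A sketch (the registry entries and wording are illustrative):

```python
# Skill name -> trigger condition, stated as a plain sentence.
SKILLS = {
    "git-workflow": "the task involves branches, commits, or merge requests",
    "testing-standards": "the task involves writing or changing tests",
    "ticket-system": "the task involves creating or updating tickets",
}


def base_prompt(skills: dict[str, str]) -> str:
    """Render the router as a plain lookup table - nothing more."""
    lines = ["Load a skill only when its condition applies:"]
    for name, condition in skills.items():
        lines.append(f"- When {condition}, load skill '{name}'.")
    return "\n".join(lines)


print(base_prompt(SKILLS))
```

Three skills yield a base prompt of a few dozen tokens; adding a fourth skill is one new registry entry, not another page of prose.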

Add delegation rules if you use multiple agents. Map tool categories to agents. Make the boundaries explicit.

Iterate by diagnosing, not appending. When the agent misbehaves, don't add another paragraph. Ask: which skill should have handled this? Was it loaded? Was the trigger condition wrong? Was the skill content unclear? The fix is almost always in one specific skill, not in "more instructions".

The Mental Shift

The instinct to write bigger prompts comes from treating the AI agent like a junior developer who needs exhaustive instructions upfront. But that mental model breaks at scale. A better model: treat the agent like a system of services, each with a clear API contract and bounded context.

You wouldn't build a microservices architecture by putting all the logic in the API gateway. Don't build an agent architecture by putting all the knowledge in the system prompt.

The next time your agent does something wrong, resist the urge to add more text. Instead, ask: is this a missing skill, a wrong trigger, or a routing problem? The answer will be more useful than another paragraph in an already-too-long prompt.

Hope this helps,

Cheers!

Ways of Working

Part 1 of 7

In this series, I will explore practices, reflections, and lessons learned that shape how we collaborate, improve processes, and build better ways of working. [Series cover photo by Leone Venter on Unsplash]

Up next

Small Pull Requests Win Every Time

A simple habit that improves code reviews, code quality, and team productivity