The advisor pattern: what Claude Code teaches us about delegation
22 April 2026 | Eric Lamy | 13 min read
In any mature organization, expertise isn’t distributed evenly. A senior consultant isn’t brought in for every minor decision; they’re called upon for the trade-offs that truly matter. This economy of scarce expertise is nothing new: it’s the hallmark of well-functioning teams, firms, and engineering offices. It rests on a principle as old as management itself: start at the lowest level of expertise that can handle the problem, and escalate only when justified.
On April 9, 2026, Anthropic released a new tool for its AI agents: the advisor tool. Behind a technical announcement presented as a simple API tweak lies something more interesting. This organizational principle—the targeted intervention of scarce resources—has now been encoded into a native software pattern. A lightweight model performs the task and consults a powerful model on an ad hoc basis for decisions that are beyond its capabilities.
The thesis of this article fits in one sentence: the advisor tool is not just a feature but a governance pattern which, once held up against our application architectures, exposes a systemic over-provisioning we had stopped seeing.
We will explore it in four parts. First, the mechanics as designed by Anthropic. Then, the figures that make it a serious pattern and not a marketing gimmick. Next, its direct link to how mature organizations already delegate. Finally, the reflection it offers on our business applications, and three questions that a CTO of an SME or mid-sized company should ask themselves after finishing this article.
The pattern that Anthropic has retained
To understand why the advisor tool breaks with the classic multi-agent logic, we must first recall the model it reverses. In the orchestrator-sub-agent architecture that has dominated until now, a heavyweight model coordinates the work. It breaks down the task, distributes the sub-tasks to lighter models, aggregates the results, arbitrates conflicts, and produces the final output. The resource-intensive model is constantly in use, by design.
The advisor pattern reverses this distribution. The executor—Sonnet 4.6 or Haiku 4.5 in the Anthropic nomenclature—drives the task from start to finish. It calls the tools, reads the results, and iterates toward the solution. When it encounters a decision point it cannot confidently resolve, it consults the advisor, typically Opus. The advisor accesses the shared context, returns a short plan, a correction, or a stop signal, and then the executor resumes its work.
The key detail: the advisor never produces user output and doesn’t call any tools itself. It only provides advice. Technically, everything happens in a single API request, without any additional round trips or context management required from the developer.
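The division of roles described above can be sketched as a small local simulation. This is not the Anthropic API: `run_executor`, `toy_advisor`, `Step`, `Advice`, and the confidence threshold are all hypothetical names and mechanisms introduced for illustration. In the real tool the executor model itself decides when to consult, and everything happens inside one API request.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Advice:
    kind: str  # "plan", "correction", or "stop"
    text: str


@dataclass
class Step:
    description: str
    confidence: float  # hypothetical self-assessed confidence, 0.0 to 1.0


@dataclass
class RunLog:
    actions: list[str] = field(default_factory=list)
    advisor_calls: int = 0


def run_executor(
    steps: list[Step],
    advisor: Callable[[Step, list[str]], Advice],
    threshold: float = 0.6,
) -> RunLog:
    """The executor drives every step; the advisor only returns advice."""
    log = RunLog()
    for step in steps:
        if step.confidence >= threshold:
            log.actions.append(step.description)
            continue
        # Decision point the executor cannot confidently resolve:
        # consult the advisor, passing a copy of the shared context.
        log.advisor_calls += 1
        advice = advisor(step, list(log.actions))
        if advice.kind == "stop":
            log.actions.append(f"stopped: {advice.text}")
            break
        # The executor, not the advisor, performs the action.
        log.actions.append(f"{step.description} (advised: {advice.text})")
    return log


def toy_advisor(step: Step, context: list[str]) -> Advice:
    """Stand-in for the stronger model; a real advisor would read the context."""
    if "drop" in step.description:
        return Advice("stop", "destructive change needs human review")
    return Advice("plan", "add a migration and a rollback")


steps = [
    Step("read failing test", 0.9),
    Step("change schema", 0.4),
    Step("run tests", 0.95),
]
log = run_executor(steps, toy_advisor)
print(log.advisor_calls, log.actions)
```

The point of the sketch is the asymmetry: the advisor function has no access to tools and never appends to the action log itself. It returns a short piece of advice and control goes back to the executor.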
The emerging decision-making economy differs from what we’re used to. By default, we pay the cost of lightweight capabilities—fast, efficient, and inexpensive. The cost of heavyweight capabilities only arises when justified by genuine difficulty. What was previously the domain of systems engineering—building custom escalation logic between models, managing context transfers independently—is becoming a native API feature. According to Anthropic, the advisor typically generates only 400 to 700 tokens per query, just long enough to return a short plan. The executor, on the other hand, handles the entire output at its lower cost.
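To make this economy concrete, here is a back-of-the-envelope calculation. Every number in it is a hypothetical placeholder (prices, task size, number of consultations) except the 700-token ceiling per advisor response, which comes from the figure quoted above. It also counts generated tokens only; a real estimate would add the input tokens each model reads, including the shared context the advisor receives.

```python
def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars of generating `tokens` output tokens."""
    return tokens * price_per_million / 1_000_000


# Hypothetical prices, in dollars per million output tokens.
LIGHT_PRICE = 5.0
HEAVY_PRICE = 25.0

# Hypothetical task profile.
TASK_OUTPUT_TOKENS = 20_000    # everything the task needs generated
ADVISOR_CALLS = 3              # consultations during the task
ADVISOR_TOKENS_PER_CALL = 700  # upper end of the 400-700 range cited above

# Baseline: the heavyweight model generates the whole output.
heavy_only = output_cost(TASK_OUTPUT_TOKENS, HEAVY_PRICE)

# Advisor pattern: the executor generates the whole output at the light
# price, and the advisor adds a few short responses at the heavy price.
advisor_tokens = ADVISOR_CALLS * ADVISOR_TOKENS_PER_CALL
advised = output_cost(TASK_OUTPUT_TOKENS, LIGHT_PRICE) + output_cost(
    advisor_tokens, HEAVY_PRICE
)

print(f"heavy only:   ${heavy_only:.4f}")
print(f"with advisor: ${advised:.4f}")
print(f"advisor share of generated tokens: "
      f"{advisor_tokens / (TASK_OUTPUT_TOKENS + advisor_tokens):.1%}")
```

Under these assumptions the heavyweight price applies to only a small share of the generated tokens. The balance shifts if consultations are frequent or if the context each one reads is large, which is why the input side deserves its own measurement.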
This shift is not a mere implementation detail. It’s an architectural stance. Anthropic is betting that most agentic tasks do not require a frontier model to be constantly mobilized, and that the value lies in the ability to invoke that frontier capability precisely when it changes the outcome.
Why it outperforms the classical orchestrator
The figures published by Anthropic at the launch of the advisor tool do not, on their own, demonstrate the superiority of the pattern. They constitute an indicator. The indicator is nonetheless solid.
On the SWE-bench Multilingual benchmark, which measures an agent’s ability to solve software development issues in nine programming languages, Sonnet 4.6 coupled with Opus as an advisor scores 2.7 points higher than Sonnet alone. This isn’t surprising—help from a smarter model logically improves the score. The real surprise is that this same configuration simultaneously costs 11.9% less per agent task. Better and cheaper. That’s rare.
On BrowseComp, an agentic web browsing benchmark, the effect becomes spectacular for the lightest model. Haiku 4.5 alone achieves 19.7%. Haiku 4.5 with Opus as an advisor climbs to 41.2%, more than double. This configuration remains 29% below Sonnet alone in terms of score, but at 85% less cost per task. A doubling of performance on a lightweight model for a fraction of the price of a mid-range model: the trade-off becomes difficult to ignore. These figures and their methodology (scores averaged over five trials, 300 problems for SWE-bench Multilingual, 1,266 problems for BrowseComp) are detailed in Anthropic’s official blog post of April 9, 2026.
Why these gains, and why this simultaneous cost saving? The intuition lies in a phenomenon well known to multi-agent system practitioners. The orchestrator becomes overwhelmed by planning. The more sub-agents there are to coordinate, the more energy the central model expends dividing, arbitrating, rewriting instructions, and synthesizing feedback. This cognitive overload ultimately erodes the very quality of the decisions it is asked to make.
Whoever carries out the work directly is in a different position. They are inside the task, not above it. They move forward and call on higher-level expertise only when it changes the trajectory. The contrast is the one between the project manager who insists on deciding everything, a guaranteed source of delays, and the one who delegates effectively and steps in at the right moments.
The emerging rule is not “one model does everything”. It is: each level of expertise comes into play where it has an impact.
What this pattern reproduces of human delegation
Anthropic’s engineers did not invent this logic. They transposed it.
In a well-functioning engineering department, a junior engineer writes the code while a senior engineer advises. The senior engineer isn’t consulted on every line of code, every variable choice, or every commit. They are consulted for the major decisions: architecture, technical debt, structural and high-impact choices. Their scarcity isn’t a problem to be solved, but a constraint to be respected. A senior engineer consulted on everything inevitably becomes a senior engineer who no longer truly advises: they exhaust themselves on minor decisions and lack the cognitive capacity for the important ones. The value of rare expertise is destroyed by its own trivialization.
The same principle applies everywhere. In an on-call operations team, the manager doesn’t approve every ticket; tickets are escalated to them when needed. In a law firm, the partner doesn’t work on every case; they decide on the points where professional judgment makes the difference. In a hospital, the head of department doesn’t examine every patient; they are called in when the diagnosis or treatment decision requires it. The model is so universal that we no longer see it. Yet it deserves a name: the principle of technical subsidiarity. We handle things at the lowest possible level of expertise and escalate to the appropriate level when justified.
What the advisor tool encodes is this principle. It encodes it natively for the first time in the API of a frontier model, with all the mechanics that well-executed human delegation requires: shared context, targeted call, bounded return, task resumption. Anthropic observed how mature organizations use their rare experts and decided that its AI agents would function similarly.
The most revealing detail lies in the constraint placed on the advisor. It does not produce user output, does not take control of the tools, and does not replace the executor. It provides advice and then relinquishes control. This is precisely what is expected of a good internal advisor: to advise, not to take over. The pattern reflects organizational maturity, not just cost optimization.

The mirror it holds up to our application architectures
Holding this pattern up against our current application architectures produces an unsettling mirror effect. Many business applications we encounter at our clients’ sites operate according to the exact opposite pattern.
A CRM that loads its entire logic of rules, permissions, and segmentation to display a single field in a contact record. An ERP that launches a complete workflow engine, with all its cross-functional consistency checks, for a status change that should only involve three objects. An HR platform that queries all its modules—payroll, training, time, leave, contracts—to determine if an employee can take a day off the following week. These applications aren’t poorly designed. They’re often technically sound, functionally comprehensive, and they work. That’s precisely the problem. They work well enough that we don’t question them, and they cost more with each interaction than they should.
The cost is rarely visible on a line item. It’s paid elsewhere. In perceived response time for the user—the form that takes two seconds instead of two hundred milliseconds. In infrastructure consumption that forces over-provisioning of production environments to handle the load. In technical debt that accumulates because each change must traverse the entire logic, even when it only concerns a narrow and isolated path. In operational costs at scale that become prohibitive as the user base grows, when the initial architectural model didn’t allow for it.
The alternative is hardly revolutionary. It has a well-established name: composition. Lightweight modules handle the majority of cases: reading a field, displaying a list, updating a simple status. Targeted calls to heavyweight logic are made when justified: for complex business validation, inter-module arbitration, or high-stakes rules requiring real reasoning. This is precisely the efficiency that the advisor pattern encodes in the Anthropic API, transposed to application design.
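As an illustration of composition, here is a minimal sketch based on the leave-request example mentioned earlier. The data model, the business rules, and the names `LeaveRequest`, `heavy_validation`, and `can_take_leave` are all hypothetical; the only point is the shape, a cheap path that settles most requests and a targeted call to the expensive logic for the rest.

```python
from dataclasses import dataclass


@dataclass
class LeaveRequest:
    employee_id: str
    days: int
    balance: int  # remaining leave days
    overlaps_payroll_close: bool = False


def heavy_validation(req: LeaveRequest) -> bool:
    """Stand-in for the expensive cross-module check (payroll, contracts...).

    Hypothetical rule: during payroll close, only single-day leave is allowed.
    """
    return req.days <= 1


def can_take_leave(req: LeaveRequest) -> tuple[bool, str]:
    """Return the decision and which path ("light" or "heavy") produced it."""
    # Light path: cheap checks that settle the majority of requests.
    if req.days > req.balance:
        return False, "light"
    if not req.overlaps_payroll_close:
        return True, "light"
    # Targeted call: the heavy logic runs only when the case requires it.
    return heavy_validation(req), "heavy"


print(can_take_leave(LeaveRequest("e1", days=1, balance=10)))
print(can_take_leave(LeaveRequest("e2", days=3, balance=10,
                                  overlaps_payroll_close=True)))
```

Returning the path alongside the decision is a deliberate choice in this sketch: it makes the share of requests that reach the heavy logic directly measurable.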
The challenge for a CTO or application architect isn’t to copy the pattern line by line. It’s to use the advisor tool as a framework for analyzing their own applications. If Anthropic modeled its AI agents on well-structured human teams, it is because this structure is more efficient, more cost-effective, and more scalable. The question this pattern poses to us is simple: why do our business applications still resemble intelligent monoliths that can do everything and are costly at every decision?
This is precisely what we address in our article on sustainable architecture that secures your software investment, and what we apply to one specific type of business application in the anatomy of a modern CRM and its five architectural pillars.
Three questions a CTO should ask themselves
Three questions logically arise from this reflection. They are not abstract. They arise concretely, application by application, module by module, and structure an architectural audit approach that a design office can conduct methodically.
The first question: where are your applications over-provisioning their processing? In other words, where is heavy processing being used by default when light processing would suffice? These issues are rarely visible bugs. They are legacy architectural choices that have never been reconsidered because nothing forced anyone to. They can be detected through measurement (response time, infrastructure consumption, call tracing) and by observing the most frequent user journeys. If the most common 80% of interactions go through the same heavy processing logic as the 20% of complex cases, there is likely over-provisioning that needs to be addressed.
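The measurement step can be sketched as a simple comparison of median response times. The journey names, the sample latencies, the 0.5 ratio threshold, and the function `flag_overprovisioned` are hypothetical; in practice the data would come from your own tracing or monitoring tools.

```python
from statistics import median


def flag_overprovisioned(
    latencies_ms: dict[str, list[float]],
    frequent: list[str],
    complex_journeys: list[str],
    ratio_threshold: float = 0.5,
) -> list[str]:
    """Flag frequent journeys whose median latency approaches that of the
    complex journeys, a hint that they borrow the heavy path."""
    complex_samples = [ms for j in complex_journeys for ms in latencies_ms[j]]
    complex_median = median(complex_samples)
    return [
        journey
        for journey in frequent
        if median(latencies_ms[journey]) >= ratio_threshold * complex_median
    ]


# Hypothetical samples, in milliseconds.
samples = {
    "view_contact": [1800, 2000, 2100],    # simple read, suspiciously slow
    "list_tasks": [150, 180, 210],         # simple read, fast
    "merge_accounts": [2200, 2400, 2600],  # genuinely complex operation
}

suspects = flag_overprovisioned(
    samples,
    frequent=["view_contact", "list_tasks"],
    complex_journeys=["merge_accounts"],
)
print(suspects)
```

A flagged journey is a candidate for investigation, not proof: slow simple journeys can have other causes, such as a missing index or a slow network hop.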
The second question is more structural: are your escalations formalized or implicit? In the advisor pattern, an escalation is an explicitly named call, an identified invocation of the expert resource. In a well-architected business application, escalations to the heavyweight logic (complex business validation, inter-module arbitration, high-stakes rules requiring real reasoning) should be just as explicit. They should be tracked, governed, and measured. When the real decisions of an application are diluted in a tangle of implicit rules, you don’t know where they are made. Therefore, you can neither optimize them, nor protect them, nor govern them. The RACI matrix, which we presented in an article dedicated to clarifying project roles, is the equivalent organizational tool: it makes explicit who decides what, when, and with what delegation.
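One way to make escalations explicit is to route them through a single named entry point that records each call. The sketch below is an assumption about how that could look, not a prescribed design: `EscalationRegistry`, the escalation name, and the credit-limit rule are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class EscalationRegistry:
    """Single entry point for calls to heavyweight logic."""

    handlers: dict[str, Callable[..., Any]] = field(default_factory=dict)
    counts: Counter = field(default_factory=Counter)
    journal: list[tuple[str, str]] = field(default_factory=list)

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        """Governed: only declared escalations can be invoked."""
        self.handlers[name] = handler

    def escalate(self, name: str, reason: str, *args: Any, **kwargs: Any) -> Any:
        if name not in self.handlers:
            raise KeyError(f"unregistered escalation: {name}")
        self.counts[name] += 1              # measured
        self.journal.append((name, reason))  # tracked
        return self.handlers[name](*args, **kwargs)


registry = EscalationRegistry()
# Hypothetical expert rule: orders up to 50,000 pass the review.
registry.register("credit_limit_review", lambda amount: amount <= 50_000)

approved = registry.escalate(
    "credit_limit_review",
    "order exceeds auto-approval ceiling",
    72_000,
)
print(approved, registry.counts["credit_limit_review"], registry.journal)
```

Because every escalation carries a name and a reason, the journal answers the question posed above: where the real decisions are made, and how often.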
The third question concerns the trajectory: what would your application look like if restructured according to this principle? The answer doesn’t call for a complete overhaul. It calls for a gradual reorganization, module by module, which falls directly under technical debt reduction. Each iteration that separates the lightweight path from the heavyweight path, each module that becomes specifically invoked rather than used by default, is a step that improves performance, cost at scale, and maintainability. This strategy is precisely the one we describe in our article on technical debt, which absorbs 20 to 40% of the IT budget.
These three questions cannot be answered in a single meeting. They structure an approach that produces a map of over-provisioning, a formalization of escalation points, and a phased reorganization plan over several quarters.
What you need to remember
The advisor tool isn’t a technological breakthrough. It’s a signal. Anthropic didn’t invent the pattern it encodes; it made it visible by encoding it in an API. The lesson for a CIO or CTO isn’t in the feature itself. It’s in what the feature reflects back at our own systems.
For business applications, efficiency is no longer found in uniform power. It lies in the precision of delegation. The highest-performing systems of the coming decade will not be those that are the most intelligent everywhere. They will be those that have been able to place intelligence in the right place, escalate at the right time, and let lightweight processes do what they do best without burdening them with a skill they don’t use.
The advisor pattern answers the architectural question. However, it raises a related one, which becomes pressing as soon as AI agents move beyond the development phase and into autonomous production. How do you govern a system capable of making decisions, escalating, and acting without immediate human supervision? This is a separate topic that deserves its own discussion.
Frequently asked questions
-
The advisor tool is an API pattern released by Anthropic on 9 April 2026, in which a lightweight executor model (Sonnet 4.6 or Haiku 4.5) drives an agentic task end-to-end and consults a more powerful model (Opus) when it encounters a decision it cannot resolve on its own. The advisor does not take control of the tools, does not produce user-facing output, and only returns a brief plan or a correction. The entire interaction happens in a single API request.
-
In the classical multi-agent architecture, a heavy model orchestrates and delegates to lighter models. The advisor pattern inverts this allocation: the lightweight model executes and calls on the heavy model only for complex trade-offs. By default, you pay the cost of the lightweight capability. The heavy capability only comes into play when it changes the trajectory. This inversion removes the planning overload specific to the orchestrator and brings the bulk of the cost back to the executor level.
-
Because a good software pattern is not born in a vacuum: it transposes an economy that works elsewhere. The advisor tool encodes a universal organisational principle, subsidiarity: handling things at the lowest possible level of expertise and escalating to the right level when it is justified. This principle holds for an engineering practice, for a law firm, for an ops team, and it should hold for a business application. Confronting our architectures with this pattern makes visible a systemic over-provisioning we had stopped seeing.
-
Three signals are particularly useful. The first: measure response time on the most frequent journeys and compare it with that of genuinely complex journeys. If the two are close, your light logic is probably borrowing the heavy path. The second: trace inter-module calls to identify those that fire by default where a preliminary condition would have sufficed. The third: observe infrastructure consumption during off-peak hours — a well-designed application should not consume uniformly across all its features.
-
Yes, but with highly variable returns. Applications with strong asymmetry between simple and complex cases (CRM, ERP, HR platforms, business applications with dense management rules) benefit massively from an architecture composed along this principle, because simple cases largely dominate by volume. More homogeneous applications, where every interaction naturally mobilises the same logic, derive less value from it. The challenge is to map the actual distribution of cases before deciding.
-
A direct link. Systemic over-provisioning is a particularly discreet form of technical debt because it does not produce visible bugs: the application works, just slowly and expensively. Decoupling the light path from the heavy path, module by module, is exactly the kind of progressive reorganisation that characterises a well-run technical debt strategy. Every iteration that separates the processing paths reduces both operating cost and the cost of future changes.
-
No. The advisor pattern and microservices answer different questions. Microservices address the question of functional decomposition and independent deployment. The advisor pattern addresses the question of decision economy within a single processing path, regardless of its deployment mode. You can perfectly well have a microservices architecture in which each service reproduces, at its own scale, monolithic over-provisioning. The two approaches are complementary, not interchangeable.
-
With a targeted audit, not with a rebuild. Identify the two or three most frequent user journeys, measure the real cost of their processing, and isolate the points where light logic would cover 80% of cases without mobilising the heavy logic. The first work streams are those that offer the best impact-to-effort ratio: a separate fast path for simple cases, an explicit invocation of expert logic for complex cases. This iterative approach makes the pattern's value tangible before committing to more structural work.