Technical
April 1, 2026·8 min read·By Rugved Chandekar

How I Replaced Fragile If-Else Chains with a Supervisor Agent

Agentic AI · LLM Orchestration · System Design · Python

Six weeks into my role at Idyllic Services, our agentic AI pipeline was a maintenance nightmare. Every new capability meant adding another elif branch. Every edge case broke two others. I replaced the whole thing with a supervisor agent — and the results were the ~99% token reduction and ~10× throughput improvement I'm most proud of.

The Problem: If-Else Hell in an AI Pipeline

We were building an agentic system for JD-driven candidate sourcing. The pipeline needed to handle different types of requests: keyword extraction, semantic candidate matching, profile scoring, outreach drafting. Each had different logic, different LLM prompts, different failure modes.

The original approach was straightforward: a big routing function with if/elif chains.

def route_request(intent, payload):
    if intent == "extract_keywords":
        return keyword_extractor(payload)
    elif intent == "match_candidates":
        return candidate_matcher(payload)
    elif intent == "score_profile":
        return profile_scorer(payload)
    elif intent == "draft_outreach":
        return outreach_drafter(payload)
    else:
        return {"error": "Unknown intent"}

This worked for 4 intents. By the time we had 12, it was unmanageable. Worse: AI systems are non-deterministic. Sometimes the upstream LLM would return an intent that didn't exactly match our strings. "keyword_extraction" vs "extract_keywords". The else branch fired constantly.
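To make the brittleness concrete, here's a minimal, self-contained sketch of that routing function (one toy handler stands in for the real extractors): exact string matching means any drift in the upstream label falls straight into the error branch.

```python
def route_request(intent: str, payload: str) -> dict:
    # Stand-in for the routing function above; one toy handler instead of four
    handlers = {"extract_keywords": lambda p: {"keywords": p.split()}}
    handler = handlers.get(intent)
    return handler(payload) if handler else {"error": "Unknown intent"}

print(route_request("extract_keywords", "python ml"))    # routed correctly
print(route_request("keyword_extraction", "python ml"))  # same meaning, falls through
```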

Why If-Else Fails in Agentic Systems

Traditional software is deterministic. If you send input X, you get output Y, always. You can enumerate all cases and write branches for them.

Agentic AI systems are probabilistic. The LLM upstream might classify an intent slightly differently each time. The payload structure might vary. Errors cascade in unexpected ways. You simply cannot enumerate all cases with if-else logic — the search space is too large.

The deeper problem: if-else routing is brittle by design. It has zero ability to recover from unexpected inputs. It has no memory of what happened before. It can't learn that "extract_kw" and "extract_keywords" mean the same thing. Every new edge case requires a developer to add another branch.

The Supervisor Agent Pattern

The supervisor agent pattern replaces the routing function with a meta-agent — an LLM that understands intent at a semantic level and delegates to specialized sub-agents.

import json

class SupervisorAgent:
    def __init__(self, sub_agents: dict):
        self.sub_agents = sub_agents
        self.client = BedrockClient()  # thin wrapper around the Bedrock runtime API
    
    def route(self, user_request: str, context: dict) -> dict:
        # Supervisor understands intent semantically
        response = self.client.invoke(
            model="claude-3-haiku",
            prompt=self._build_routing_prompt(user_request, list(self.sub_agents.keys()))
        )
        routing_decision = json.loads(response)  # model returns the JSON schema below
        
        agent = self.sub_agents.get(routing_decision["agent"])
        
        if not agent:
            # Supervisor handles graceful fallback
            return self._handle_unknown(user_request, context)
        
        return agent.execute(user_request, context)
    
    def _build_routing_prompt(self, request, available_agents):
        return f"""You are a routing supervisor. Given this request, select the most appropriate agent.

Available agents: {available_agents}
Request: {request}

Return JSON: {{"agent": "agent_name", "confidence": 0.0-1.0, "reason": "..."}}"""

The Token Reduction: How It Actually Happened

Here's what most people get wrong about this pattern: the token reduction didn't come from the supervisor agent itself. It came from what the supervisor enabled.

The old system used one giant prompt for everything. To handle all 12 intents, we had to include context for all 12 possible tasks in every single call. Each LLM call was carrying 15,000+ tokens of context — most of it irrelevant to the actual task at hand.

With the supervisor pattern, each sub-agent is specialized. The keyword extractor only gets the keyword extraction prompt — ~200 tokens of context. The profile scorer only sees scoring criteria. The supervisor itself uses a small, focused routing prompt.

The math: 15,000 token calls → 150-300 token calls per specialized agent. That's where the ~99% token reduction came from. Not magic — just specialization.
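A quick back-of-envelope check of that math (the token counts are the post's approximate figures, not measurements from this snippet):

```python
old_tokens = 15_000  # monolithic prompt: context for all 12 intents on every call
new_tokens = 200     # focused sub-agent prompt (the 150-300 token range above)

reduction = 1 - new_tokens / old_tokens
print(f"{reduction:.1%}")  # ~98.7%, i.e. the ~99% headline figure
```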

The Throughput Improvement: Async Orchestration

The second win came from async orchestration. With if-else routing, every step was synchronous. Step 1 finished → Step 2 started. A 6-step pipeline with 60-second LLM calls took 360 seconds minimum.

The supervisor can identify which sub-tasks are independent and run them concurrently:

async def execute_parallel_tasks(self, tasks: list) -> list:
    """Execute independent sub-tasks concurrently (asyncio.TaskGroup, Python 3.11+)"""
    parallel = [t for t in tasks if self.can_run_parallel(t)]
    async with asyncio.TaskGroup() as tg:
        futures = [
            tg.create_task(self.sub_agents[task["agent"]].execute_async(task))
            for task in parallel
        ]
    # TaskGroup waits for all tasks before exiting the context manager
    return [f.result() for f in futures]

Independent tasks now run in parallel. A pipeline that took 6 minutes sequentially now finishes in under 40 seconds for the same output. That's the ~10× throughput improvement.
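The effect is easy to see with stdlib asyncio alone. In this toy demo, asyncio.sleep stands in for LLM latency (fake_llm_call is invented for illustration): six independent calls finish in roughly the time of one.

```python
import asyncio
import time

async def fake_llm_call(name: str, seconds: float) -> str:
    # Stand-in for an LLM call; asyncio.sleep models network latency
    await asyncio.sleep(seconds)
    return name

async def main() -> list:
    start = time.perf_counter()
    # Six independent "steps" running concurrently instead of back-to-back
    results = await asyncio.gather(*(fake_llm_call(f"step-{i}", 0.1) for i in range(6)))
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f}s")  # ~0.1s concurrent vs ~0.6s sequential
    return results

asyncio.run(main())
```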

Graceful Degradation: What the If-Else Never Had

One thing I didn't expect: the supervisor pattern gave us graceful degradation for free. When the system encountered a request it had never seen, the old else branch returned a generic error. The supervisor agent could reason about the unknown request, find the closest matching agent, and return a partial result with a confidence score.

Production AI systems need to fail gracefully. Users don't understand "Unknown intent." They understand "I couldn't find an exact match, but here's my best attempt, with 74% confidence."
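That kind of fallback can be sketched with nothing but stdlib difflib (an illustrative stand-in for _handle_unknown, not our production code): fuzzy-match the unknown intent against the known agents and attach a similarity score as the confidence.

```python
import difflib

AGENTS = ["extract_keywords", "match_candidates", "score_profile", "draft_outreach"]

def handle_unknown(intent: str) -> dict:
    # Fallback: find the closest known agent instead of returning a hard error
    matches = difflib.get_close_matches(intent, AGENTS, n=1, cutoff=0.5)
    if not matches:
        return {"error": "No close match", "confidence": 0.0}
    best = matches[0]
    confidence = difflib.SequenceMatcher(None, intent, best).ratio()
    return {"agent": best, "confidence": round(confidence, 2)}

print(handle_unknown("extract_kw"))  # closest match: extract_keywords
```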

Key Learnings

1. If-else is a smell in AI systems. The moment you have more than 3-4 branches routing to LLM calls, you need a meta-agent.

2. Specialization = token efficiency. Small, focused sub-agents with tight prompts are dramatically cheaper than one monolithic agent trying to do everything.

3. Async is not optional at scale. Sequential LLM calls don't scale. Identify independent tasks and run them concurrently from day one.

4. The supervisor's job is routing, not reasoning. Keep the supervisor prompt small and focused. It should route, not think. Reserve deep reasoning for the specialized sub-agents.

Building agentic AI systems and running into routing complexity? I've been there. Let's talk architecture.

Rugved Chandekar AI Systems Engineer — Agentic AI & RAG Pipeline Architect — IEEE 2026 Co-author — Idyllic Services