
Progressive Disclosure in MCP Servers: Keeping LLM Context Lean

MCP (Model Context Protocol) servers give LLMs access to external systems, but naive implementations waste thousands of tokens on tool schemas and verbose responses. This post covers patterns for context-efficient MCP design.

The Problem: Schema and Response Bloat

A typical API wrapper exposes one MCP tool per endpoint. Each tool carries its own schema into the LLM's context window. An integration with 8 operations might consume ~480 tokens in schema alone; a gateway pattern can reduce this to ~120 tokens—a 75% reduction.

The tax compounds across every enabled MCP server. A workspace with 5-6 servers can burn 2,000+ tokens before the conversation starts.

Response verbosity adds a second tax. APIs return everything; LLMs need only what's relevant to the current task.
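To make the tax concrete, here is a rough estimate using the common ~4-characters-per-token heuristic. The payloads are hypothetical but mirror the shapes discussed in this post:

```python
import json

def estimate_tokens(obj) -> int:
    # Rough heuristic: ~4 characters per token for JSON-ish English text.
    return len(json.dumps(obj)) // 4

verbose = {
    "id": "abc123",
    "created": "2025-01-05T14:32:05.882Z",
    "author": {"id": "u789", "email": "jdoe@example.com",
               "displayName": "Doe, John", "avatar": "https://example.com/a.png"},
    "content": "Hello",
    "metadata": {"thread": "t1", "edited": False},
}
compact = {"ts": "2025-01-05T14:32", "from": "jdoe", "content": "Hello"}

print(estimate_tokens(verbose), estimate_tokens(compact))
```

Multiply the difference by every message in a list response and the savings dominate the conversation budget.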

Pattern 1: Gateway Tool with Action Dispatch

Consolidate operations into a single tool with an action discriminator:

from typing import Literal

@mcp.tool()
async def gateway(
    action: Literal["discover", "list", "get", "search", "export"],
    id: str | None = None,
    query: str | None = None,
    # ... other params with defaults
) -> dict:
    """Service gateway. Actions: discover, list, get, search, export."""

    if action == "discover":
        return {...}  # Capability manifest
    if action == "list":
        return {...}
    # ... dispatch remaining actions to their implementations

The LLM sees one tool schema. The docstring hints at available actions. Full documentation lives behind discover.
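The dispatch itself can be as simple as a handler table. A minimal sketch with plain-Python stand-ins (the handler bodies and manifest fields are hypothetical, and MCP registration is omitted for brevity):

```python
from typing import Callable

# Hypothetical handlers standing in for real service calls.
def _discover(**params) -> dict:
    return {"actions": sorted(_HANDLERS), "filters": {}}

def _list(**params) -> dict:
    return {"items": []}

_HANDLERS: dict[str, Callable[..., dict]] = {
    "discover": _discover,
    "list": _list,
}

def gateway(action: str, **params) -> dict:
    """Single entry point: one schema in context, N operations behind it."""
    handler = _HANDLERS.get(action)
    if handler is None:
        # Fail softly with guidance instead of raising: the LLM can self-correct.
        return {"error": f"unknown action {action!r}",
                "hint": "call gateway(action='discover') for capabilities"}
    return handler(**params)
```

Returning a structured error with a hint, rather than raising, keeps even failure responses useful to the model.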

Pattern 2: On-Demand Capability Discovery

Instead of cramming usage details into the schema, expose a discover action that returns structured guidance only when needed:

if action == "discover":
    return {
        "CRITICAL_RULES": [
            "Hard constraints the LLM must never violate",
            "Filesystem or permission boundaries", 
            "Token hygiene requirements"
        ],
        "ACTION_SELECTION": {
            "search": {"purpose": "...", "use_when": "..."},
            "export": {"purpose": "...", "use_when": "..."}
        },
        "capabilities": [...],
        "filters": {...}
    }

Simple requests skip discovery entirely. Complex workflows pay the documentation tax once per session.

Pattern 3: Response-Embedded Guidance

External "skills" or system prompts drift out of sync with server behavior. Embed guidance directly in responses, scoped to the data just returned:

response = {"data": results}

if results_contain_sensitive_pattern:
    response["GUIDANCE"] = {
        "note": "Results contain [pattern] requiring special handling",
        "workflow": [
            "Step 1: ...",
            "Step 2: ...",
            "NEVER do X"
        ]
    }
return response

The LLM receives instructions at the moment of decision, tied to specific data. No stale documentation. No wasted tokens when the guidance doesn't apply.
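A sketch of the pattern, assuming an SSN-like regex as the sensitive pattern (the pattern and workflow text are illustrative, not a real server's rules):

```python
import re

# Assumed sensitive pattern for illustration: US SSN-like strings.
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def with_guidance(results: list[str]) -> dict:
    """Attach handling instructions only when the returned data needs them."""
    response: dict = {"data": results}
    if any(SENSITIVE.search(item) for item in results):
        response["GUIDANCE"] = {
            "note": "Results contain SSN-like values requiring special handling",
            "workflow": ["Redact before displaying to the user",
                         "NEVER write unredacted values to files"],
        }
    return response
```

Clean results carry zero guidance overhead; flagged results carry exactly the instructions that apply.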

Pattern 4: Default-On Response Filtering

APIs return complete objects. LLMs usually need a subset. Filter and compact responses by default:

# API returns
{"id": "abc123", "created": "2025-01-05T14:32:05.882Z", 
 "author": {"id": "u789", "email": "jdoe@example.com", "displayName": "Doe, John", "avatar": "..."},
 "content": "Hello", "metadata": {...}}

# Server returns (compact=True, default)
{"ts": "2025-01-05T14:32", "from": "jdoe", "content": "Hello"}

# Server returns (compact=False, when IDs needed for follow-up)
{"id": "abc123", "created": "2025-01-05T14:32:05.882Z", "author": {...}, "content": "Hello"}
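A filtering function for the example above might look like this (the field projections are illustrative; real servers will choose their own):

```python
def compact_message(raw: dict, compact: bool = True) -> dict:
    """Project a full API object down to task-relevant fields by default."""
    if not compact:
        return raw  # full object when IDs are needed for follow-up calls
    return {
        "ts": raw["created"][:16],                     # minute precision is enough
        "from": raw["author"]["email"].split("@")[0],  # short handle, not full record
        "content": raw["content"],
    }

raw = {"id": "abc123", "created": "2025-01-05T14:32:05.882Z",
       "author": {"id": "u789", "email": "jdoe@example.com",
                  "displayName": "Doe, John", "avatar": "..."},
       "content": "Hello", "metadata": {}}

print(compact_message(raw))
# → {'ts': '2025-01-05T14:32', 'from': 'jdoe', 'content': 'Hello'}
```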

Design principles:

- Compact by default: the common case pays the minimum token cost.
- Keep an escape hatch: compact=False returns full objects when IDs are needed for follow-up calls.
- Shorten fields rather than drop meaning: a truncated timestamp and a short handle still carry what the task needs.

Pattern 5: Proactive Warnings

When response data could cause downstream problems, warn explicitly:

if len(results) > 100:
    response["WARNING"] = "Large result set. Consider narrower filters."

if total_chars > 50000:
    response["TOKEN_WARNING"] = f"~{total_chars // 4} tokens. Consider pagination or smaller scope."

if results_written_to_external_filesystem:
    response["NOTE"] = "File written to user's machine. You cannot access it."

This steers the LLM toward self-correction before problems manifest.
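Collected into one helper, the checks above become testable (thresholds mirror the snippet; the function name is a hypothetical):

```python
def attach_warnings(response: dict, results: list,
                    wrote_external_file: bool = False) -> dict:
    """Annotate a response in place with warnings before problems manifest."""
    if len(results) > 100:
        response["WARNING"] = "Large result set. Consider narrower filters."
    total_chars = sum(len(str(item)) for item in results)
    if total_chars > 50_000:
        response["TOKEN_WARNING"] = (
            f"~{total_chars // 4} tokens. Consider pagination or smaller scope."
        )
    if wrote_external_file:
        response["NOTE"] = "File written to user's machine. You cannot access it."
    return response
```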

Implementation Checklist

A context-efficient MCP server:

- exposes a single gateway tool with an action discriminator instead of one tool per endpoint
- keeps full documentation behind a discover action, paid once per session
- embeds guidance in responses, scoped to the data just returned
- filters and compacts responses by default, with a full mode when IDs are needed for follow-up calls
- warns proactively about large result sets, token-heavy responses, and files the LLM cannot access

Appendix: Embedded Skill Template

For MCP authors who want to provide skill-like guidance to consuming LLMs, structure your discover response (or documentation) like this:

## Critical Rules
- [Hard constraints - filesystem boundaries, token limits, permission scope]

## Action Selection
| Action | Returns | Use When |
|--------|---------|----------|
| list | Collection summary | Need to browse/select |
| get | Single item detail | Have specific ID |
| search | Filtered results | Need subset matching criteria |
| export | File path (external) | User wants local copy |

## Parameter Reference  
| Param | Default | Notes |
|-------|---------|-------|
| compact | true | false when IDs needed for follow-up calls |
| limit | 50 | Cap results; use pagination for more |

## Common Workflows
**[Workflow name]**: step 1 → step 2 → completion criteria

## Output Modes
- compact (default): [what's included], use for [scenarios]
- full: [what's added], use for [scenarios]

This gives LLMs enough information to use the tool correctly without external prompts that drift from actual behavior.