Note

MCP Server Testing and Debugging

Testing MCP servers with the Inspector, the stderr logging gotcha that bites everyone, and a practical three-stage testing workflow.

Planted
mcp · data engineering · testing

Testing MCP servers requires tools that speak the protocol. You can’t just curl an MCP server the way you’d test a REST API — the communication is JSON-RPC over stdio or HTTP, with protocol negotiation and capability discovery. The MCP Inspector is the essential tool here.
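To make the difference concrete, here is a minimal sketch of the first message in that negotiation — the initialize request an MCP client writes to the server's stdin before anything else can happen. The field values (protocol version, client name) are illustrative:

```python
import json

def make_initialize_request(request_id: int = 1) -> str:
    """Build the JSON-RPC initialize request an MCP client sends first.

    The protocolVersion and clientInfo values are illustrative.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "test-client", "version": "0.1.0"},
        },
    }
    # stdio transport frames each message as a single line of JSON
    return json.dumps(message)

print(make_initialize_request())
```

A plain curl has no way to drive this handshake and the capability discovery that follows it, which is why a protocol-aware tool is needed.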

MCP Inspector

The Inspector is an interactive testing UI that connects to your server and lets you exercise every capability:

Terminal window
# Test a Python server
npx @modelcontextprotocol/inspector uv run server.py
# Test a Node.js server
npx @modelcontextprotocol/inspector node build/index.js
# Connect to a remote HTTP server
npx @modelcontextprotocol/inspector --connect https://mcp.yourcompany.com

The Inspector launches a web UI (typically at http://localhost:5173) where you can:

  • View all registered tools, resources, and prompts
  • Call tools with custom arguments and see the response
  • Inspect the raw JSON-RPC messages going back and forth
  • Debug response formatting and error handling
  • Verify input schemas match what you intended

Use the Inspector as your primary development tool. Every time you add or modify a tool, check it in the Inspector before testing with a real AI client. The Inspector shows you exactly what the AI will see — descriptions, schemas, response formats — without the overhead of a full AI conversation.

The stderr Logging Gotcha

The most common mistake when building MCP servers is printing to stdout. stdio transport uses stdout for JSON-RPC messages. Any other output on stdout corrupts the protocol and breaks communication. The server starts, the client connects, but nothing works.

# BAD - breaks JSON-RPC communication
print("Debug: processing query") # Goes to stdout!
print(f"Error: {e}") # Also stdout!
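A contrived sketch of what the client's message reader sees when a stray print sneaks in (exact framing varies by transport, but the failure mode is the same):

```python
import json

# What a newline-delimited JSON-RPC reader on the client side sees
# when a stray print() fires before the real response is written:
corrupted_stdout = (
    "Debug: processing query\n"
    '{"jsonrpc": "2.0", "id": 1, "result": {"content": []}}\n'
)

parsed, garbage = [], []
for line in corrupted_stdout.splitlines():
    try:
        parsed.append(json.loads(line))   # a valid protocol frame
    except json.JSONDecodeError:
        garbage.append(line)              # the stray debug line lands here
```

In practice the client does not politely skip the bad line the way this sketch does, which is why the symptom is usually a silent hang rather than a clear error.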

The fix is to send all logging to stderr:

# GOOD - use logging to stderr
import logging
import sys
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    stream=sys.stderr,  # Explicitly send to stderr
)
logger = logging.getLogger(__name__)
logger.info("Debug: processing query") # Goes to stderr
logger.error(f"Error: {e}") # Also stderr

Or use the Context object inside tools for proper MCP-level logging:

from mcp.server.fastmcp import Context
@mcp.tool()
async def my_tool(query: str, ctx: Context) -> str:
    """Process a query."""
    await ctx.info("Processing query")  # Proper MCP logging
    # ...

The Context logging is better because it goes through the protocol — the client can display it in its UI, filter by severity, and include it in conversation context. Python’s logging module to stderr is the fallback for code outside tool functions (server startup, module initialization, etc.).

The diagnostic pattern: if your server works when you test the business logic in isolation but fails when connected to a client, check for stray print() statements first. Also check for libraries that print to stdout — some database drivers and HTTP clients have verbose modes that default to stdout.
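For noisy libraries you cannot patch, one defensive option is to route stray stdout writes to stderr for the duration of the call. A sketch, where noisy_library_call stands in for any verbose third-party function:

```python
import contextlib
import sys

def noisy_library_call():
    # Stand-in for a third-party function with a verbose stdout mode.
    print("verbose driver output")
    return 42

def call_with_stdout_guard(fn, *args, **kwargs):
    # Route any stray stdout writes to stderr for the duration of the
    # call, so they cannot corrupt the JSON-RPC stream on stdout.
    with contextlib.redirect_stdout(sys.stderr):
        return fn(*args, **kwargs)

result = call_with_stdout_guard(noisy_library_call)
```

This is a band-aid rather than a fix — prefer disabling the library's verbose mode where possible — but it keeps the protocol stream clean in the meantime.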

Three-Stage Testing Workflow

Stage 1: Unit Test Business Logic

Test your actual business logic — the database queries, API calls, data processing — separately from MCP. This is standard Python testing:

test_logic.py
def test_query_execution():
    result = execute_query("SELECT 1", "test_db")
    assert "1" in result

def test_catalog_search():
    results = search_tables("orders", tags=["financial"])
    assert len(results) > 0
    assert results[0]["name"] == "sales.orders"

At this stage, you’re testing that your code works, not that it works through MCP. Mock external dependencies (databases, APIs) as you normally would.
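For instance, if execute_query wraps a database driver, a mock stands in for the real connection. A sketch with unittest.mock — the connection-factory parameter is an assumption about how the code is structured, made here so the driver is trivial to replace:

```python
from unittest.mock import MagicMock

# Hypothetical shape: the business logic takes its connection factory
# as a parameter, which makes the database easy to swap out in tests.
def execute_query(sql, database, conn_factory):
    conn = conn_factory(database)
    return str(conn.run(sql))

def test_query_execution_with_mock():
    fake_conn = MagicMock()
    fake_conn.run.return_value = 1  # canned driver response
    result = execute_query("SELECT 1", "test_db", lambda db: fake_conn)
    assert "1" in result
    fake_conn.run.assert_called_once_with("SELECT 1")

test_query_execution_with_mock()
```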

Stage 2: Test Through MCP with Inspector

Launch the Inspector and exercise each tool:

Terminal window
npx @modelcontextprotocol/inspector uv run server.py

Verify:

  • All tools appear in the list with correct names
  • Descriptions are clear and helpful
  • Input schemas show the right types, defaults, and constraints
  • Tools execute without errors on valid input
  • Tools return useful error messages on invalid input
  • Response formatting is clean and parseable

This catches MCP-specific issues: broken serialization, missing type annotations, docstrings that don’t parse correctly, and transport problems.

Stage 3: Integration Test with a Real Client

Connect to Claude Desktop or Claude Code and test with natural language:

Terminal window
# Add to Claude Code
claude mcp add test-server -- uv run server.py
# Start a conversation
claude
> What tools does test-server provide?
> [Test each tool with realistic inputs]

This stage reveals problems that neither unit tests nor the Inspector catch: ambiguous descriptions that make the AI choose the wrong tool, parameter formats that the AI generates incorrectly, response formats that the AI misinterprets, and edge cases in how the AI composes multiple tool calls.

The gap between “works in the Inspector” and “works with an AI” is often in the descriptions and schema design. If the AI consistently misuses a tool, the fix is usually a better description or clearer parameter names — not a code change. The MCP Tool Design Patterns note covers this in depth.

Debugging Tips

Server not responding: Check that the server starts without errors. Run it directly (uv run server.py) and watch stderr. If the process starts cleanly and then sits silently, that is expected for a stdio server — it is blocking on stdin waiting for JSON-RPC input, so the problem is more likely in the client's configuration.

Tools not appearing: Verify the @mcp.tool() decorator is present and the function has a docstring. FastMCP skips functions without docstrings.

Schema mismatch: If the Inspector shows a different schema than you expect, check your type annotations. FastMCP infers the schema from Python types — str, int, list[str], Optional[int], Pydantic models.
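A quick way to sanity-check what the inference has to work with is to inspect the annotations yourself. This is only a rough illustration of the idea — FastMCP's real inference (built on Pydantic) is far more complete:

```python
import inspect
from typing import get_origin

# Rough illustration of annotation-to-schema mapping; FastMCP's
# actual inference handles far more cases than this sketch.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def rough_schema(func):
    props = {}
    for name, param in inspect.signature(func).parameters.items():
        ann = param.annotation
        if get_origin(ann) is list:
            props[name] = {"type": "array"}
        elif ann in PY_TO_JSON:
            props[name] = {"type": PY_TO_JSON[ann]}
        else:
            props[name] = {"type": "unknown"}  # Optionals, models, etc.
    return props

def search_tables(query: str, tags: list[str], limit: int = 10) -> str:
    """Hypothetical tool function for demonstration."""

print(rough_schema(search_tables))
```

A parameter that comes out "unknown" here is usually the one whose schema surprises you in the Inspector.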

Timeouts: MCP has its own timeout separate from your backend’s timeout. If a tool calls a slow API, the MCP client might time out before the API responds. Consider adding progress reporting via the Context object for long-running operations.
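A sketch of the progress-reporting pattern, with report standing in for FastMCP's ctx.report_progress (assumed here to take a progress value and a total):

```python
import asyncio

async def process_rows(rows, report):
    """Long-running tool body; `report` stands in for ctx.report_progress."""
    results = []
    total = len(rows)
    for i, row in enumerate(rows, start=1):
        results.append(row.upper())  # placeholder for the real slow work
        await report(i, total)       # periodic updates keep the client informed
    return results

async def main():
    updates = []

    async def fake_report(progress, total):
        updates.append((progress, total))

    out = await process_rows(["a", "b", "c"], fake_report)
    return out, updates

out, updates = asyncio.run(main())
```

The same structure — do a bounded chunk of work, report, repeat — also gives you natural points to check for cancellation.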