Mock Layer¶
Deterministic testing with mocked LLMs and tools. Zero API calls, millisecond execution.
MockLLM¶
MockLLM(*, default_response='Mock response', default_model='mock-model')
¶
A mock LLM provider that returns preconfigured responses.
Responses are matched by pattern rules (regex, substring, or exact match); when no rule matches, the default response is returned. All calls are recorded for assertions in tests.
Fluent API (recommended)::
llm = MockLLM(default_response="I don't know")
llm.on_input(contains="weather").respond("It's sunny today")
llm.on_input(pattern=r"book.*flight").respond("Flight booked!")
response = await llm.complete("What's the weather?")
assert response == "It's sunny today"
Classic API::
llm.add_rule("weather", "It's sunny today")
llm.add_rule(r"book.*flight", "Flight booked!", match_mode=MatchMode.REGEX)
Sequential responses::
llm.on_input(contains="hello").respond(["Hi!", "Hey there!", "Greetings!"])
# First call returns "Hi!", second returns "Hey there!", etc.
calls
property
¶
All recorded calls.
call_count
property
¶
Total number of calls made.
last_call
property
¶
The most recent call, or None if no calls have been made.
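The three recording properties work together in test assertions. As a minimal self-contained sketch of the behavior described above (illustrative only, not the library's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class _Recorder:
    # Illustrative stand-in for the calls / call_count / last_call trio.
    calls: list = field(default_factory=list)  # all recorded calls

    @property
    def call_count(self) -> int:
        return len(self.calls)

    @property
    def last_call(self):
        return self.calls[-1] if self.calls else None

r = _Recorder()
assert r.last_call is None            # no calls yet
r.calls.append("What's the weather?")
assert r.call_count == 1
assert r.last_call == "What's the weather?"
```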
attach_faults(injector)
¶
Wire a FaultInjector so LLM faults fire automatically on calls.
When attached, complete(), complete_sync(), and stream()
check the injector before responding — no manual check_llm()
guards needed.
::
llm.attach_faults(fault_injector)
fault_injector.on_llm().server_error()
await llm.complete("hello") # raises LLMServerError automatically
with_usage(*, prompt_tokens=None, completion_tokens=None, auto_estimate=False)
¶
Configure simulated token usage on recorded calls.
When set, every LLMCall recorded by complete(),
complete_sync(), or stream() will include token counts.
This enables testing cost-tracking code paths with MockLLM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prompt_tokens` | `int \| None` | Fixed prompt token count per call. | `None` |
| `completion_tokens` | `int \| None` | Fixed completion token count per call. | `None` |
| `auto_estimate` | `bool` | If True, estimate tokens from text length. | `False` |
::
llm = MockLLM().with_usage(prompt_tokens=100, completion_tokens=50)
await llm.complete("hello")
assert llm.last_call.prompt_tokens == 100
# Or auto-estimate from text length
llm = MockLLM().with_usage(auto_estimate=True)
on_input(*, contains=None, pattern=None, exact=None)
¶
Start a fluent rule definition by specifying how to match input.
Exactly one of contains, pattern, or exact must be given.
Returns an _InputMatcher whose .respond() or .stream()
method completes the rule.
::
llm.on_input(contains="weather").respond("Sunny!")
llm.on_input(pattern=r"book.*\d+").respond("Booked")
llm.on_input(exact="hello").respond("Hi")
add_rule(pattern, response, *, match_mode=MatchMode.SUBSTRING, model=None, metadata=None)
¶
Add a pattern → response rule. Returns self for chaining.
stream_response(pattern, chunks, *, delay_ms=0, match_mode=MatchMode.SUBSTRING, model=None, metadata=None)
¶
Configure a streaming response for a pattern. Returns self for chaining.
When stream() is called with text matching pattern, an async
iterator yields StreamEvent(TEXT_DELTA) for each chunk, wrapped
by RUN_START / RUN_END events.
::
llm.stream_response("weather", ["It's ", "sunny ", "today!"], delay_ms=10)
async for event in llm.stream("What's the weather?"):
    print(event)
stream(text)
¶
Stream a mock response as an async iterator of StreamEvent chunks.
Looks up stream rules first. If no stream rule matches, falls back to the regular response rules (or default) and yields the full response as a single TEXT_DELTA chunk.
If a FaultInjector is attached, checks for LLM faults first.
This is a synchronous method that returns an async iterator — no
await needed::
async for event in llm.stream("hello"):
    ...
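The "synchronous method returning an async iterator" shape can be sketched in plain Python (illustrative only; StreamEvent and the RUN_START / RUN_END wrapping are omitted):

```python
import asyncio

def stream(chunks):
    # A synchronous function that returns an async iterator,
    # mirroring the calling convention described above.
    async def _gen():
        for chunk in chunks:
            yield chunk
    return _gen()

async def consume():
    received = []
    async for chunk in stream(["It's ", "sunny ", "today!"]):  # no await on stream()
        received.append(chunk)
    return received

result = asyncio.run(consume())
assert result == ["It's ", "sunny ", "today!"]
```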
complete(text)
async
¶
Generate a mock completion for the given input text.
If a FaultInjector is attached, checks for LLM faults first.
complete_sync(text)
¶
Synchronous version of complete for non-async agents.
get_calls_matching(pattern)
¶
Get all calls whose input_text contains the given substring.
was_called_with(text)
¶
Check if any call's input_text contains text (substring match).
reset()
¶
Clear all recorded calls and reset rule sequence counters.
reset_calls()
¶
Clear recorded calls but keep rule sequence counters.
MockTool¶
MockTool(*, strict_validation=True, default_response=None)
¶
A mock tool executor that returns preconfigured responses.
Register tools with their schemas and responses. When called, validates arguments against the schema, records the call, and returns the configured response.
Fluent API (recommended)::
tool = MockTool()
tool.on_call("get_weather").respond({"temp": 72, "unit": "F"})
result = await tool.call("get_weather", {"city": "NYC"})
assert result == {"temp": 72, "unit": "F"}
Classic API::
tool.register("get_weather", response={"temp": 72, "unit": "F"})
With schema validation::
tool.on_call("search").respond(
    {"results": []},
    schema={"properties": {"query": {"type": "string"}}, "required": ["query"]},
)
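The kind of check strict_validation performs can be sketched as a minimal required-keys and type check (illustrative; the library's validator is not shown here and full JSON Schema covers far more):

```python
def validate_args(arguments: dict, schema: dict) -> None:
    # Minimal sketch: check required keys and simple property types.
    type_map = {"string": str, "number": (int, float), "boolean": bool,
                "object": dict, "array": list}
    for key in schema.get("required", []):
        if key not in arguments:
            raise ValueError(f"missing required argument: {key!r}")
    for key, spec in schema.get("properties", {}).items():
        expected = type_map.get(spec.get("type"))
        if key in arguments and expected and not isinstance(arguments[key], expected):
            raise ValueError(f"argument {key!r} is not a {spec['type']}")

schema = {"properties": {"query": {"type": "string"}}, "required": ["query"]}
validate_args({"query": "cats"}, schema)  # passes silently

try:
    validate_args({}, schema)
    failed = False
except ValueError:
    failed = True
assert failed
```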
Sequential responses::
tool.register("roll_dice", response=[4, 2, 6])
# First call returns 4, second returns 2, third returns 6, then cycles
Return a list as-is (no cycling)::
tool.register("search", response=literal(["doc1", "doc2", "doc3"]))
# Every call returns ["doc1", "doc2", "doc3"]
calls
property
¶
All recorded calls.
call_count
property
¶
Total number of calls made.
last_call
property
¶
The most recent call, or None if no calls have been made.
registered_tools
property
¶
Names of all registered tools.
attach_faults(injector)
¶
Wire a FaultInjector so faults fire automatically on tool calls.
When attached, call() and call_sync() check the injector
before executing — no manual check_tool() guards needed.
::
tool.attach_faults(fault_injector)
fault_injector.on_tool("search").timeout(5)
await tool.call("search", {}) # raises ToolTimeoutError automatically
on_call(tool_name)
¶
Start a fluent tool registration by specifying the tool name.
Returns a _ToolCallMatcher whose .respond() or .error()
method completes the registration.
::
tool.on_call("get_weather").respond({"temp": 72})
tool.on_call("bad_service").error("Service unavailable")
register(name, *, response=None, error=None, schema=None, description='')
¶
Register a tool with its response and optional schema. Returns self for chaining.
call(name, arguments=None)
async
¶
Call a registered tool with the given arguments.
If a FaultInjector is attached, checks for faults first (async, supporting real delays for slow faults).
call_sync(name, arguments=None)
¶
Synchronous version of call for non-async agents.
get_calls_for(tool_name)
¶
Get all calls for a specific tool.
was_called(tool_name)
¶
Check if a tool was called at least once.
assert_tool_called(tool_name, *, times=None, with_args=None)
¶
Assert that a tool was called, optionally checking count and arguments.
Returns the first matching ToolCallRecord.
Raises AssertionError with a descriptive message on failure.
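As a sketch of what such an assertion helper does (a hypothetical stand-in using plain dicts for call records; the real ToolCallRecord type is not shown here):

```python
def assert_tool_called(calls, tool_name, *, times=None, with_args=None):
    # Illustrative stand-in: find matching calls and fail descriptively.
    matching = [c for c in calls if c["name"] == tool_name]
    if with_args is not None:
        matching = [c for c in matching if c["arguments"] == with_args]
    if not matching:
        raise AssertionError(f"expected a call to {tool_name!r}, found none")
    if times is not None and len(matching) != times:
        raise AssertionError(
            f"{tool_name!r} called {len(matching)} time(s), expected {times}")
    return matching[0]  # first matching record

calls = [{"name": "search", "arguments": {"q": "cats"}}]
record = assert_tool_called(calls, "search", with_args={"q": "cats"})
assert record["arguments"] == {"q": "cats"}
```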
assert_tool_not_called(tool_name)
¶
Assert that a tool was never called.
Raises AssertionError with a descriptive message on failure.
reset()
¶
Clear all recorded calls and reset response sequence counters.
reset_calls()
¶
Clear recorded calls but keep response sequence counters.
FaultInjector¶
FaultInjector()
¶
Injects configurable faults into tool calls and LLM requests.
Use the fluent API to configure faults, then call check_tool()
or check_llm() before each operation to see if a fault should fire.
Usage::
fault = FaultInjector()
fault.on_tool("search").timeout(5)
fault.on_llm().context_overflow()
# In agent/test code:
fault.check_tool("search") # raises TimeoutError
fault.check_llm() # raises ContextOverflowError
records
property
¶
All fault injection records (triggered and non-triggered).
triggered_records
property
¶
Only records where the fault actually fired.
trigger_count
property
¶
Number of times a fault was triggered.
triggered
property
¶
Whether any fault was triggered. Safe to use in if fi.triggered:.
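The check-before-call pattern and the triggered flag can be sketched as follows (an illustrative stand-in; the exception class is a placeholder, not the library's):

```python
class FakeToolTimeout(Exception):
    """Placeholder for the library's timeout exception."""

class TinyInjector:
    # Illustrative stand-in for the check_tool() / triggered pattern above.
    def __init__(self):
        self._faults = {}       # tool name -> exception to raise
        self.trigger_count = 0

    def on_tool(self, name, exc):
        self._faults[name] = exc

    @property
    def triggered(self):
        return self.trigger_count > 0

    def check_tool(self, name):
        if name in self._faults:
            self.trigger_count += 1
            raise self._faults[name]

fi = TinyInjector()
fi.on_tool("search", FakeToolTimeout("timed out after 5s"))
try:
    fi.check_tool("search")
except FakeToolTimeout:
    pass
assert fi.triggered and fi.trigger_count == 1
```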
on_tool(tool_name)
¶
Configure a fault for a specific tool.
on_llm()
¶
Configure a fault for LLM requests.
check_tool(tool_name)
¶
Check if a fault should fire for this tool call. Raises on fault.
Call this before executing a tool. If a fault is configured and should trigger, raises the appropriate exception.
check_tool_async(tool_name)
async
¶
Async version of check_tool — supports slow faults with a real delay.
check_llm()
¶
Check if a fault should fire for this LLM call. Raises on fault.
check_llm_async()
async
¶
Async version of check_llm — supports slow faults with a real delay.
was_triggered(target=None)
¶
Check if any fault was triggered, optionally for a specific target.
has_faults_for(tool_name)
¶
Check if any faults are configured for a tool.
has_llm_faults()
¶
Check if any LLM faults are configured.
reset()
¶
Clear all faults and records.
reset_records()
¶
Clear records but keep fault configurations.
MockMCPServer¶
MockMCPServer(*, name='mock-mcp-server', version='1.0.0')
¶
A mock MCP server that responds to JSON-RPC 2.0 messages.
Supports the core MCP protocol methods:
- initialize — server handshake
- notifications/initialized — client acknowledgment (no response)
- tools/list — enumerate registered tools
- tools/call — invoke a registered tool
Usage::
server = MockMCPServer(name="test-server")
server.register_tool("get_weather", response={"temp": 72})
# JSON-RPC message handling
resp = await server.handle_message({
    "jsonrpc": "2.0", "id": 1,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "NYC"}}
})
assert resp["result"]["content"][0]["text"] == '{"temp": 72}'
Batch and raw string handling::
resp = await server.handle_raw('{"jsonrpc":"2.0","id":1,"method":"tools/list"}')
Call recording and assertions::
server.assert_tool_called("get_weather")
assert server.call_count == 1
calls
property
¶
All recorded tool calls.
call_count
property
¶
Total number of tool calls made.
last_call
property
¶
Most recent tool call, or None.
registered_tools
property
¶
Names of all registered tools.
tool_definitions
property
¶
All tool definitions.
register_tool(name, *, response=None, error=None, description='', input_schema=None)
¶
Register a tool that the server exposes. Returns self for chaining.
handle_message(message)
async
¶
Handle a single JSON-RPC 2.0 message. Returns response or None for notifications.
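The tools/call response shape shown in the usage example above can be sketched with a self-contained handler (illustrative only, not the library's implementation):

```python
import json

def handle_tools_call(message: dict, responses: dict) -> dict:
    # Minimal sketch of tools/call handling: look up the registered
    # response and wrap it in an MCP-style text content block.
    params = message["params"]
    result = responses[params["name"]]
    return {
        "jsonrpc": "2.0",
        "id": message["id"],
        "result": {"content": [{"type": "text", "text": json.dumps(result)}]},
    }

resp = handle_tools_call(
    {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
     "params": {"name": "get_weather", "arguments": {"city": "NYC"}}},
    responses={"get_weather": {"temp": 72}},
)
assert resp["result"]["content"][0]["text"] == '{"temp": 72}'
```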
handle_raw(raw)
async
¶
Handle a raw JSON-RPC string. Returns JSON string response.
get_calls_for(tool_name)
¶
Get all calls for a specific tool.
was_called(tool_name)
¶
Check if a tool was called at least once.
assert_tool_called(tool_name, *, times=None, with_args=None)
¶
Assert a tool was called, optionally checking count and arguments.
assert_tool_not_called(tool_name)
¶
Assert that a tool was never called.
reset()
¶
Clear all recorded calls and reset sequence counters.
reset_calls()
¶
Clear recorded calls but keep response sequence counters.
Match Mode¶
MatchMode
¶
Bases: str, Enum
How a pattern rule matches against input text.
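The three matching strategies referenced throughout this page (substring, regex, exact) behave like this sketch (illustrative semantics only; enum member names beyond SUBSTRING and REGEX are not shown on this page):

```python
import re

def matches(pattern: str, text: str, mode: str) -> bool:
    # Illustrative semantics of the three match modes used on this page.
    if mode == "substring":
        return pattern in text
    if mode == "regex":
        return re.search(pattern, text) is not None
    if mode == "exact":
        return text == pattern
    raise ValueError(f"unknown mode: {mode}")

assert matches("weather", "What's the weather?", "substring")
assert matches(r"book.*flight", "please book a flight", "regex")
assert matches("hello", "hello", "exact")
assert not matches("hello", "hello there", "exact")
```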
Helper: literal¶
literal(value)
¶
Wrap a value to prevent sequence cycling in MockTool/MockLLM.
By default, passing a list as a tool response causes MockTool to
cycle through its elements on successive calls. Use literal()
when you want to return the list itself::
# Without literal — cycles: first call → "doc1", second → "doc2"
tool.register("search", response=["doc1", "doc2"])
# With literal — always returns ["doc1", "doc2"]
tool.register("search", response=literal(["doc1", "doc2"]))
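The cycling-vs-literal distinction can be sketched with stand-ins for literal() and the response lookup (illustrative only):

```python
from itertools import cycle

class _Literal:
    # Stand-in for literal(): marks a value as "return as-is".
    def __init__(self, value):
        self.value = value

def make_responder(response):
    # Lists cycle through their elements on successive calls;
    # _Literal-wrapped values and plain scalars are returned unchanged.
    if isinstance(response, _Literal):
        return lambda: response.value
    if isinstance(response, list):
        it = cycle(response)
        return lambda: next(it)
    return lambda: response

roll = make_responder([4, 2, 6])
assert [roll(), roll(), roll(), roll()] == [4, 2, 6, 4]  # cycles

search = make_responder(_Literal(["doc1", "doc2", "doc3"]))
assert search() == ["doc1", "doc2", "doc3"]  # whole list, every call
```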