
Mock Layer

Deterministic testing with mocked LLMs and tools. Zero API calls, millisecond execution.

MockLLM

MockLLM(*, default_response='Mock response', default_model='mock-model')

A mock LLM provider that returns preconfigured responses.

Responses are matched by pattern rules (regex, substring, or exact match), falling back to the default response when no rule matches. All calls are recorded for assertions in tests.

Fluent API (recommended)::

llm = MockLLM(default_response="I don't know")
llm.on_input(contains="weather").respond("It's sunny today")
llm.on_input(pattern=r"book.*flight").respond("Flight booked!")

response = await llm.complete("What's the weather?")
assert response == "It's sunny today"

Classic API::

llm.add_rule("weather", "It's sunny today")
llm.add_rule(r"book.*flight", "Flight booked!", match_mode=MatchMode.REGEX)

Sequential responses::

llm.on_input(contains="hello").respond(["Hi!", "Hey there!", "Greetings!"])
# First call returns "Hi!", second returns "Hey there!", etc.

calls property

All recorded calls.

call_count property

Total number of calls made.

last_call property

The most recent call, or None if no calls have been made.
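
The recording properties above support assertions in tests. A minimal sketch using only the documented API (input_text is the recorded-call field mentioned under get_calls_matching)::

llm = MockLLM(default_response="ok")
await llm.complete("first")
await llm.complete("second")

assert llm.call_count == 2
assert len(llm.calls) == 2
assert llm.last_call is not None
assert "second" in llm.last_call.input_text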

attach_faults(injector)

Wire a FaultInjector so LLM faults fire automatically on calls.

When attached, complete(), complete_sync(), and stream() check the injector before responding — no manual check_llm() guards needed.

::

llm.attach_faults(fault_injector)
fault_injector.on_llm().server_error()
await llm.complete("hello")  # raises LLMServerError automatically

with_usage(*, prompt_tokens=None, completion_tokens=None, auto_estimate=False)

Configure simulated token usage on recorded calls.

When set, every LLMCall recorded by complete(), complete_sync(), or stream() will include token counts. This enables testing cost-tracking code paths with MockLLM.

Parameters:

prompt_tokens (int | None, default None)
    Fixed prompt token count per call.

completion_tokens (int | None, default None)
    Fixed completion token count per call.

auto_estimate (bool, default False)
    If True, estimate tokens from text length (len(text) // 4 + 1) instead of using fixed values. Cannot be combined with explicit token counts.

::

llm = MockLLM().with_usage(prompt_tokens=100, completion_tokens=50)
await llm.complete("hello")
assert llm.last_call.prompt_tokens == 100

# Or auto-estimate from text length
llm = MockLLM().with_usage(auto_estimate=True)

on_input(*, contains=None, pattern=None, exact=None)

Start a fluent rule definition by specifying how to match input.

Exactly one of contains, pattern, or exact must be given.

Returns an _InputMatcher whose .respond() or .stream() method completes the rule.

::

llm.on_input(contains="weather").respond("Sunny!")
llm.on_input(pattern=r"book.*\d+").respond("Booked")
llm.on_input(exact="hello").respond("Hi")

add_rule(pattern, response, *, match_mode=MatchMode.SUBSTRING, model=None, metadata=None)

Add a pattern → response rule. Returns self for chaining.

stream_response(pattern, chunks, *, delay_ms=0, match_mode=MatchMode.SUBSTRING, model=None, metadata=None)

Configure a streaming response for a pattern. Returns self for chaining.

When stream() is called with text matching pattern, an async iterator yields StreamEvent(TEXT_DELTA) for each chunk, wrapped by RUN_START / RUN_END events.

::

llm.stream_response("weather", ["It's ", "sunny ", "today!"], delay_ms=10)
async for event in llm.stream("What's the weather?"):
    print(event)

stream(text)

Stream a mock response as an async iterator of StreamEvent chunks.

Looks up stream rules first. If no stream rule matches, falls back to the regular response rules (or default) and yields the full response as a single TEXT_DELTA chunk.

If a FaultInjector is attached, checks for LLM faults first.

This is a synchronous method that returns an async iterator — no await needed::

async for event in llm.stream("hello"):
    ...

complete(text) async

Generate a mock completion for the given input text.

If a FaultInjector is attached, checks for LLM faults first.

complete_sync(text)

Synchronous version of complete for non-async agents.
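
For synchronous test code, complete_sync() mirrors the earlier fluent example without the await; a sketch using only the documented API::

llm = MockLLM(default_response="I don't know")
llm.on_input(contains="weather").respond("It's sunny today")

response = llm.complete_sync("What's the weather?")
assert response == "It's sunny today"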

get_calls_matching(pattern)

Get all calls whose input_text contains the given substring.

was_called_with(text)

Check if any call's input_text contains text (substring match).
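
Both helpers use substring matching on each recorded call's input_text; a sketch::

llm = MockLLM(default_response="ok")
await llm.complete("What's the weather in NYC?")
await llm.complete("Book a flight")

assert llm.was_called_with("weather")
weather_calls = llm.get_calls_matching("weather")
assert len(weather_calls) == 1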

reset()

Clear all recorded calls and reset rule sequence counters.

reset_calls()

Clear recorded calls but keep rule sequence counters.

MockTool

MockTool(*, strict_validation=True, default_response=None)

A mock tool executor that returns preconfigured responses.

Register tools with their schemas and responses. When called, validates arguments against the schema, records the call, and returns the configured response.

Fluent API (recommended)::

tool = MockTool()
tool.on_call("get_weather").respond({"temp": 72, "unit": "F"})
result = await tool.call("get_weather", {"city": "NYC"})
assert result == {"temp": 72, "unit": "F"}

Classic API::

tool.register("get_weather", response={"temp": 72, "unit": "F"})

With schema validation::

tool.on_call("search").respond(
    {"results": []},
    schema={"properties": {"query": {"type": "string"}}, "required": ["query"]},
)

Sequential responses::

tool.register("roll_dice", response=[4, 2, 6])
# First call returns 4, second returns 2, third returns 6, then cycles

Return a list as-is (no cycling)::

tool.register("search", response=literal(["doc1", "doc2", "doc3"]))
# Every call returns ["doc1", "doc2", "doc3"]

calls property

All recorded calls.

call_count property

Total number of calls made.

last_call property

The most recent call, or None if no calls have been made.

registered_tools property

Names of all registered tools.

attach_faults(injector)

Wire a FaultInjector so faults fire automatically on tool calls.

When attached, call() and call_sync() check the injector before executing — no manual check_tool() guards needed.

::

tool.attach_faults(fault_injector)
fault_injector.on_tool("search").timeout(5)
await tool.call("search", {})  # raises ToolTimeoutError automatically

on_call(tool_name)

Start a fluent tool registration by specifying the tool name.

Returns a _ToolCallMatcher whose .respond() or .error() method completes the registration.

::

tool.on_call("get_weather").respond({"temp": 72})
tool.on_call("bad_service").error("Service unavailable")

register(name, *, response=None, error=None, schema=None, description='')

Register a tool with its response and optional schema. Returns self for chaining.

call(name, arguments=None) async

Call a registered tool with the given arguments.

If a FaultInjector is attached, checks for faults first (async, supporting real delays for slow faults).

call_sync(name, arguments=None)

Synchronous version of call for non-async agents.

get_calls_for(tool_name)

Get all calls for a specific tool.

was_called(tool_name)

Check if a tool was called at least once.

assert_tool_called(tool_name, *, times=None, with_args=None)

Assert that a tool was called, optionally checking count and arguments.

Returns the first matching ToolCallRecord. Raises AssertionError with a descriptive message on failure.

assert_tool_not_called(tool_name)

Assert that a tool was never called.

Raises AssertionError with a descriptive message on failure.
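
A sketch combining both assertion helpers, with the times and with_args keywords taken from the signature above::

tool = MockTool()
tool.on_call("get_weather").respond({"temp": 72})
await tool.call("get_weather", {"city": "NYC"})

record = tool.assert_tool_called("get_weather", times=1, with_args={"city": "NYC"})
tool.assert_tool_not_called("search")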

reset()

Clear all recorded calls and reset response sequence counters.

reset_calls()

Clear recorded calls but keep response sequence counters.

FaultInjector

FaultInjector()

Injects configurable faults into tool calls and LLM requests.

Use the fluent API to configure faults, then call check_tool() or check_llm() before each operation to see if a fault should fire.

Usage::

fault = FaultInjector()
fault.on_tool("search").timeout(5)
fault.on_llm().context_overflow()

# In agent/test code:
fault.check_tool("search")  # raises TimeoutError
fault.check_llm()           # raises ContextOverflowError

records property

All fault injection records (triggered and non-triggered).

triggered_records property

Only records where the fault actually fired.

trigger_count property

Number of times a fault was triggered.

triggered property

Whether any fault was triggered. Safe to use as a boolean, e.g. if fi.triggered:.

on_tool(tool_name)

Configure a fault for a specific tool.

on_llm()

Configure a fault for LLM requests.

check_tool(tool_name)

Check if a fault should fire for this tool call. Raises on fault.

Call this before executing a tool. If a fault is configured and should trigger, raises the appropriate exception.

check_tool_async(tool_name) async

Async version of check_tool — supports the slow fault with a real delay.

check_llm()

Check if a fault should fire for this LLM call. Raises on fault.

check_llm_async() async

Async version of check_llm — supports the slow fault with a real delay.

was_triggered(target=None)

Check if any fault was triggered, optionally for a specific target.

has_faults_for(tool_name)

Check if any faults are configured for a tool.

has_llm_faults()

Check if any LLM faults are configured.
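
The inspection helpers can confirm both that a fault is configured and that it actually fired; a sketch (the exact exception class raised by a timeout fault is caught broadly here)::

fault = FaultInjector()
fault.on_tool("search").timeout(5)

assert fault.has_faults_for("search")
assert not fault.was_triggered()

try:
    fault.check_tool("search")  # raises the configured timeout error
except Exception:
    pass

assert fault.was_triggered("search")
assert fault.trigger_count == 1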

reset()

Clear all faults and records.

reset_records()

Clear records but keep fault configurations.

MockMCPServer

MockMCPServer(*, name='mock-mcp-server', version='1.0.0')

A mock MCP server that responds to JSON-RPC 2.0 messages.

Supports the core MCP protocol methods:

- initialize — server handshake
- notifications/initialized — client acknowledgment (no response)
- tools/list — enumerate registered tools
- tools/call — invoke a registered tool

Usage::

server = MockMCPServer(name="test-server")
server.register_tool("get_weather", response={"temp": 72})

# JSON-RPC message handling
resp = await server.handle_message({
    "jsonrpc": "2.0", "id": 1,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "NYC"}}
})
assert resp["result"]["content"][0]["text"] == '{"temp": 72}'

Batch and raw string handling::

resp = await server.handle_raw('{"jsonrpc":"2.0","id":1,"method":"tools/list"}')

Call recording and assertions::

server.assert_tool_called("get_weather")
assert server.call_count == 1

calls property

All recorded tool calls.

call_count property

Total number of tool calls made.

last_call property

Most recent tool call, or None.

registered_tools property

Names of all registered tools.

tool_definitions property

All tool definitions.

register_tool(name, *, response=None, error=None, description='', input_schema=None)

Register a tool that the server exposes. Returns self for chaining.

handle_message(message) async

Handle a single JSON-RPC 2.0 message. Returns response or None for notifications.

handle_raw(raw) async

Handle a raw JSON-RPC string. Returns JSON string response.

get_calls_for(tool_name)

Get all calls for a specific tool.

was_called(tool_name)

Check if a tool was called at least once.

assert_tool_called(tool_name, *, times=None, with_args=None)

Assert a tool was called, optionally checking count and arguments.

assert_tool_not_called(tool_name)

Assert that a tool was never called.

reset()

Clear all recorded calls and reset sequence counters.

reset_calls()

Clear recorded calls but keep response sequence counters.

Match Mode

MatchMode

Bases: str, Enum

How a pattern rule matches against input text.
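
Only the SUBSTRING and REGEX members appear in the signatures above; exact matching is exposed through on_input(exact=...). A sketch of the two documented modes::

llm = MockLLM()
llm.add_rule("weather", "Sunny")  # MatchMode.SUBSTRING is the default
llm.add_rule(r"book.*\d+", "Booked", match_mode=MatchMode.REGEX)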

Helper: literal

literal(value)

Wrap a value to prevent sequence cycling in MockTool/MockLLM.

By default, passing a list as a tool response causes MockTool to cycle through its elements on successive calls. Use literal() when you want to return the list itself::

# Without literal — cycles: first call → "doc1", second → "doc2"
tool.register("search", response=["doc1", "doc2"])

# With literal — always returns ["doc1", "doc2"]
tool.register("search", response=literal(["doc1", "doc2"]))