Responses
Overview
The responses endpoint provides a higher-level interface on top of MultiRoute's lower-level completion APIs.
Instead of working directly with /v1/chat/completions or other primitive endpoints, you can call /v1/responses
to let MultiRoute handle prompt formatting, tool calling (when enabled), and response shaping for you.
This is useful when:
- You want a single, consistent entrypoint for text responses, regardless of the underlying model provider.
- You want MultiRoute to manage conversation state, tools, or structured outputs in a uniform way.
- You prefer a simpler request shape that focuses on your user inputs and desired behavior rather than raw model parameters.
The exact fields available on `/v1/responses` may evolve as MultiRoute adds more orchestration features.
Always consult this page and your dashboard release notes for the most up-to-date contract.
Endpoint
- Method: `POST`
- Path: `/v1/responses`
- Base URL: `https://api.multiroute.ai/v1`

Full URL: `https://api.multiroute.ai/v1/responses`
Authentication is required via the `Authorization: Bearer <your-api-key>` header.
See Authentication for details.
Request body
The `/v1/responses` endpoint accepts a JSON body that closely follows the OpenAI-style Responses schema implemented in
`ResponsesRequest`. All of the following fields are accepted; required ones are noted explicitly.
Core fields
- `model` (string, required)
  Identifier of the model or routing alias to use, for example `"openai/gpt-4o-mini"` or `"multiroute-chat-latest"`.
- `input` (string or array, required)
  The user input. This can be:
  - A plain string, treated as a single text input, or
  - An array of input items such as messages or explicit `input_text`/`input_image` objects.
- `instructions` (string, optional)
  High-level system/developer instructions. This can be a template that references values in `metadata["variables"]`.
- `text` (object, optional)
  Configuration for how text outputs should be formatted. This wraps a response format object, for example:
  - `{ "format": { "type": "text" } }`
  - `{ "format": { "type": "json_schema", "name": "...", "schema": { ... }, "strict": true } }`
- `tools` (array, optional)
  List of tool definitions the model may call. Each item is a JSON object describing a tool; tool calling behavior is
  determined by the underlying model and routing configuration.
- `stream` (boolean, optional, default `false`)
  If `true`, the response is streamed as Server-Sent Events (SSE) using the Responses streaming schema.
- `temperature` (number, optional)
  Sampling temperature in the range [0, 2]. Higher values produce more diverse outputs.
- `max_output_tokens` (integer, optional)
  Upper bound on the number of tokens generated in the response.
- `previous_response_id` (string, optional)
  Used to continue or branch from a previous response in the same conversation.
- `metadata` (object, optional)
  Arbitrary key-value metadata attached to the request. MultiRoute uses this for routing (for example, `user_id`) and
  observability. Values from `metadata` may also be used to fill variables in `instructions`.
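As a sketch, a request that combines `instructions`, a structured `text.format`, and `metadata` might look like the following. The schema contents (`product_info`, the `name`/`price` properties) are illustrative examples, not part of the contract:

```json
{
  "model": "multiroute-chat-latest",
  "input": "Extract the product name and price from: 'The Widget Pro costs $49.'",
  "instructions": "Return only the requested fields.",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "product_info",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" }
        },
        "required": ["name", "price"]
      },
      "strict": true
    }
  },
  "metadata": { "user_id": "user-123" }
}
```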
Example request body
```json
{
  "input": "Give me three ideas for how to document a new API.",
  "model": "multiroute-chat-latest",
  "instructions": "You are a concise technical writer. Answer clearly and in bullet points.",
  "temperature": 0.4,
  "max_output_tokens": 256,
  "metadata": {
    "conversation_id": "docs-example-123"
  }
}
```
Response body
Non-streaming requests (`"stream": false` or omitted) return a JSON object that matches the `ResponsesResponse` schema:

- `id` (string)
  Unique identifier for the response (for example, `resp_...`).
- `object` (string)
  Always `"response"` for this endpoint.
- `created_at` (integer)
  Unix timestamp (seconds) when the response was created.
- `model` (string)
  The model that actually generated the output. This may be a resolved provider-specific model ID even if you requested a routing alias.
- `output` (array)
  Ordered list of output items, which can include:
  - Message items with:
    - `type: "message"`
    - `role: "assistant"`
    - `status`: `"in_progress"`, `"completed"`, or `"incomplete"`
    - `content`: an array of content blocks such as `{ "type": "output_text", "text": "...", "annotations": [], "logprobs": [] }`
  - Function/tool call items with:
    - `type: "function_call"`
    - `call_id`, `name`, and JSON-encoded `arguments`
  - Function call output items with:
    - `type: "function_call_output"`
    - `call_id` and `output` (string or array)
- `usage` (object, optional)
  Token usage summary (compatible with other MultiRoute endpoints):
  - `prompt_tokens`
  - `completion_tokens`
  - `total_tokens`
Example response
```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1710000000,
  "model": "multiroute-chat-latest",
  "output": [
    {
      "id": "msg_1",
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Here are three ideas for documenting a new API: 1) a quickstart, 2) a detailed reference, and 3) a troubleshooting guide.",
          "annotations": [],
          "logprobs": []
        }
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 84,
    "total_tokens": 116
  }
}
```
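A small helper like the following can collect the generated text from a parsed response body. This is a sketch against the response shape documented above; it simply skips non-message items:

```python
def collect_output_text(response_body: dict) -> str:
    """Concatenate all output_text blocks from a ResponsesResponse dict."""
    parts = []
    for item in response_body.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call / function_call_output items
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block["text"])
    return "".join(parts)
```

For the example response above, this returns the single assistant message's text.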
Streaming responses
When `"stream": true` is set on the request, MultiRoute streams Server-Sent Events (SSE) that follow the
`ResponseStreamEvent` schema. Each event is a JSON object with:

- `object`: `"response.output_item.delta"` for incremental updates, or `"response.completed"` when the response is done.
- `id`: optional event or output item ID.
- `output_index`: index of the output item this delta applies to.
- `delta`: partial output item data (for example, a content chunk).
- `usage`: optional token usage (typically present on completion).
The HTTP response uses `text/event-stream` with lines of the form:

```
data: { ...json event... }
```
Your client should read these events incrementally, apply deltas to reconstruct the final output items, and stop when it
sees an event with `object: "response.completed"`.
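As a minimal sketch, an SSE reader for this stream might look like the following. The event shapes follow the schema above, but the assumption that each delta carries a `text` field is illustrative; adapt the delta handling to the actual payloads you receive:

```python
import json


def parse_sse_events(raw_stream: str):
    """Yield parsed JSON events from a text/event-stream payload."""
    for line in raw_stream.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def accumulate_text(raw_stream: str) -> str:
    """Reconstruct output text from delta events, stopping at response.completed."""
    parts = []
    for event in parse_sse_events(raw_stream):
        if event.get("object") == "response.completed":
            break
        chunk = event.get("delta", {}).get("text")
        if chunk:
            parts.append(chunk)
    return "".join(parts)
```

In a real client you would feed the bytes of the HTTP response into this parser incrementally rather than buffering the whole stream first.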
Examples
Python:

```python
import os

import requests

API_KEY = os.environ.get("MULTIROUTE_API_KEY")


def create_response():
    url = "https://api.multiroute.ai/v1/responses"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    json_body = {
        "input": "List three production best practices for calling MultiRoute.",
        "model": "multiroute-chat-latest",
        "instructions": "Answer in bullet points.",
        "temperature": 0.3,
        "max_output_tokens": 256,
        "metadata": {
            "conversation_id": "integration-guide-py"
        },
    }
    resp = requests.post(url, headers=headers, json=json_body, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    # "output" is an array of output items; print the first message's text block.
    message = data["output"][0]
    print(message["content"][0]["text"])


if __name__ == "__main__":
    create_response()
```
TypeScript:

```typescript
const apiKey = process.env.MULTIROUTE_API_KEY!;

async function createResponse() {
  const response = await fetch("https://api.multiroute.ai/v1/responses", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: "Summarize the benefits of using a routing layer over direct model access.",
      model: "multiroute-chat-latest",
      instructions: "Write for a backend engineer.",
      temperature: 0.3,
      max_output_tokens: 256,
      metadata: {
        conversation_id: "integration-guide-ts",
      },
    }),
  });

  if (!response.ok) {
    const errorBody = await response.text();
    throw new Error(`Request failed: ${response.status} ${errorBody}`);
  }

  const data = await response.json();
  // "output" is an array of output items; log the first message's text block.
  console.log(data.output?.[0]?.content?.[0]?.text);
}

createResponse().catch(console.error);
```
curl:

```shell
curl https://api.multiroute.ai/v1/responses \
  -H "Authorization: Bearer $MULTIROUTE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Explain what MultiRoute does in two sentences.",
    "model": "multiroute-chat-latest",
    "instructions": "Use clear technical language.",
    "temperature": 0.2
  }'
```