Responses
Overview
The responses endpoint provides a higher-level interface on top of MultiRoute's lower-level completion APIs.
Instead of working directly with /v1/chat/completions or other primitive endpoints, you can call /v1/responses
to let MultiRoute handle prompt formatting, tool calling (when enabled), and response shaping for you.
This is useful when:
- You want a single, consistent entrypoint for text responses, regardless of the underlying model provider.
- You want MultiRoute to manage conversation state, tools, or structured outputs in a uniform way.
- You prefer a simpler request shape that focuses on your user inputs and desired behavior rather than raw model parameters.
The exact fields available on `/v1/responses` may evolve as MultiRoute adds more orchestration features.
Always consult this page and your dashboard release notes for the most up-to-date contract.
Endpoint
- Method: `POST`
- Path: `/v1/responses`
- Base URL: `https://api.multiroute.ai/v1`

Full URL: `https://api.multiroute.ai/v1/responses`
Authentication is required via the `Authorization: Bearer <your-api-key>` header.
See Authentication for details.
Request body
The `/v1/responses` endpoint accepts a JSON body that closely follows the OpenAI-style Responses schema implemented in
`ResponsesRequest`. All of the following fields are accepted; required ones are noted explicitly.
Core fields
- `model` (string, required)
  Identifier of the model or routing alias to use, for example `"openai/gpt-4o-mini"` or `"multiroute-chat-latest"`.
- `input` (string or array, required)
  The user input. This can be:
  - A plain string, treated as a single text input, or
  - An array of input items such as messages or explicit `input_text`/`input_image` objects.
- `instructions` (string, optional)
  High-level system/developer instructions. This can be a template that references values in `metadata["variables"]`.
- `text` (object, optional)
  Configuration for how text outputs should be formatted. This wraps a response format object, for example:
  - `{ "format": { "type": "text" } }`
  - `{ "format": { "type": "json_schema", "name": "...", "schema": { ... }, "strict": true } }`
- `tools` (array, optional)
  List of tool definitions the model may call. Each item is a JSON object describing a tool; tool calling behavior is
  determined by the underlying model and routing configuration.
- `stream` (boolean, optional, default `false`)
  If `true`, the response is streamed as Server-Sent Events (SSE) using the Responses streaming schema.
- `temperature` (number, optional)
  Sampling temperature in the range [0, 2]. Higher values produce more diverse outputs.
- `max_output_tokens` (integer, optional)
  Upper bound on the number of tokens generated in the response.
- `previous_response_id` (string, optional)
  Used to continue or branch from a previous response in the same conversation.
- `metadata` (object, optional)
  Arbitrary key-value metadata attached to the request. MultiRoute uses this for routing (for example, `user_id`) and
  observability. Values from `metadata` may also be used to fill variables in `instructions`.
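As a sketch, a request that combines `instructions`, a structured `text.format`, and `metadata` might look like the following. The schema contents (`product_info`, the `name`/`price` properties) are illustrative examples, not part of the contract:

```json
{
  "model": "multiroute-chat-latest",
  "input": "Extract the product name and price from: 'The Widget Pro costs $49.'",
  "instructions": "Return only the requested fields.",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "product_info",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" }
        },
        "required": ["name", "price"]
      },
      "strict": true
    }
  },
  "metadata": { "user_id": "user-123" }
}
```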
Example request body
```json
{
  "input": "Give me three ideas for how to document a new API.",
  "model": "multiroute-chat-latest",
  "instructions": "You are a concise technical writer. Answer clearly and in bullet points.",
  "temperature": 0.4,
  "max_output_tokens": 256,
  "metadata": {
    "conversation_id": "docs-example-123"
  }
}
```
Response body
Non-streaming requests (`"stream": false` or omitted) return a JSON object that matches the `ResponsesResponse` schema:

- `id` (string)
  Unique identifier for the response (for example, `resp_...`).
- `object` (string)
  Always `"response"` for this endpoint.
- `created_at` (integer)
  Unix timestamp (seconds) when the response was created.
- `model` (string)
  The model that actually generated the output. This may be a resolved provider-specific model ID even if you requested a routing alias.
- `output` (array)
  Ordered list of output items, which can include:
  - Message items with:
    - `type: "message"`
    - `role: "assistant"`
    - `status`: `"in_progress"`, `"completed"`, or `"incomplete"`
    - `content`: an array of content blocks such as `{ "type": "output_text", "text": "...", "annotations": [], "logprobs": [] }`
  - Function/tool call items with:
    - `type: "function_call"`
    - `call_id`, `name`, and JSON-encoded `arguments`
  - Function call output items with:
    - `type: "function_call_output"`
    - `call_id` and `output` (string or array)
- `usage` (object, optional)
  Token usage summary (compatible with other MultiRoute endpoints):
  - `prompt_tokens`
  - `completion_tokens`
  - `total_tokens`
Example response
```json
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1710000000,
  "model": "multiroute-chat-latest",
  "output": [
    {
      "id": "msg_1",
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Here are three ideas for documenting a new API: 1) a quickstart, 2) a detailed reference, and 3) a troubleshooting guide.",
          "annotations": [],
          "logprobs": []
        }
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 84,
    "total_tokens": 116
  }
}
```
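A small helper like the following can collect the generated text from a parsed response body. This is a sketch against the response shape documented above; it simply skips non-message items:

```python
def collect_output_text(response_body: dict) -> str:
    """Concatenate all output_text blocks from a ResponsesResponse dict."""
    parts = []
    for item in response_body.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call / function_call_output items
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block["text"])
    return "".join(parts)
```

For the example response above, this returns the single assistant message's text.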
Streaming responses
When `"stream": true` is set on the request, MultiRoute streams Server-Sent Events (SSE) that follow the
`ResponseStreamEvent` schema. Each event is a JSON object with:

- `object`: `"response.output_item.delta"` for incremental updates, or `"response.completed"` when the response is done.
- `id`: optional event or output item ID.
- `output_index`: index of the output item this delta applies to.
- `delta`: partial output item data (for example, a content chunk).
- `usage`: optional token usage (typically present on completion).
The HTTP response uses `text/event-stream` with lines of the form:

```
data: { ...json event... }
```
Your client should read these events incrementally, apply deltas to reconstruct the final output items, and stop when it
sees an event with `object: "response.completed"`.
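As a minimal sketch, an SSE reader for this stream might look like the following. The event shapes follow the schema above, but the assumption that each delta carries a `text` field is illustrative; adapt the delta handling to the actual payloads you receive:

```python
import json


def parse_sse_events(raw_stream: str):
    """Yield parsed JSON events from a text/event-stream payload."""
    for line in raw_stream.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def accumulate_text(raw_stream: str) -> str:
    """Reconstruct output text from delta events, stopping at response.completed."""
    parts = []
    for event in parse_sse_events(raw_stream):
        if event.get("object") == "response.completed":
            break
        chunk = event.get("delta", {}).get("text")
        if chunk:
            parts.append(chunk)
    return "".join(parts)
```

In a real client you would feed the bytes of the HTTP response into this parser incrementally rather than buffering the whole stream first.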
Examples
Python:

```python
import os

import requests

API_KEY = os.environ.get("MULTIROUTE_API_KEY")


def create_response():
    url = "https://api.multiroute.ai/v1/responses"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    json_body = {
        "input": "List three production best practices for calling MultiRoute.",
        "model": "multiroute-chat-latest",
        "instructions": "Answer in bullet points.",
        "temperature": 0.3,
        "max_output_tokens": 256,
        "metadata": {
            "conversation_id": "integration-guide-py"
        },
    }
    resp = requests.post(url, headers=headers, json=json_body, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    # "output" is an array of output items; print the first message's text block.
    message = data["output"][0]
    print(message["content"][0]["text"])


if __name__ == "__main__":
    create_response()
```
TypeScript:

```typescript
const apiKey = process.env.MULTIROUTE_API_KEY!;

async function createResponse() {
  const response = await fetch("https://api.multiroute.ai/v1/responses", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: "Summarize the benefits of using a routing layer over direct model access.",
      model: "multiroute-chat-latest",
      instructions: "Write for a backend engineer.",
      temperature: 0.3,
      max_output_tokens: 256,
      metadata: {
        conversation_id: "integration-guide-ts",
      },
    }),
  });

  if (!response.ok) {
    const errorBody = await response.text();
    throw new Error(`Request failed: ${response.status} ${errorBody}`);
  }

  const data = await response.json();
  // "output" is an array of output items; log the first message's text block.
  console.log(data.output?.[0]?.content?.[0]?.text);
}

createResponse().catch(console.error);
```
curl:

```shell
curl https://api.multiroute.ai/v1/responses \
  -H "Authorization: Bearer $MULTIROUTE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Explain what MultiRoute does in two sentences.",
    "model": "multiroute-chat-latest",
    "instructions": "Use clear technical language.",
    "temperature": 0.2
  }'
```