Models and Providers
MultiRoute routes requests to many underlying AI models. To keep configuration simple and consistent, it uses two core concepts: providers and models.
This guide explains how to think about each and how to choose models for your workloads.
Providers
A provider is an upstream system that serves AI models. From MultiRoute’s perspective, a provider has:
- A set of supported models.
- Authentication and billing specifics.
- Operational characteristics such as reliability and latency.
You typically do not need to integrate with providers directly. Instead:
- You configure which providers are available to your project.
- MultiRoute handles authentication, routing, and error handling behind the /v1 API.
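As an illustration only, a project's provider configuration could be sketched like this. The field names (`enabled`, `models`, `auth`, `api_key_env`) and provider names are hypothetical, not MultiRoute's actual schema:

```python
# Hypothetical provider configuration for a project.
# Field and provider names are illustrative, not MultiRoute's real schema.
providers = {
    "provider-a": {
        "enabled": True,
        "models": ["chat-small", "chat-large"],     # supported models
        "auth": {"api_key_env": "PROVIDER_A_KEY"},  # authentication specifics
    },
    "provider-b": {
        "enabled": False,  # configured but not active for this project
        "models": ["image-gen"],
        "auth": {"api_key_env": "PROVIDER_B_KEY"},
    },
}

# Only enabled providers are candidates for routing.
active = [name for name, cfg in providers.items() if cfg["enabled"]]
print(active)  # ['provider-a']
```

The point of the sketch is the shape: each provider entry bundles its supported models with its authentication details, so the rest of the application never touches either directly.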
Models
A model is a specific capability exposed by a provider, such as a general-purpose chat model, a reasoning-oriented model, or an image generator.
Models differ along multiple dimensions:
- Quality: How strong the model is at following instructions, reasoning, and generating high-quality outputs.
- Cost: How much each request or token costs.
- Latency: How quickly the model responds.
- Capabilities: For example, whether it supports tools, images, or long contexts.
MultiRoute represents models in a normalized way so you can configure routing policies without binding your application to a specific provider’s naming conventions.
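To make the idea of normalized naming concrete, here is a minimal sketch. The alias table, model names, and `resolve` helper are all assumptions for illustration; they are not part of MultiRoute's API:

```python
# Hypothetical mapping from normalized model names to provider-specific IDs.
# The application only ever uses the normalized names on the left; routing
# resolves them to whatever each provider calls the model.
MODEL_ALIASES = {
    "chat-default": {"provider-a": "pa-chat-v2", "provider-b": "pb/chat-large"},
    "reasoning": {"provider-a": "pa-think-v1"},
}

def resolve(normalized_name: str, provider: str) -> str:
    """Translate a normalized model name into a provider's own identifier."""
    try:
        return MODEL_ALIASES[normalized_name][provider]
    except KeyError:
        raise ValueError(f"{normalized_name!r} is not available on {provider!r}")

print(resolve("chat-default", "provider-b"))  # pb/chat-large
```

Because the application binds to `chat-default` rather than `pb/chat-large`, swapping or adding providers is a configuration change, not a code change.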
Choosing Models: Tradeoffs
When selecting models for a project or routing profile, consider:
- Quality vs cost:
  - Higher-quality models are often more expensive but may reduce the need for retries or post-processing.
  - Lower-cost models can be ideal for high-volume or less critical use cases.
- Latency vs depth:
  - Lower-latency models are better for interactive user interfaces.
  - Slower, more capable models may be appropriate for offline or batch processing.
- Specialization:
  - Some models are optimized for coding, structured outputs, or reasoning.
  - Others are more general-purpose and suitable for a wide variety of tasks.
MultiRoute is designed to let you mix and match:
- Use a fast, cost-efficient model for most traffic.
- Route edge cases or high-value operations to more capable models.
- Introduce fallback models to absorb traffic during outages or spikes.
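The mix-and-match pattern above can be sketched as a small routing profile. The profile structure, model names, and `pick_model` helper are hypothetical, shown only to make the fallback logic concrete:

```python
# Hypothetical routing profile: a fast default, a stronger model for
# high-value traffic, and a fallback to absorb outages or spikes.
ROUTING_PROFILE = [
    {"model": "chat-fast", "use_for": "default"},       # most traffic
    {"model": "chat-strong", "use_for": "high_value"},  # edge cases
    {"model": "chat-fallback", "use_for": "fallback"},  # outage absorber
]

def pick_model(kind, unavailable=frozenset()):
    """Return the configured model for a request kind, dropping to the
    fallback entry when the preferred model is currently unavailable."""
    preferred = next(e["model"] for e in ROUTING_PROFILE if e["use_for"] == kind)
    if preferred not in unavailable:
        return preferred
    return next(e["model"] for e in ROUTING_PROFILE if e["use_for"] == "fallback")

print(pick_model("default"))                             # chat-fast
print(pick_model("default", unavailable={"chat-fast"}))  # chat-fallback
```

In a real deployment this selection happens inside MultiRoute; the sketch just shows why encoding the tiers in configuration keeps the policy out of application code.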
Organizing Models by Use Case
A practical way to think about models is by use case:
- General chat and assistants: For conversational interfaces, knowledge retrieval, and everyday tasks.
- Reasoning and complex workflows: For multi-step reasoning, planning, or tasks that require deeper understanding.
- Code and tooling: For code generation, refactoring, and tool-assisted workflows.
- Images and multimodal: For generating or understanding images and mixed text/image inputs.
The docs/models/index.md page provides a high-level catalog structure that can be filled in with specific models over time.
Binding Applications to Models
Your application can:
- Explicitly request a model when you need a particular behavior.
- Reference a routing profile that encapsulates a set of models and priorities, allowing the configuration to evolve without code changes.
The recommended pattern is:
- Use routing profiles and configuration for production workloads where you want flexibility over time.
- Use explicit models for experiments, benchmarks, or very specific behaviors.
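The two patterns above can be contrasted with a pair of hypothetical request payloads. The field names (`model`, `profile`) and values are assumptions for illustration, not MultiRoute's actual request schema:

```python
# Explicit model: pins behavior, useful for experiments and benchmarks.
experiment_request = {
    "model": "chat-strong",
    "messages": [{"role": "user", "content": "Summarize this report."}],
}

# Routing profile: the production configuration behind "assistant-default"
# can evolve without any change to this code.
production_request = {
    "profile": "assistant-default",
    "messages": [{"role": "user", "content": "Summarize this report."}],
}
```

The only difference is which key names the target: an explicit model freezes the behavior, while a profile reference delegates the choice to configuration.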
This approach balances control with adaptability as new models become available.