Models and Providers
MultiRoute routes requests to many underlying AI models. To keep configuration simple and consistent, it uses two core concepts: providers and models.
This guide explains how to think about each and how to choose models for your workloads.
Providers
A provider is an upstream system that serves AI models. From MultiRoute’s perspective, a provider has:
- A set of supported models.
- Authentication and billing specifics.
- Operational characteristics such as reliability and latency.
You typically do not need to integrate with providers directly. Instead:
- You configure which providers are available to your project.
- MultiRoute handles authentication, routing, and error handling behind the /v1 API.
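As an illustration only, a project's provider configuration could be sketched like this. The field names (`enabled`, `models`, `auth`, `api_key_env`) and provider names are hypothetical, not MultiRoute's actual schema:

```python
# Hypothetical provider configuration for a project.
# Field and provider names are illustrative, not MultiRoute's real schema.
providers = {
    "provider-a": {
        "enabled": True,
        "models": ["chat-small", "chat-large"],     # supported models
        "auth": {"api_key_env": "PROVIDER_A_KEY"},  # authentication specifics
    },
    "provider-b": {
        "enabled": False,  # configured but not active for this project
        "models": ["image-gen"],
        "auth": {"api_key_env": "PROVIDER_B_KEY"},
    },
}

# Only enabled providers are candidates for routing.
active = [name for name, cfg in providers.items() if cfg["enabled"]]
print(active)  # ['provider-a']
```

The point of the sketch is the shape: each provider entry bundles its supported models with its authentication details, so the rest of the application never touches either directly.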
Models
A model is a specific capability exposed by a provider, such as a general-purpose chat model, a reasoning-oriented model, or an image generator.
Models differ along multiple dimensions:
- Quality: How strong the model is at following instructions, reasoning, and generating high-quality outputs.
- Cost: How much each request or token costs.
- Latency: How quickly the model responds.
- Capabilities: For example, whether it supports tools, images, or long contexts.
MultiRoute represents models in a normalized way so you can configure routing policies without binding your application to a specific provider’s naming conventions.
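To make the idea of normalized naming concrete, here is a minimal sketch. The alias table, model names, and `resolve` helper are all assumptions for illustration; they are not part of MultiRoute's API:

```python
# Hypothetical mapping from normalized model names to provider-specific IDs.
# The application only ever uses the normalized names on the left; routing
# resolves them to whatever each provider calls the model.
MODEL_ALIASES = {
    "chat-default": {"provider-a": "pa-chat-v2", "provider-b": "pb/chat-large"},
    "reasoning": {"provider-a": "pa-think-v1"},
}

def resolve(normalized_name: str, provider: str) -> str:
    """Translate a normalized model name into a provider's own identifier."""
    try:
        return MODEL_ALIASES[normalized_name][provider]
    except KeyError:
        raise ValueError(f"{normalized_name!r} is not available on {provider!r}")

print(resolve("chat-default", "provider-b"))  # pb/chat-large
```

Because the application binds to `chat-default` rather than `pb/chat-large`, swapping or adding providers is a configuration change, not a code change.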
Choosing Models: Tradeoffs
When selecting models for a project or routing profile, consider:
- Quality vs cost:
  - Higher-quality models are often more expensive but may reduce the need for retries or post-processing.
  - Lower-cost models can be ideal for high-volume or less critical use cases.
- Latency vs depth:
  - Lower-latency models are better for interactive user interfaces.
  - Slower, more capable models may be appropriate for offline or batch processing.
- Specialization:
  - Some models are optimized for coding, structured outputs, or reasoning.
  - Others are more general-purpose and suitable for a wide variety of tasks.
MultiRoute is designed to let you mix and match:
- Use a fast, cost-efficient model for most traffic.
- Route edge cases or high-value operations to more capable models.
- Introduce fallback models to absorb traffic during outages or spikes.
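The mix-and-match pattern above can be sketched as a small routing profile. The profile structure, model names, and `pick_model` helper are hypothetical, shown only to make the fallback logic concrete:

```python
# Hypothetical routing profile: a fast default, a stronger model for
# high-value traffic, and a fallback to absorb outages or spikes.
ROUTING_PROFILE = [
    {"model": "chat-fast", "use_for": "default"},       # most traffic
    {"model": "chat-strong", "use_for": "high_value"},  # edge cases
    {"model": "chat-fallback", "use_for": "fallback"},  # outage absorber
]

def pick_model(kind, unavailable=frozenset()):
    """Return the configured model for a request kind, dropping to the
    fallback entry when the preferred model is currently unavailable."""
    preferred = next(e["model"] for e in ROUTING_PROFILE if e["use_for"] == kind)
    if preferred not in unavailable:
        return preferred
    return next(e["model"] for e in ROUTING_PROFILE if e["use_for"] == "fallback")

print(pick_model("default"))                             # chat-fast
print(pick_model("default", unavailable={"chat-fast"}))  # chat-fallback
```

In a real deployment this selection happens inside MultiRoute; the sketch just shows why encoding the tiers in configuration keeps the policy out of application code.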
Organizing Models by Use Case
A practical way to think about models is by use case:
- General chat and assistants: For conversational interfaces, knowledge retrieval, and everyday tasks.
- Reasoning and complex workflows: For multi-step reasoning, planning, or tasks that require deeper understanding.
- Code and tooling: For code generation, refactoring, and tool-assisted workflows.
- Images and multimodal: For generating or understanding images and mixed text/image inputs.
The docs/models/index.md page provides a high-level catalog structure that can be filled in with specific models over time.
Binding Applications to Models
Your application can:
- Explicitly request a model when you need a particular behavior.
- Reference a routing profile that encapsulates a set of models and priorities, allowing the configuration to evolve without code changes.
The recommended pattern is:
- Use routing profiles and configuration for production workloads where you want flexibility over time.
- Use explicit models for experiments, benchmarks, or very specific behaviors.
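The two patterns above can be contrasted with a pair of hypothetical request payloads. The field names (`model`, `profile`) and values are assumptions for illustration, not MultiRoute's actual request schema:

```python
# Explicit model: pins behavior, useful for experiments and benchmarks.
experiment_request = {
    "model": "chat-strong",
    "messages": [{"role": "user", "content": "Summarize this report."}],
}

# Routing profile: the production configuration behind "assistant-default"
# can evolve without any change to this code.
production_request = {
    "profile": "assistant-default",
    "messages": [{"role": "user", "content": "Summarize this report."}],
}
```

The only difference is which key names the target: an explicit model freezes the behavior, while a profile reference delegates the choice to configuration.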
This approach balances control with adaptability as new models become available.