Routing and Fallbacks

MultiRoute sits between your application and multiple AI model providers, routing each request to an appropriate model while handling timeouts, retries, and failover on your behalf.

This guide explains the core concepts so you can design for reliability without hard-coding provider-specific logic into your app.

Routing Basics

When your application sends a request to a /v1 endpoint, MultiRoute:

Identifies the project and API key.
Looks up the active routing configuration for that project and endpoint.
Selects a model and provider according to your configured priorities and weights.
Applies timeouts and retries as needed.
Returns the response (or an error) to your application.

Routing decisions can depend on:

The endpoint (e.g. /v1/chat/completions).
The requested model or routing profile.
Project-level defaults and environment (dev, staging, prod).
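
As a rough illustration of how these inputs could combine, the sketch below resolves a list of candidate models from a project, environment, endpoint, and optional routing profile. The table layout, the project and profile names, and the lookup_candidates function are all hypothetical; they are not MultiRoute's actual configuration schema.

```python
from typing import Optional

# Hypothetical routing table keyed by (project, environment, endpoint).
# The structure is an assumption made for illustration only.
ROUTING_TABLE = {
    ("demo-project", "prod", "/v1/chat/completions"): {
        "default_profile": "balanced",
        "profiles": {
            "balanced": ["model-a", "model-b"],
            "fast": ["model-b"],
        },
    },
}


def lookup_candidates(
    project: str,
    environment: str,
    endpoint: str,
    profile: Optional[str] = None,
) -> list[str]:
    """Return candidate models in priority order for one request."""
    config = ROUTING_TABLE.get((project, environment, endpoint))
    if config is None:
        raise LookupError(
            f"no routing configuration for {project}/{environment} {endpoint}"
        )
    # Fall back to the project-level default when no profile is requested.
    name = profile or config["default_profile"]
    if name not in config["profiles"]:
        raise LookupError(f"unknown routing profile: {name}")
    return list(config["profiles"][name])
```

Keeping the lookup keyed by environment means dev, staging, and prod can each carry a different model mix without any change to application code.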

Model Selection Strategies

Common strategies for choosing models include:

Single primary model: All traffic goes to one model that you have selected for a workload.
Weighted distribution: Traffic is split between multiple models based on assigned weights (e.g. 80/20).
Canary or A/B testing: A small percentage of traffic is routed to a new model to evaluate performance before a full rollout.

These strategies are expressed in configuration (via /v1/config or the dashboard routing UI), not in application code. This keeps your application logic simple and makes it easy to change models over time.
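
To make weighted distribution concrete, here is a minimal sketch of weight-proportional selection. MultiRoute performs this selection for you; the pick_model function and the model names are hypothetical and only illustrate the idea of an 80/20 split.

```python
import random
from typing import Optional


def pick_model(weights: dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick one model name with probability proportional to its weight."""
    if not weights:
        raise ValueError("at least one model is required")
    if any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    rng = rng or random.Random()
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical 80/20 split between two placeholder model names.
split = {"model-a": 80, "model-b": 20}
rng = random.Random(0)  # seeded so the simulation is repeatable
counts = {name: 0 for name in split}
for _ in range(10_000):
    counts[pick_model(split, rng)] += 1
```

A canary rollout is the same mechanism with a small weight on the new model, for example 95/5, raised gradually as confidence grows.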

Timeouts

Every request to an underlying provider is subject to a timeout. The timeout limits how long MultiRoute will wait for a response before treating the attempt as failed.

Timeouts can be:

Global defaults: Applied to all requests unless overridden.
Per-profile or per-model overrides: For example, allowing longer timeouts for more complex reasoning tasks.

Choosing timeouts involves a tradeoff between:

User experience: Long timeouts can lead to slow responses.
Accuracy and completeness: Some models may need more time for complex tasks.

In general:

For UI-bound calls, prefer shorter timeouts and aggressive failover.
For offline or batch jobs, you can often tolerate longer timeouts.
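
The relationship between global defaults and overrides can be sketched as a simple lookup. The values, profile names, and model name below are hypothetical, and the precedence shown (model override, then profile override, then global default) is an assumption rather than documented MultiRoute behavior.

```python
from typing import Optional

GLOBAL_DEFAULT_TIMEOUT_S = 30.0  # hypothetical global default, in seconds

# Hypothetical overrides; the more specific setting is assumed to win.
PROFILE_TIMEOUTS_S = {"interactive-chat": 8.0, "batch-summaries": 120.0}
MODEL_TIMEOUTS_S = {"slow-reasoning-model": 180.0}


def resolve_timeout(profile: Optional[str] = None, model: Optional[str] = None) -> float:
    """Return the most specific timeout: model, then profile, then global."""
    if model in MODEL_TIMEOUTS_S:
        return MODEL_TIMEOUTS_S[model]
    if profile in PROFILE_TIMEOUTS_S:
        return PROFILE_TIMEOUTS_S[profile]
    return GLOBAL_DEFAULT_TIMEOUT_S
```

Here a UI-bound profile gets a short timeout, a batch profile gets a long one, and anything unlisted falls through to the global default.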

Retries

When a request to a provider fails with a retryable error (such as a transient network issue), MultiRoute may retry the request according to your configured policy.

Retries are usually:

Limited in count, so each request is attempted only a small number of times.
Guarded by overall timeout budgets so that retries do not cause unbounded delays.
Scoped to specific error types; for example, MultiRoute will not retry on validation errors that are caused by an invalid prompt or parameters.

Retries add resilience, but also:

Increase total latency when providers are degraded.
Can increase underlying provider usage if not configured carefully.

Use retries primarily for transient failures, not as a substitute for fixing invalid requests or resolving quota limits.
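
The three properties of a retry policy described above (bounded count, an overall time budget, and error-type scoping) can be sketched as follows. This is an illustration of the pattern, not MultiRoute's implementation: the exception classes, parameter names, and the exponential backoff between attempts are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Transient failure such as a network blip (hypothetical classification)."""


class ValidationError(Exception):
    """Invalid prompt or parameters; retrying cannot help."""


def call_with_retries(
    attempt: Callable[[], T],
    max_attempts: int = 3,
    budget_s: float = 10.0,
    backoff_s: float = 0.2,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt() until it succeeds, attempts run out, or the budget is spent."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = clock() + budget_s
    for n in range(1, max_attempts + 1):
        try:
            return attempt()
        except RetryableError:
            # Only RetryableError is caught; ValidationError propagates at once.
            delay = backoff_s * 2 ** (n - 1)
            # Give up if attempts are exhausted or waiting would exceed the budget.
            if n == max_attempts or clock() + delay > deadline:
                raise
            sleep(delay)
    raise AssertionError("unreachable")
```

Because the budget is checked before each wait, a degraded provider costs at most roughly budget_s plus one attempt's own timeout, rather than max_attempts multiplied by the full timeout.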

Failover and Fallbacks

If a provider or model is repeatedly failing or unavailable, MultiRoute can fail over to an alternate model according to your routing configuration.

Fallback behavior typically includes:

Promoting a secondary model to primary when the primary is unhealthy.
Skipping models that are returning persistent errors for the current request.
Routing to backup providers if your main provider is experiencing an outage.

You control:

Which models are eligible as fallbacks.
Whether failover is allowed for a given profile.
How quickly MultiRoute should give up on a failing model.

For critical workloads, configure at least one viable fallback model so interruptions from a single provider are minimized.
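
A fallback chain can be pictured as trying eligible models in priority order until one succeeds. The sketch below assumes a hypothetical call_model callable and a ProviderError exception; it also models the per-profile failover switch as a plain boolean, which is an assumption about how such a setting might look.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a hypothetical provider call when a model is failing or unavailable."""


def route_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    allow_failover: bool = True,
) -> tuple[str, str]:
    """Try models in priority order; return (model_used, response)."""
    if not models:
        raise ValueError("at least one model is required")
    # With failover disabled, only the primary model is eligible.
    candidates = list(models) if allow_failover else list(models[:1])
    errors = []
    for model in candidates:
        try:
            return model, call_model(model, prompt)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")  # skip this model for this request
    raise ProviderError("all eligible models failed: " + "; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in provider call: the primary is down, the backup answers."""
    if model == "primary-model":
        raise ProviderError("unavailable")
    return f"{model} answered: {prompt}"


used, response = route_with_fallback("hi", ["primary-model", "backup-model"], fake_call)
```

In this example the request is served by backup-model. With allow_failover set to False, the same call raises ProviderError instead, which is the behavior you would want for a profile where a different model's output is not acceptable.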

Consistency and Idempotency

Retries and failovers can result in the same logical request being sent more than once, possibly to different providers. To make this safe:

Design requests to be idempotent where possible.
Use idempotency keys or unique request identifiers in your own systems when an operation has side effects.
Be aware that different models may produce different outputs for the same prompt if a fallback is used.
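
One way to apply idempotency keys in your own system is to record the result of a side-effecting operation under a key and reuse it when the same key is seen again. The sketch below uses an in-memory dictionary for brevity; the IdempotentExecutor class and send_email function are hypothetical, and a real system would need a durable, shared store.

```python
import uuid
from typing import Callable, Dict


class IdempotentExecutor:
    """Run each side-effecting operation at most once per idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def run(self, key: str, operation: Callable[[], str]) -> str:
        # A repeated key returns the stored result instead of re-running.
        if key not in self._results:
            self._results[key] = operation()
        return self._results[key]


sent = []


def send_email() -> str:
    """Stand-in for an operation with a side effect."""
    sent.append("welcome")
    return "sent"


executor = IdempotentExecutor()
key = str(uuid.uuid4())  # generate once per logical request, then reuse it
first = executor.run(key, send_email)
second = executor.run(key, send_email)  # duplicate delivery of the same request
```

The second call returns the stored result without sending another email, so a duplicated request does not duplicate the side effect.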

MultiRoute prioritizes availability and responsiveness over producing identical output on every attempt. Workflows that must be strictly consistent should account for this, for example by disallowing failover for the relevant profile.

Observing Routing Behavior

To understand how routing and fallbacks behave in practice:

Use the dashboard logs and metrics to see which models handled each request.
Track request IDs in your logs to diagnose when fallbacks were triggered.
Compare latency and error rates across routing profiles when tuning timeouts and retries.
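
If you export request logs into your own tooling, a comparison like the one described above could be sketched as follows. The record fields (request_id, profile, requested, served, latency_ms, ok) are hypothetical and do not reflect MultiRoute's actual log format; the sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical exported log records.
records = [
    {"request_id": "r1", "profile": "interactive", "requested": "model-a",
     "served": "model-a", "latency_ms": 420, "ok": True},
    {"request_id": "r2", "profile": "interactive", "requested": "model-a",
     "served": "model-b", "latency_ms": 900, "ok": True},
    {"request_id": "r3", "profile": "interactive", "requested": "model-a",
     "served": "model-a", "latency_ms": 2000, "ok": False},
    {"request_id": "r4", "profile": "batch", "requested": "model-c",
     "served": "model-c", "latency_ms": 5000, "ok": True},
]


def summarize(rows: list[dict]) -> dict[str, dict]:
    """Group records by profile and compute basic health figures."""
    by_profile = defaultdict(list)
    for row in rows:
        by_profile[row["profile"]].append(row)
    summary = {}
    for profile, group in by_profile.items():
        summary[profile] = {
            "requests": len(group),
            "error_rate": sum(not r["ok"] for r in group) / len(group),
            "median_latency_ms": median(r["latency_ms"] for r in group),
            # A served model that differs from the requested one suggests a fallback.
            "fallback_request_ids": [
                r["request_id"] for r in group if r["served"] != r["requested"]
            ],
        }
    return summary


report = summarize(records)
```

Listing the request IDs where the served model differs from the requested one gives you a starting point for tracing individual fallbacks in your own logs.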

Combining clear routing configuration with good observability gives you a predictable and debuggable system, even as you evolve your model mix over time.