Smart Routing

[Figure: High-level Smart Routing flow]

Smart Routing is MultiRoute’s routing engine. It sits between your application and multiple AI model providers and decides, for every request, which provider and model to use based on your preferences (cost, latency, quality, or a balanced strategy) while handling timeouts, retries, and failover on your behalf.

This guide explains how Smart Routing works so you can design for reliability and performance without hard-coding provider-specific logic into your app.

What Smart Routing Does

When your application sends a request to a /openai/v1 endpoint, Smart Routing:

  1. Identifies the project and API key.
  2. Loads the active routing configuration for that project and endpoint (including Smart Routing settings).
  3. Evaluates candidate providers and models against your chosen strategy (cost, latency, quality, or balanced).
  4. Selects the best provider and model for this request.
  5. Applies timeouts and retries according to your failover settings.
  6. Falls back to alternate models or providers when the primary choice is unhealthy or failing.
  7. Returns the response (or an error) back to your application.
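The seven steps above can be sketched as a loop over ranked candidates. This is a minimal illustration, not MultiRoute's implementation. The `Candidate` type, the `rank` and `call` callables, and the fixed number of attempts per candidate are hypothetical stand-ins for the project's routing configuration and failover settings.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    provider: str          # hypothetical provider identifier
    model: str             # hypothetical model identifier
    healthy: bool = True   # operational signal: is this candidate currently usable?


class ProviderError(Exception):
    """Raised by the (hypothetical) provider call when one attempt fails."""


def route(candidates: list[Candidate],
          rank: Callable[[Candidate], float],
          call: Callable[[Candidate], str],
          attempts_per_candidate: int = 2) -> str:
    # Steps 3-4: keep healthy candidates and order them by the strategy's score.
    ordered = sorted((c for c in candidates if c.healthy), key=rank, reverse=True)
    last_error: Exception | None = None
    for candidate in ordered:                    # step 6: fall back down the ranking
        for _ in range(attempts_per_candidate):  # step 5: bounded retries per candidate
            try:
                return call(candidate)           # step 7: the first success is returned
            except ProviderError as exc:
                last_error = exc
    # Step 7 (error path): every candidate failed, or none was healthy.
    raise RuntimeError("no candidate succeeded") from last_error
```

This sketch exhausts the retries on one candidate before moving to the next; a real policy might instead fail over immediately for some kinds of errors.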

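Routing decisions can depend on your routing configuration, your chosen strategy, and the current health of the candidate providers and models.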

Strategies and Preferences

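Smart Routing supports multiple high-level strategies: cost (favor cheaper providers and models), latency (favor faster responses), quality (favor the most capable models), and balanced (weigh all three).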

Behind the scenes, Smart Routing maintains evaluation weights that control how much each dimension (cost, latency, and quality) contributes to a candidate's final score.
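One common way to turn weights into a single score is to min-max normalize each dimension across the candidates and take a weighted sum. The sketch below assumes that approach; the `ModelStats` fields and the presets in `STRATEGY_WEIGHTS` are hypothetical, since the actual formula and weights are not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    cost: float      # e.g. price per million tokens; lower is better
    latency: float   # e.g. recent median latency in ms; lower is better
    quality: float   # e.g. an internal quality rating; higher is better


# Hypothetical weight presets; the real values are not documented here.
STRATEGY_WEIGHTS = {
    "cost":     {"cost": 0.70, "latency": 0.15, "quality": 0.15},
    "latency":  {"cost": 0.15, "latency": 0.70, "quality": 0.15},
    "quality":  {"cost": 0.15, "latency": 0.15, "quality": 0.70},
    "balanced": {"cost": 1 / 3, "latency": 1 / 3, "quality": 1 / 3},
}


def _normalize(values: list[float], higher_is_better: bool) -> list[float]:
    """Min-max scale to [0, 1] so that 1.0 is always the most desirable value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # no spread: every candidate ties on this dimension
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def score(models: list[ModelStats], weights: dict[str, float]) -> dict[str, float]:
    cost = _normalize([m.cost for m in models], higher_is_better=False)
    latency = _normalize([m.latency for m in models], higher_is_better=False)
    quality = _normalize([m.quality for m in models], higher_is_better=True)
    return {
        m.name: weights["cost"] * c + weights["latency"] * lat + weights["quality"] * q
        for m, c, lat, q in zip(models, cost, latency, quality)
    }
```

With these presets and the same statistics, a cheap, fast model wins under the cost strategy while a more capable, slower model wins under quality; only the weights change.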

You configure these preferences in the settings UI or via the settings API. Once configured, you do not need to change your application code—Smart Routing continuously applies your chosen strategy to each request.

Inputs to Routing Decisions

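For a given request, Smart Routing considers its routing configuration, the context of the request itself, and operational signals such as the recent health of each provider.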

The combination of configuration, request context, and operational signals allows Smart Routing to choose sensible defaults while adapting to real-world conditions.
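To make the three kinds of input concrete, here is a hypothetical filter that narrows the candidate list before scoring: configuration supplies the allowed models, request context supplies an estimated token count, and operational signals supply a recent error rate. All type names, fields, and the 0.5 error-rate cutoff are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RoutingConfig:       # configuration: which models this endpoint may use
    allowed_models: list[str]


@dataclass
class RequestContext:      # request context: what this particular call needs
    estimated_tokens: int


@dataclass
class Signals:             # operational signals: recently observed behavior
    error_rate: float      # fraction of recent attempts that failed


def eligible_models(config: RoutingConfig,
                    ctx: RequestContext,
                    context_limits: dict[str, int],
                    signals: dict[str, Signals],
                    max_error_rate: float = 0.5) -> list[str]:
    eligible = []
    for model in config.allowed_models:
        if context_limits.get(model, 0) < ctx.estimated_tokens:
            continue  # the request would not fit in this model's context window
        recent = signals.get(model)
        if recent is not None and recent.error_rate > max_error_rate:
            continue  # recently unreliable; leave it out of this decision
        eligible.append(model)
    return eligible
```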

Timeouts

Every request to an underlying provider is subject to a timeout. The timeout limits how long Smart Routing will wait for a response from a chosen provider before treating that attempt as failed.
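A per-attempt timeout can be sketched with Python's `asyncio.wait_for`, which cancels the pending call when the limit expires so the caller can treat the attempt as failed. `fake_provider` is a stand-in for a real provider call, and `AttemptTimedOut` is a hypothetical exception name.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")


class AttemptTimedOut(Exception):
    """The attempt ran past its timeout and is treated as failed."""


async def call_with_timeout(attempt: Awaitable[T], timeout_s: float) -> T:
    try:
        # wait_for cancels the pending attempt if the timeout expires.
        return await asyncio.wait_for(attempt, timeout=timeout_s)
    except asyncio.TimeoutError:
        raise AttemptTimedOut(f"no response within {timeout_s}s") from None


async def fake_provider(delay_s: float, reply: str) -> str:
    await asyncio.sleep(delay_s)  # stand-in for network and generation time
    return reply
```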

Timeouts can be:

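Choosing timeouts involves a tradeoff between giving a slow provider enough time to finish and failing fast so that a retry or failover can start sooner.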

In general:

Retries

When a request to a provider fails with a retryable error (such as a transient network issue), Smart Routing may retry the request according to your configured policy.

Retries are usually:

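Retries add resilience, but they also increase end-to-end latency and can increase cost, because each retry is another call to a provider.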

Use retries primarily for transient failures, not as a substitute for fixing invalid requests or raising quota limits.
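This guidance can be illustrated with a small wrapper that retries only transient errors, backs off exponentially with jitter between attempts, and lets other errors (such as an invalid request) propagate at once. The exception classes and default values are assumptions for illustration, not MultiRoute's configured policy.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """E.g. a dropped connection or a timed-out attempt; worth retrying."""


class PermanentError(Exception):
    """E.g. an invalid request or an exhausted quota; retrying will not help."""


def retry(call: Callable[[], T],
          max_retries: int = 2,
          base_delay_s: float = 0.2,
          sleep: Callable[[float], None] = time.sleep) -> T:
    attempt = 0
    while True:
        try:
            return call()
        except TransientError:
            if attempt >= max_retries:
                raise  # budget spent: hand the error to failover or the caller
            # Exponential backoff with jitter: wait 50-100% of base * 2**attempt.
            sleep(base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0))
            attempt += 1
        # PermanentError is deliberately not caught, so it propagates immediately.
```

Counting attempts makes the cost visible: with `max_retries=2`, a failure that stays transient triggers three calls and two backoff waits before the error surfaces.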

Failover and Fallbacks

If a provider or model is repeatedly failing or unavailable, Smart Routing can fail over to an alternate model or provider according to your routing configuration and Smart Routing preferences.

Fallback behavior typically includes:

You control:

For critical workloads, configure at least one viable fallback model so interruptions from a single provider are minimized.
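"Repeatedly failing" can be modeled as a simple circuit breaker: after a number of consecutive failures a model is skipped for a cooldown period, and requests go to the next entry in an ordered fallback chain. The threshold, cooldown, and class names below are hypothetical; this is a sketch of the general pattern, not Smart Routing's health logic.

```python
from __future__ import annotations

import time
from typing import Callable


class HealthTracker:
    """Marks a model unhealthy for `cooldown_s` seconds after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._failures: dict[str, int] = {}
        self._unhealthy_until: dict[str, float] = {}

    def is_healthy(self, model: str) -> bool:
        until = self._unhealthy_until.get(model)
        return until is None or self.clock() >= until

    def record(self, model: str, ok: bool) -> None:
        if ok:
            self._failures[model] = 0
            return
        self._failures[model] = self._failures.get(model, 0) + 1
        if self._failures[model] >= self.threshold:
            # Trip the breaker: skip this model until the cooldown has passed.
            self._unhealthy_until[model] = self.clock() + self.cooldown_s
            self._failures[model] = 0


def call_with_failover(chain: list[str], call: Callable[[str], str],
                       health: HealthTracker) -> str:
    errors = []
    for model in chain:              # primary first, then fallbacks in order
        if not health.is_healthy(model):
            continue                 # cooling down after repeated failures
        try:
            result = call(model)
        except Exception as exc:     # any failure counts toward the breaker here
            health.record(model, ok=False)
            errors.append(f"{model}: {exc}")
            continue
        health.record(model, ok=True)
        return result
    raise RuntimeError("no model in the chain succeeded: " + "; ".join(errors))
```

With at least one fallback in the chain, a failing primary costs one extra call per request until the breaker trips, and none while it is open.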

Consistency and Idempotency

Retries and failovers can result in the same logical request being sent to one or more providers. To make this safe:

Smart Routing focuses on maximizing availability and responsiveness; your application should account for this when building workflows that must be strictly consistent.
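On the application side, one way to keep duplicates harmless is to assign each logical operation an ID before the first attempt and apply side effects (sending an email, writing a row) at most once per ID. The in-memory `OperationLedger` below is a hypothetical illustration; a production system would persist the IDs, for example in a database table with a unique constraint.

```python
import uuid
from typing import Any, Callable, Dict


class OperationLedger:
    """Remembers which logical operations were already applied (in memory, for illustration)."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def apply_once(self, operation_id: str, effect: Callable[[], Any]) -> Any:
        if operation_id in self._results:
            return self._results[operation_id]  # duplicate: reuse the earlier result
        result = effect()
        self._results[operation_id] = result
        return result


def new_operation_id() -> str:
    # Generate once per logical operation, before the first attempt, and reuse it.
    return str(uuid.uuid4())
```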

Observing Smart Routing Behavior

To understand how Smart Routing behaves in practice:

Combining clear Smart Routing configuration with good observability gives you a predictable and debuggable system, even as you evolve your provider and model mix over time.
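If you want application-side numbers to compare against your routing configuration, a small recorder can track latency, error rate, and which model served each request. This sketch assumes responses follow the OpenAI chat-completion shape, whose `model` field names the model used; whether Smart Routing reports the routed model there is an assumption to verify. `RoutingStats` is a hypothetical name.

```python
from __future__ import annotations

import statistics
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RoutingStats:
    latencies_ms: list[float] = field(default_factory=list)
    served_by: Counter = field(default_factory=Counter)
    errors: int = 0

    def record_success(self, response: dict, latency_ms: float) -> None:
        self.latencies_ms.append(latency_ms)
        # Assumption: the OpenAI-style "model" field names the model that served the call.
        self.served_by[response.get("model", "unknown")] += 1

    def record_error(self) -> None:
        self.errors += 1

    def summary(self) -> dict:
        total = len(self.latencies_ms) + self.errors
        return {
            "requests": total,
            "error_rate": self.errors / total if total else 0.0,
            "median_latency_ms": (statistics.median(self.latencies_ms)
                                  if self.latencies_ms else None),
            "served_by": dict(self.served_by),
        }
```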