feat: LLM Router extension for cost-optimized model selection#476

Open
bsbodden wants to merge 11 commits into main from llm-router
Conversation

@bsbodden
Collaborator

Adds LLMRouter and AsyncLLMRouter — a new RedisVL extension that routes queries to the cheapest LLM capable of handling them using Redis vector search. This is
the natural complement to SemanticCache/LangCache: caching eliminates redundant calls, routing optimizes the calls you must make.

  • "hello, how are you?" → GPT-4.1 Nano ($0.10/M tokens)
  • "explain garbage collection" → Claude Sonnet 4.5 ($3/M tokens)
  • "architect a distributed system" → Claude Opus 4.5 ($5/M tokens)
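The tier-matching idea behind those examples can be sketched in isolation. This is an illustrative standalone snippet, not the extension's code: toy 3-dimensional vectors stand in for real embeddings, the model strings follow the examples above, and the actual extension performs this nearest-reference lookup with Redis vector search.

```python
# Toy sketch of per-tier nearest-reference routing (not the real extension).
from math import sqrt

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; lower means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Each tier keeps reference embeddings; a query routes to the tier whose
# nearest reference is closest. Vectors and model strings are illustrative.
tiers = {
    "simple":   {"model": "openai/gpt-4.1-nano", "refs": [[1.0, 0.1, 0.0]]},
    "standard": {"model": "anthropic/claude-sonnet-4-5", "refs": [[0.1, 1.0, 0.1]]},
    "expert":   {"model": "anthropic/claude-opus-4-5", "refs": [[0.0, 0.1, 1.0]]},
}

def route(query_vec):
    best_tier, best_dist = None, float("inf")
    for name, tier in tiers.items():
        dist = min(cosine_distance(query_vec, r) for r in tier["refs"])
        if dist < best_dist:
            best_tier, best_dist = name, dist
    return best_tier, tiers[best_tier]["model"], best_dist

tier, model, dist = route([0.9, 0.2, 0.0])  # a vector near the "simple" references
print(tier, model)  # → simple openai/gpt-4.1-nano
```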

Why this matters

Enterprise LLM spend reached $8.4B (Menlo Ventures, mid-2025) and 53% of AI teams exceed cost forecasts by 40%+. The root cause: every query hits the most
expensive model. Academic research (RouteLLM/ICLR 2025, FrugalGPT/Stanford) shows 30-85% cost savings from intelligent routing. A funded startup ecosystem
validates the category — OpenRouter ($500M valuation, $40M raised), Martian (Accenture-backed), NotDiamond (IBM/SAP-backed), Unify (YC/Microsoft-backed).

RedisVL's LLM Router is the first open-source, Redis-native, self-hosted, multi-tier routing solution. Combined with LangCache/SemanticCache, it forms a
complete cost optimization stack no competitor offers.

Key features

  • Pretrained config: Ships with a 3-tier Bloom's Taxonomy config (simple/standard/expert) with 18 reference phrases per tier and pre-computed embeddings — zero
    setup required
  • Cost-aware routing: Optional cost penalty biases toward cheaper tiers when distances are close
  • LiteLLM-compatible: Model strings (provider/model) work directly with LiteLLM's 100+ providers
  • Per-tier thresholds: Each tier has independent distance thresholds for fine-grained control
  • Full async support: AsyncLLMRouter with create() factory pattern
  • Portable configs: Export/import routers with pre-computed embeddings via export_with_embeddings() / from_pretrained()
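The cost-aware routing bullet can be sketched as follows. This assumes an adjusted score of distance plus a weighted, normalized per-tier cost; the extension's actual penalty formula may differ, and the function name here is hypothetical.

```python
# Sketch of cost-aware tier selection: when distances are close, a small
# cost penalty tips the choice toward the cheaper tier.
def pick_tier(matches, cost_penalty=0.1):
    """matches: list of (tier_name, distance, cost_per_million_tokens)."""
    max_cost = max(cost for _, _, cost in matches)
    def adjusted(match):
        name, dist, cost = match
        # Assumed formula: distance plus penalty scaled by normalized cost.
        return dist + cost_penalty * (cost / max_cost)
    return min(matches, key=adjusted)[0]

# Distances are close, so the penalty flips the winner to the cheap tier.
matches = [("expert", 0.30, 5.00), ("standard", 0.32, 3.00), ("simple", 0.35, 0.10)]
print(pick_tier(matches))                    # → simple (penalty applied)
print(pick_tier(matches, cost_penalty=0.0))  # → expert (pure distance wins)
```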

Adds intelligent LLM model routing using semantic similarity:

- ModelTier: Define model tiers with references and thresholds
- LLMRouter: Route queries to optimal model tier
- LLMRouteMatch: Routing result with tier, model, confidence
- Cost optimization: Prefer cheaper tiers when distances close
- Pretrained support: Export/import with pre-computed embeddings
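A rough sketch of what a tier definition enforces, using a plain dataclass rather than the real Pydantic ModelTier. Field names follow this PR's description; the (0, 2] bound matches the threshold constraint stated elsewhere in this PR.

```python
# Hypothetical ModelTier sketch (the real class is a Pydantic model).
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    model: str                     # LiteLLM-style "provider/model" string
    references: list               # example phrases this tier should handle
    distance_threshold: float = 0.5

    def __post_init__(self):
        if not self.name or not self.model:
            raise ValueError("name and model are required")
        if "/" not in self.model:
            raise ValueError("model should be a 'provider/model' string")
        if not self.references:
            raise ValueError("at least one reference phrase is required")
        if not 0 < self.distance_threshold <= 2:
            raise ValueError("distance_threshold must be in (0, 2]")

tier = ModelTier(
    name="simple",
    model="openai/gpt-4.1-nano",
    references=["hello, how are you?", "what time is it?"],
)
print(tier.name, tier.distance_threshold)  # → simple 0.5
```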

Integration tests define expected behavior (test-first approach).

Part of redis-vl-python enhancement for intelligent LLM auto-selection.

Tests for:
- ModelTier validation (name, model, references, threshold bounds)
- LLMRouteMatch (truthy/falsy, alternatives, metadata)
- RoutingConfig (defaults, custom values, bounds)
- Pretrained schemas (reference, tier, config)
- DistanceAggregationMethod enum
- Fix from_pretrained() to use model_construct() instead of object.__new__()
- Update test_cost_optimization_prefers_cheaper to use matching query
- Update test_add_tier_references to verify references added correctly
- Add tests/unit/conftest.py to skip Docker fixtures for unit tests
- Add tests/integration/conftest.py to use local Redis when available
- test_add_tier_references now verifies reference addition without strict routing
- Cost optimization test uses query that better matches references
- All 22 integration tests should now pass
- Problem statement and existing solution limitations
- Architecture diagrams and key design decisions
- API examples and comparison with SemanticRouter
- Testing guide and future enhancements
…eddings

Add a built-in 3-tier pretrained configuration (simple/standard/expert)
grounded in Bloom's Taxonomy with 18 reference phrases per tier and
pre-computed embeddings from sentence-transformers/all-mpnet-base-v2.

Includes generation script and pretrained loader for named configs.

Add AsyncLLMRouter with async factory pattern (create() classmethod),
mirroring all sync LLMRouter functionality with async I/O. Update
module exports and correct simple tier model to openai/gpt-4.1-nano
for accurate cost optimization.

Add comprehensive async integration tests mirroring all sync tests
with AsyncLLMRouter.create() factory. Add pretrained config tests
for default 3-tier routing. Update model references and pricing
assertions to match corrected tier definitions.

Add comprehensive Jupyter notebook (13_llm_router.ipynb) covering
pretrained routing, custom tiers, cost optimization, tier management,
serialization, and async usage. Update DESIGN.md with async support,
pretrained config details, and corrected model pricing.
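The create() factory pattern mentioned above exists because __init__ cannot await. A self-contained sketch of the pattern follows; the names and signatures are illustrative, not the extension's real API.

```python
# Sketch of an async factory classmethod: build the object synchronously,
# then await its async setup before handing it back.
import asyncio

class AsyncRouter:
    def __init__(self, tiers):
        self.tiers = tiers
        self.ready = False

    @classmethod
    async def create(cls, tiers):
        router = cls(tiers)
        await router._connect()  # e.g. open the async Redis connection
        return router

    async def _connect(self):
        await asyncio.sleep(0)   # stand-in for real async I/O
        self.ready = True

async def main():
    router = await AsyncRouter.create(["simple", "standard", "expert"])
    print(router.ready, len(router.tiers))  # → True 3

asyncio.run(main())
```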
Copilot AI review requested due to automatic review settings February 16, 2026 22:27
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces an LLM Router extension for RedisVL that enables cost-optimized model selection through semantic routing. The router uses Redis vector search to match queries to model tiers based on semantic similarity to reference phrases, allowing applications to route simple queries to cheaper models and complex queries to more capable (expensive) models.

Changes:

  • New LLMRouter and AsyncLLMRouter classes for intelligent model tier selection
  • Pretrained configuration system with built-in "default" config featuring 3 tiers (simple/standard/expert)
  • Comprehensive test suite including unit tests and integration tests for both sync and async implementations

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| redisvl/extensions/llm_router/router.py | Core implementation of sync and async LLM routers with routing logic and tier management |
| redisvl/extensions/llm_router/schema.py | Pydantic models for ModelTier, LLMRouteMatch, RoutingConfig, and pretrained configurations |
| redisvl/extensions/llm_router/__init__.py | Public API exports for the extension |
| redisvl/extensions/llm_router/pretrained/__init__.py | Loader for pretrained router configurations |
| scripts/generate_pretrained_config.py | Script to generate pretrained configs with embedded reference vectors |
| tests/unit/test_llm_router_schema.py | Unit tests for schema validation and Pydantic models |
| tests/unit/conftest.py | Test configuration to allow unit tests without Docker/Redis |
| tests/integration/test_llm_router.py | Integration tests for sync LLMRouter functionality |
| tests/integration/test_async_llm_router.py | Integration tests for async AsyncLLMRouter functionality |
| tests/integration/conftest.py | Configuration for integration tests with optional Docker override |
| redisvl/extensions/llm_router/DESIGN.md | Comprehensive design documentation |
| docs/user_guide/13_llm_router.ipynb | User guide notebook with examples and usage patterns |


…assmethods

The from_pretrained and from_existing methods (sync and async) ignored a
provided redis_client because redis_url defaults to "redis://localhost:6379"
and was always truthy. This caused ConnectionRefusedError in CI where Redis
runs on a dynamic testcontainer port.
- Validate threshold range (0, 2] in update_tier_threshold before
  assignment, matching the ModelTier Pydantic schema constraint.
- Guard _get_tier_matches against empty tiers list to prevent
  ValueError from max() on empty sequence.

Applied to both sync and async implementations.
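The truthiness bug described in this commit can be sketched in isolation: because redis_url has a non-empty default, a bare `if redis_url:` check always wins and a provided client is ignored. The functions below are stand-ins for the real classmethods, showing the bug and the fix side by side.

```python
# Stand-in for the connection-resolution logic in from_pretrained/from_existing.
DEFAULT_URL = "redis://localhost:6379"

def resolve_connection_buggy(redis_client=None, redis_url=DEFAULT_URL):
    if redis_url:                    # always truthy -> client ignored (the bug)
        return ("url", redis_url)
    return ("client", redis_client)

def resolve_connection_fixed(redis_client=None, redis_url=DEFAULT_URL):
    if redis_client is not None:     # prefer an explicitly provided client
        return ("client", redis_client)
    return ("url", redis_url)

client = object()  # stand-in for a client connected to a dynamic testcontainer port
print(resolve_connection_buggy(redis_client=client)[0])  # → url (wrong)
print(resolve_connection_fixed(redis_client=client)[0])  # → client (correct)
```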
Copilot AI review requested due to automatic review settings February 17, 2026 00:45
@bsbodden bsbodden self-assigned this Feb 17, 2026
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.



Collaborator

@vishal-bala vishal-bala left a comment


Just a quick glance through for now!

@bsbodden bsbodden requested review from rbs333 and removed request for abrookins and tylerhutcherson February 25, 2026 20:21
