feat: LLM Router extension for cost-optimized model selection#476

Open
bsbodden wants to merge 11 commits into main from llm-router
Conversation

@bsbodden
Collaborator

Adds LLMRouter and AsyncLLMRouter — a new RedisVL extension that routes queries to the cheapest LLM capable of handling them using Redis vector search. This is
the natural complement to SemanticCache/LangCache: caching eliminates redundant calls, routing optimizes the calls you must make.

  • "hello, how are you?" → GPT-4.1 Nano ($0.10/M tokens)
  • "explain garbage collection" → Claude Sonnet 4.5 ($3/M tokens)
  • "architect a distributed system" → Claude Opus 4.5 ($5/M tokens)
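The tier-matching idea behind those examples can be sketched in isolation. This is an illustrative standalone snippet, not the extension's code: toy 3-dimensional vectors stand in for real embeddings, the model strings follow the examples above, and the actual extension performs this nearest-reference lookup with Redis vector search.

```python
# Toy sketch of per-tier nearest-reference routing (not the real extension).
from math import sqrt

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; lower means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Each tier keeps reference embeddings; a query routes to the tier whose
# nearest reference is closest. Vectors and model strings are illustrative.
tiers = {
    "simple":   {"model": "openai/gpt-4.1-nano", "refs": [[1.0, 0.1, 0.0]]},
    "standard": {"model": "anthropic/claude-sonnet-4-5", "refs": [[0.1, 1.0, 0.1]]},
    "expert":   {"model": "anthropic/claude-opus-4-5", "refs": [[0.0, 0.1, 1.0]]},
}

def route(query_vec):
    best_tier, best_dist = None, float("inf")
    for name, tier in tiers.items():
        dist = min(cosine_distance(query_vec, r) for r in tier["refs"])
        if dist < best_dist:
            best_tier, best_dist = name, dist
    return best_tier, tiers[best_tier]["model"], best_dist

tier, model, dist = route([0.9, 0.2, 0.0])  # a vector near the "simple" references
print(tier, model)  # → simple openai/gpt-4.1-nano
```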

Why this matters

Enterprise LLM spend reached $8.4B (Menlo Ventures, mid-2025) and 53% of AI teams exceed cost forecasts by 40%+. The root cause: every query hits the most
expensive model. Academic research (RouteLLM/ICLR 2025, FrugalGPT/Stanford) shows 30-85% cost savings from intelligent routing. A funded startup ecosystem
validates the category — OpenRouter ($500M valuation, $40M raised), Martian (Accenture-backed), NotDiamond (IBM/SAP-backed), Unify (YC/Microsoft-backed).

RedisVL's LLM Router is the first open-source, Redis-native, self-hosted, multi-tier routing solution. Combined with LangCache/SemanticCache, it forms a
complete cost optimization stack no competitor offers.

Key features

  • Pretrained config: Ships with a 3-tier Bloom's Taxonomy config (simple/standard/expert) with 18 reference phrases per tier and pre-computed embeddings — zero
    setup required
  • Cost-aware routing: Optional cost penalty biases toward cheaper tiers when distances are close
  • LiteLLM-compatible: Model strings (provider/model) work directly with LiteLLM's 100+ providers
  • Per-tier thresholds: Each tier has independent distance thresholds for fine-grained control
  • Full async support: AsyncLLMRouter with create() factory pattern
  • Portable configs: Export/import routers with pre-computed embeddings via export_with_embeddings() / from_pretrained()
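The cost-aware routing bullet can be sketched as follows. This assumes an adjusted score of distance plus a weighted, normalized per-tier cost; the extension's actual penalty formula may differ, and the function name here is hypothetical.

```python
# Sketch of cost-aware tier selection: when distances are close, a small
# cost penalty tips the choice toward the cheaper tier.
def pick_tier(matches, cost_penalty=0.1):
    """matches: list of (tier_name, distance, cost_per_million_tokens)."""
    max_cost = max(cost for _, _, cost in matches)
    def adjusted(match):
        name, dist, cost = match
        # Assumed formula: distance plus penalty scaled by normalized cost.
        return dist + cost_penalty * (cost / max_cost)
    return min(matches, key=adjusted)[0]

# Distances are close, so the penalty flips the winner to the cheap tier.
matches = [("expert", 0.30, 5.00), ("standard", 0.32, 3.00), ("simple", 0.35, 0.10)]
print(pick_tier(matches))                    # → simple (penalty applied)
print(pick_tier(matches, cost_penalty=0.0))  # → expert (pure distance wins)
```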

Adds intelligent LLM model routing using semantic similarity:

- ModelTier: Define model tiers with references and thresholds
- LLMRouter: Route queries to optimal model tier
- LLMRouteMatch: Routing result with tier, model, confidence
- Cost optimization: Prefer cheaper tiers when distances close
- Pretrained support: Export/import with pre-computed embeddings
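A rough sketch of what a tier definition enforces, using a plain dataclass rather than the real Pydantic ModelTier. Field names follow this PR's description; the (0, 2] bound matches the threshold constraint stated elsewhere in this PR.

```python
# Hypothetical ModelTier sketch (the real class is a Pydantic model).
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    model: str                     # LiteLLM-style "provider/model" string
    references: list               # example phrases this tier should handle
    distance_threshold: float = 0.5

    def __post_init__(self):
        if not self.name or not self.model:
            raise ValueError("name and model are required")
        if "/" not in self.model:
            raise ValueError("model should be a 'provider/model' string")
        if not self.references:
            raise ValueError("at least one reference phrase is required")
        if not 0 < self.distance_threshold <= 2:
            raise ValueError("distance_threshold must be in (0, 2]")

tier = ModelTier(
    name="simple",
    model="openai/gpt-4.1-nano",
    references=["hello, how are you?", "what time is it?"],
)
print(tier.name, tier.distance_threshold)  # → simple 0.5
```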

Integration tests define expected behavior (test-first approach).

Part of redis-vl-python enhancement for intelligent LLM auto-selection.

Tests for:
- ModelTier validation (name, model, references, threshold bounds)
- LLMRouteMatch (truthy/falsy, alternatives, metadata)
- RoutingConfig (defaults, custom values, bounds)
- Pretrained schemas (reference, tier, config)
- DistanceAggregationMethod enum
- Fix from_pretrained() to use model_construct() instead of object.__new__()
- Update test_cost_optimization_prefers_cheaper to use matching query
- Update test_add_tier_references to verify references added correctly
- Add tests/unit/conftest.py to skip Docker fixtures for unit tests
- Add tests/integration/conftest.py to use local Redis when available
- test_add_tier_references now verifies reference addition without strict routing
- Cost optimization test uses query that better matches references
- All 22 integration tests should now pass
- Problem statement and existing solution limitations
- Architecture diagrams and key design decisions
- API examples and comparison with SemanticRouter
- Testing guide and future enhancements
…eddings

Add a built-in 3-tier pretrained configuration (simple/standard/expert)
grounded in Bloom's Taxonomy with 18 reference phrases per tier and
pre-computed embeddings from sentence-transformers/all-mpnet-base-v2.

Includes generation script and pretrained loader for named configs.

Add AsyncLLMRouter with async factory pattern (create() classmethod),
mirroring all sync LLMRouter functionality with async I/O. Update
module exports and correct simple tier model to openai/gpt-4.1-nano
for accurate cost optimization.

Add comprehensive async integration tests mirroring all sync tests
with AsyncLLMRouter.create() factory. Add pretrained config tests
for default 3-tier routing. Update model references and pricing
assertions to match corrected tier definitions.

Add comprehensive Jupyter notebook (13_llm_router.ipynb) covering
pretrained routing, custom tiers, cost optimization, tier management,
serialization, and async usage. Update DESIGN.md with async support,
pretrained config details, and corrected model pricing.
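The create() factory pattern mentioned above exists because __init__ cannot await. A self-contained sketch of the pattern follows; the names and signatures are illustrative, not the extension's real API.

```python
# Sketch of an async factory classmethod: build the object synchronously,
# then await its async setup before handing it back.
import asyncio

class AsyncRouter:
    def __init__(self, tiers):
        self.tiers = tiers
        self.ready = False

    @classmethod
    async def create(cls, tiers):
        router = cls(tiers)
        await router._connect()  # e.g. open the async Redis connection
        return router

    async def _connect(self):
        await asyncio.sleep(0)   # stand-in for real async I/O
        self.ready = True

async def main():
    router = await AsyncRouter.create(["simple", "standard", "expert"])
    print(router.ready, len(router.tiers))  # → True 3

asyncio.run(main())
```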
Copilot AI review requested due to automatic review settings February 16, 2026 22:27
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces an LLM Router extension for RedisVL that enables cost-optimized model selection through semantic routing. The router uses Redis vector search to match queries to model tiers based on semantic similarity to reference phrases, allowing applications to route simple queries to cheaper models and complex queries to more capable (expensive) models.

Changes:

  • New LLMRouter and AsyncLLMRouter classes for intelligent model tier selection
  • Pretrained configuration system with built-in "default" config featuring 3 tiers (simple/standard/expert)
  • Comprehensive test suite including unit tests and integration tests for both sync and async implementations

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| redisvl/extensions/llm_router/router.py | Core implementation of sync and async LLM routers with routing logic and tier management |
| redisvl/extensions/llm_router/schema.py | Pydantic models for ModelTier, LLMRouteMatch, RoutingConfig, and pretrained configurations |
| redisvl/extensions/llm_router/__init__.py | Public API exports for the extension |
| redisvl/extensions/llm_router/pretrained/__init__.py | Loader for pretrained router configurations |
| scripts/generate_pretrained_config.py | Script to generate pretrained configs with embedded reference vectors |
| tests/unit/test_llm_router_schema.py | Unit tests for schema validation and Pydantic models |
| tests/unit/conftest.py | Test configuration to allow unit tests without Docker/Redis |
| tests/integration/test_llm_router.py | Integration tests for sync LLMRouter functionality |
| tests/integration/test_async_llm_router.py | Integration tests for async AsyncLLMRouter functionality |
| tests/integration/conftest.py | Configuration for integration tests with optional Docker override |
| redisvl/extensions/llm_router/DESIGN.md | Comprehensive design documentation |
| docs/user_guide/13_llm_router.ipynb | User guide notebook with examples and usage patterns |


…assmethods

The from_pretrained and from_existing methods (sync and async) ignored a
provided redis_client because redis_url defaults to "redis://localhost:6379"
and was always truthy. This caused ConnectionRefusedError in CI where Redis
runs on a dynamic testcontainer port.
- Validate threshold range (0, 2] in update_tier_threshold before
  assignment, matching the ModelTier Pydantic schema constraint.
- Guard _get_tier_matches against empty tiers list to prevent
  ValueError from max() on empty sequence.

Applied to both sync and async implementations.
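The truthiness bug described in this commit can be sketched in isolation: because redis_url has a non-empty default, a bare `if redis_url:` check always wins and a provided client is ignored. The functions below are stand-ins for the real classmethods, showing the bug and the fix side by side.

```python
# Stand-in for the connection-resolution logic in from_pretrained/from_existing.
DEFAULT_URL = "redis://localhost:6379"

def resolve_connection_buggy(redis_client=None, redis_url=DEFAULT_URL):
    if redis_url:                    # always truthy -> client ignored (the bug)
        return ("url", redis_url)
    return ("client", redis_client)

def resolve_connection_fixed(redis_client=None, redis_url=DEFAULT_URL):
    if redis_client is not None:     # prefer an explicitly provided client
        return ("client", redis_client)
    return ("url", redis_url)

client = object()  # stand-in for a client connected to a dynamic testcontainer port
print(resolve_connection_buggy(redis_client=client)[0])  # → url (wrong)
print(resolve_connection_fixed(redis_client=client)[0])  # → client (correct)
```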
Copilot AI review requested due to automatic review settings February 17, 2026 00:45
@bsbodden bsbodden self-assigned this Feb 17, 2026
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.



Collaborator

@vishal-bala vishal-bala left a comment


Just a quick glance through for now!

@bsbodden bsbodden requested review from rbs333 and removed request for abrookins and tylerhutcherson February 25, 2026 20:21
