Open source · MIT · From Neul Labs

Rust acceleration for LiteLLM.
Zero code changes.

Fast LiteLLM is a drop-in acceleration layer for the LiteLLM Python library — 3.2× faster connection pooling, 1.6× faster rate limiting, 42× less memory on high-cardinality workloads. Just import fast_litellm first.

Install
# Using uv (recommended)
uv add fast-litellm

# Or using pip
pip install fast-litellm
Use
import fast_litellm  # must come before importing litellm
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi"}],
)

Honest benchmarks

Benchmarks compare a production-grade, thread-safe Python baseline against the Rust acceleration.

Full methodology →
| Component | Speedup | Memory | Best for |
| --- | --- | --- | --- |
| Connection pool | 3.2× faster | Same | HTTP connection management |
| Rate limiting | 1.6× faster | Same | Request throttling, quota management |
| Large-text tokenization | 1.5–1.7× faster | Same | Long documents |
| High-cardinality rate limits | 1.2× faster | 42× less memory | Many unique API keys/users |

⚠️ Rust is slower than Python on small-text tokenization and routing — FFI overhead dominates. We document this honestly on the benchmarks page.

Performance you can ship

Drop-in acceleration where it matters: connection pooling, rate limiting, token counting at scale. Production-safe with feature flags and automatic fallback.
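The automatic fallback can be pictured as a thin wrapper around each accelerated call: try the Rust path, and on any error degrade to the original Python implementation. This is an illustrative sketch of the pattern, not fast_litellm's actual internals; `rust_count` and `python_count` are hypothetical stand-ins.

```python
def with_fallback(accelerated, fallback):
    """Return a callable that tries `accelerated` first, then `fallback`."""
    def wrapper(*args, **kwargs):
        try:
            return accelerated(*args, **kwargs)
        except Exception:
            # Any error in the accelerated path degrades gracefully
            # to the original Python implementation.
            return fallback(*args, **kwargs)
    return wrapper

def rust_count(text):
    # Hypothetical stand-in for a Rust-backed function that fails.
    raise RuntimeError("simulated Rust-side failure")

def python_count(text):
    # Hypothetical stand-in for the original Python path.
    return len(text.split())

count = with_fallback(rust_count, python_count)
```

With this shape, callers never see the Rust-side error; they just get the (slower) Python result.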

Read about the architecture →

Deep LiteLLM expertise

Guides, articles, and technical analyses of long-standing LiteLLM issues — root causes, real fixes, broader lessons. Independent, no vendor agenda.

Browse the knowledge base →

Consulting from Neul Labs

Half-day audits, perf tuning, multi-provider routing design, incident response. For teams running LiteLLM in production.

See services →

From the knowledge base

All knowledge →

LiteLLM issue analyses

Guides

Articles

FAQ

What is fast-litellm?

Fast LiteLLM is a drop-in Rust acceleration layer for the LiteLLM Python library. It replaces hot-path components — connection pooling, rate limiting, token counting — with Rust implementations compiled via PyO3, giving 1.5–3.2× speedups with zero code changes.

How do I install fast-litellm?

Run `uv add fast-litellm` or `pip install fast-litellm`. Prebuilt wheels are available for Linux (x86_64, aarch64), macOS (x86_64, ARM64), and Windows (x86_64). Rust is not required to install.

How do I enable acceleration?

Add `import fast_litellm` before `import litellm`. The package monkey-patches LiteLLM at import time, so no other code changes are required.

Is it safe for production?

Yes. Fast LiteLLM ships with feature flags, automatic fallback to the Python implementation on any error, and a circuit breaker that disables a Rust component after 10 errors. Performance metrics are exposed for monitoring.
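The circuit-breaker behavior described above can be sketched in a few lines: count errors per component, and once the threshold (10, per the answer above) is reached, stop trying the Rust path and route straight to Python. This is a conceptual sketch, not fast_litellm's real implementation; `always_fail` and `python_path` are hypothetical stand-ins.

```python
class CircuitBreaker:
    """Disable an accelerated component after `threshold` errors,
    routing all further calls to the Python fallback."""

    def __init__(self, accelerated, fallback, threshold=10):
        self.accelerated = accelerated
        self.fallback = fallback
        self.threshold = threshold
        self.errors = 0

    @property
    def open(self):
        # Once open, the Rust path is no longer attempted.
        return self.errors >= self.threshold

    def call(self, *args, **kwargs):
        if not self.open:
            try:
                return self.accelerated(*args, **kwargs)
            except Exception:
                self.errors += 1  # count the failure, fall through to Python
        return self.fallback(*args, **kwargs)

def always_fail(*args, **kwargs):
    # Hypothetical stand-in for a misbehaving Rust component.
    raise RuntimeError("simulated Rust error")

def python_path(*args, **kwargs):
    # Hypothetical stand-in for the Python implementation.
    return "python-result"

breaker = CircuitBreaker(always_fail, python_path)
results = [breaker.call() for _ in range(12)]
```

After the tenth failure the breaker opens, so calls 11 and 12 never touch the accelerated path; every call still returns a valid result.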

Which Python and LiteLLM versions are supported?

Python 3.8 through 3.13 on Linux, macOS, and Windows. The latest stable LiteLLM release is supported. CI runs a compatibility matrix on every commit.

Running LiteLLM at scale?

Whether you need a perf audit, a tricky multi-provider routing design, or hands-on incident response — Neul Labs helps teams ship LiteLLM reliably.