Rust acceleration for LiteLLM.
Zero code changes.
Fast LiteLLM is a drop-in acceleration layer for the LiteLLM Python library —
3.2× faster connection pooling,
1.6× faster rate limiting,
42× less memory on high-cardinality workloads.
Just `import fast_litellm` first.
```shell
# Using uv (recommended)
uv add fast-litellm

# Or using pip
pip install fast-litellm
```

```python
import fast_litellm  # must come first: accelerates LiteLLM
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi"}],
)
```

Honest benchmarks
Production-grade Python (with thread safety) vs Rust acceleration.
| Component | Speedup | Memory | Best for |
|---|---|---|---|
| Connection Pool | 3.2× faster | Same | HTTP connection management |
| Rate Limiting | 1.6× faster | Same | Request throttling, quota management |
| Large Text Tokenization | 1.5–1.7× faster | Same | Long documents |
| High-Cardinality Rate Limits | 1.2× faster | 42× less memory | Many unique API keys/users |
⚠️ Rust is slower than Python on small-text tokenization and routing — FFI overhead dominates. We document this honestly on the benchmarks page.
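The crossover comes from a fixed per-call FFI cost added to Rust's lower per-byte processing cost. A toy linear cost model (the constants below are illustrative assumptions, not measurements from Fast LiteLLM) shows why short inputs lose and long inputs win:

```python
def call_time_us(n_bytes: float, per_call_overhead: float, per_byte_cost: float) -> float:
    """Total time for one tokenization call under a linear cost model."""
    return per_call_overhead + n_bytes * per_byte_cost

# Illustrative constants (assumptions, not benchmark data):
# the Rust path pays a fixed FFI-boundary cost but processes bytes faster.
PY_OVERHEAD, PY_PER_BYTE = 1.0, 0.010   # microseconds
RS_OVERHEAD, RS_PER_BYTE = 5.0, 0.006   # microseconds

def crossover_bytes() -> float:
    """Input size where both paths cost the same:
    RS_OVERHEAD + n * RS_PER_BYTE == PY_OVERHEAD + n * PY_PER_BYTE
    """
    return (RS_OVERHEAD - PY_OVERHEAD) / (PY_PER_BYTE - RS_PER_BYTE)

if __name__ == "__main__":
    n = crossover_bytes()  # ≈ 1000 bytes with these constants
    print(f"crossover at about {n:.0f} bytes")
    # Below the crossover Python wins, above it Rust wins:
    assert call_time_us(100, RS_OVERHEAD, RS_PER_BYTE) > call_time_us(100, PY_OVERHEAD, PY_PER_BYTE)
    assert call_time_us(10_000, RS_OVERHEAD, RS_PER_BYTE) < call_time_us(10_000, PY_OVERHEAD, PY_PER_BYTE)
```

The same shape explains the table: tokenizing long documents amortizes the boundary cost; tokenizing short strings does not.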
Performance you can ship
Drop-in acceleration where it matters: connection pooling, rate limiting, token counting at scale. Production-safe with feature flags and automatic fallback.
Read about the architecture →

Deep LiteLLM expertise
Guides, articles, and technical analyses of long-standing LiteLLM issues — root causes, real fixes, broader lessons. Independent, no vendor agenda.
Browse the knowledge base →

Consulting from Neul Labs
Half-day audits, perf tuning, multi-provider routing design, incident response. For teams running LiteLLM in production.
See services →

From the knowledge base

All knowledge →

LiteLLM issue analyses
- When PyPI maintainer accounts get hijacked: the LiteLLM 1.82.7/1.82.8 supply-chain compromise
#24518 · security
- Why `import litellm` takes a second, and what it would take to fix it
#7605 · performance
- Bisecting the LiteLLM 1.80 → 1.81 performance regression
#19921 · performance
- The aiohttp `Unclosed client session` warnings in LiteLLM, explained
#13251 · concurrency
Guides
- Accelerating the LiteLLM proxy with Fast LiteLLM
A production-ready guide to running the LiteLLM proxy server with Fast LiteLLM under gunicorn, Docker, and systemd — including the import-order trap that catches most teams.
- Installing Fast LiteLLM
How to install Fast LiteLLM, verify the Rust acceleration is active, and what to do when it isn't.
- Rate limiting LiteLLM at high cardinality
Why per-user rate limiting in pure Python eats memory at scale, and how Fast LiteLLM gets to 42× less RSS without changing your config.
Articles
- Why LiteLLM needs Rust (in three specific places, not everywhere)
A measured argument for hybrid Python+Rust in LiteLLM's hot path — and the places where Python is still the right answer.
- A deep dive into the Fast LiteLLM token counting benchmark
Why tokenization with tiktoken-rs is 1.5–1.7× faster on long inputs and 0.5× as fast on short ones — the FFI overhead curve, fully explained.
FAQ
What is fast-litellm?
Fast LiteLLM is a drop-in Rust acceleration layer for the LiteLLM Python library. It replaces hot-path components — connection pooling, rate limiting, token counting — with Rust implementations compiled via PyO3, giving 1.5–3.2× speedups with zero code changes.
How do I install fast-litellm?
Run `uv add fast-litellm` or `pip install fast-litellm`. Prebuilt wheels are available for Linux (x86_64, aarch64), macOS (x86_64, ARM64), and Windows (x86_64). Rust is not required to install.
How do I enable acceleration?
Add `import fast_litellm` before `import litellm`. The package monkey-patches LiteLLM at import time, so no other code changes are required.
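Because the patching happens at import time, the one failure mode is importing `litellm` first. A defensive startup check can catch the wrong order; this is a hypothetical helper sketched here, not part of the fast-litellm API:

```python
import sys
import warnings

def check_import_order(accelerator: str = "fast_litellm", target: str = "litellm") -> bool:
    """Return False (and warn) if `target` was imported before `accelerator`.

    Monkey-patching at import time only affects code loaded afterwards,
    so an accelerator imported second may silently do nothing.
    """
    if target in sys.modules and accelerator not in sys.modules:
        warnings.warn(
            f"{target} was imported before {accelerator}; "
            "Rust acceleration may be inactive"
        )
        return False
    return True
```

Calling this right after your imports turns a silent performance regression into a visible warning at startup.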
Is it safe for production?
Yes. Fast LiteLLM ships with feature flags, automatic fallback to the Python implementation on any error, and a circuit breaker that disables a Rust component after 10 errors. Performance metrics are exposed for monitoring.
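The fallback behaviour described above can be approximated with a small wrapper: try the fast path, fall back to the safe path on any error, and disable the fast path once an error threshold is hit. A minimal sketch of the pattern follows; the 10-error threshold comes from the FAQ, but the internals of Fast LiteLLM's actual circuit breaker are assumptions:

```python
from typing import Callable, TypeVar

T = TypeVar("T")

class FallbackCircuitBreaker:
    """Route calls to a fast implementation, falling back to a safe one.

    After `max_errors` failures the fast path is disabled
    (the circuit "opens") and only the safe path is used.
    """

    def __init__(self, fast: Callable[..., T], safe: Callable[..., T], max_errors: int = 10):
        self.fast = fast
        self.safe = safe
        self.max_errors = max_errors
        self.errors = 0
        self.open = False  # True once the fast path is disabled

    def __call__(self, *args, **kwargs) -> T:
        if not self.open:
            try:
                return self.fast(*args, **kwargs)
            except Exception:
                self.errors += 1
                if self.errors >= self.max_errors:
                    self.open = True  # stop trying the fast path
        return self.safe(*args, **kwargs)

if __name__ == "__main__":
    def rust_count(text: str) -> int:
        raise RuntimeError("simulated native failure")

    def py_count(text: str) -> int:
        return len(text.split())

    count = FallbackCircuitBreaker(rust_count, py_count)
    print(count("hello world"))  # falls back transparently and prints 2
```

Every failed fast-path call still returns a correct result via the safe path, which is what makes this pattern production-safe.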
Which Python and LiteLLM versions are supported?
Python 3.8 through 3.13 on Linux, macOS, and Windows. The latest stable LiteLLM release is supported. CI runs a compatibility matrix on every commit.
Running LiteLLM at scale?
Whether you need a perf audit, a tricky multi-provider routing design, or hands-on incident response — Neul Labs helps teams ship LiteLLM reliably.