# Benchmarks

Production-grade Python (thread-safe, lock-protected) is compared head-to-head with Fast LiteLLM's Rust implementations. We publish the wins and the losses.

## Methodology

Each benchmark runs the same workload against the lock-protected Python baseline and the Rust implementation, 200 iterations per benchmark (matching the reproduction command below).

## Results

| Component | Speedup | Memory | Best for |
|---|---|---|---|
| Connection Pool | 3.2× faster | Same | HTTP connection management |
| Rate Limiting | 1.6× faster | Same | Throttling, quota management |
| Large Text Tokenization | 1.5–1.7× faster | Same | Long documents |
| High-Cardinality Rate Limits | 1.2× faster | 42× less memory | Many unique API keys/users |
| Concurrent Connection Pool | 1.2× faster | Same | Multi-threaded workloads |
| Small Text Tokenization | 0.5× (Python faster) | Same | Short messages — FFI overhead dominates |
| Routing | 0.4× (Python faster) | Same | Model selection — FFI overhead dominates |
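One practical consequence of the table: callers can route by input size, taking the Rust path only where the FFI crossing is amortized. A minimal sketch — the threshold, function names, and tokenizer bodies below are illustrative assumptions, not Fast LiteLLM's actual API:

```python
# Hypothetical size-based dispatch. The threshold should be tuned with
# the benchmark script; 2048 bytes is a placeholder, not a measured value.
SMALL_TEXT_THRESHOLD = 2048

def python_tokenize(text: str) -> list[str]:
    # Stand-in for the pure-Python path (faster on short messages).
    return text.split()

def rust_tokenize(text: str) -> list[str]:
    # Stand-in for the Rust-backed path (1.5-1.7x faster on long documents).
    return text.split()

def tokenize(text: str) -> list[str]:
    if len(text) < SMALL_TEXT_THRESHOLD:
        return python_tokenize(text)
    return rust_tokenize(text)
```

This keeps short-message latency on the path that wins for it while still capturing the long-document speedup.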

## Use Rust acceleration for

- HTTP connection management (connection pooling)
- Throttling and quota management (rate limiting)
- Tokenizing long documents
- Rate limiting across many unique API keys/users (high cardinality)
- Multi-threaded connection-pool workloads

## Python may be faster for

- Tokenizing short messages, where FFI overhead dominates
- Routing and model selection, where FFI overhead dominates

## Why we publish the losses

"Rust is always faster" benchmarks are marketing, not engineering. FFI overhead is real. The interesting question for any acceleration layer is which specific workloads benefit and which don't. Showing only the wins would tell you nothing useful for capacity planning. The honest table above is what you'd get from running the benchmarks yourself.

## Reproducing locally

```bash
git clone https://github.com/neul-labs/fast-litellm
cd fast-litellm
uv venv && source .venv/bin/activate
uv add --dev maturin
uv run maturin develop
python scripts/run_benchmarks.py --iterations 200
```