# Benchmarks
Production-grade Python baselines (thread-safe, lock-protected) compared head-to-head with Fast LiteLLM's Rust implementations. We publish the wins and the losses.
## Methodology
- Each benchmark runs 200 iterations after a warm-up phase.
- Python baselines use the production code path, including thread-safety primitives — not stripped-down toy versions.
- Memory is measured as steady-state RSS after running through 1,000+ unique keys (the high-cardinality test).
- Reproducible: run `python scripts/run_benchmarks.py --iterations 200` in the source repo.
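The timing loop described above can be sketched as follows. This is an illustrative harness, not the actual `scripts/run_benchmarks.py` code; the `bench` function and its parameters are hypothetical.

```python
import time

def bench(fn, *args, warmup=20, iterations=200):
    """Time fn over `iterations` runs after a warm-up phase.

    The warm-up runs absorb one-time costs (lazy imports, caches,
    allocator growth) so the measured loop reflects steady state.
    """
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iterations):
        fn(*args)
    # Mean wall-clock seconds per call.
    return (time.perf_counter() - start) / iterations
```

Memory numbers are taken separately: run the workload through 1,000+ unique keys, then read the process's resident set size (e.g. via `psutil` or `/proc/self/status`) once it has settled.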
## Results
| Component | Speedup | Memory | Best for |
|---|---|---|---|
| Connection Pool | 3.2× faster | Same | HTTP connection management |
| Rate Limiting | 1.6× faster | Same | Throttling, quota management |
| Large Text Tokenization | 1.5–1.7× faster | Same | Long documents |
| High-Cardinality Rate Limits | 1.2× faster | 42× less memory | Many unique API keys/users |
| Concurrent Connection Pool | 1.2× faster | Same | Multi-threaded workloads |
| Small Text Tokenization | 0.5× (Python faster) | Same | Short messages — FFI overhead dominates |
| Routing | 0.4× (Python faster) | Same | Model selection — FFI overhead dominates |
## Use Rust acceleration for
- Connection pooling — 3×+ speedup, the single biggest win.
- Rate limiting — 1.5×+ speedup.
- Large text token counting — 1.5×+ speedup.
- High-cardinality workloads (1000+ unique keys) — 40×+ memory savings.
## Python may be faster for
- Small-text token counting. The cost of crossing the Python ↔ Rust boundary dominates the actual tokenization work for short messages. Fast LiteLLM detects this and prefers Python automatically.
- Routing with complex Python objects. Marshalling rich Python objects across FFI is expensive. Routing acceleration is off by default; enable it only after benchmarking your specific workload.
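The automatic fallback for short inputs can be pictured as simple size-based dispatch. This is a hypothetical sketch, not Fast LiteLLM's actual selection logic; the threshold, function names, and whitespace "tokenizer" stand-ins are all illustrative.

```python
# Assumed threshold: below it, the FFI round-trip costs more than it saves.
SMALL_TEXT_THRESHOLD = 512  # characters; tune per workload via benchmarking

def count_tokens(text: str) -> int:
    if len(text) < SMALL_TEXT_THRESHOLD:
        return _count_tokens_python(text)  # short input: skip FFI overhead
    return _count_tokens_rust(text)        # long input: Rust path wins

def _count_tokens_python(text: str) -> int:
    # Stand-in: whitespace split (real tokenizers are subword-based).
    return len(text.split())

def _count_tokens_rust(text: str) -> int:
    # Placeholder for the Rust-backed call; same stand-in logic here.
    return len(text.split())
```

The same reasoning applies to routing: because the inputs are rich Python objects rather than plain strings, marshalling cost is even harder to amortize, which is why that path stays opt-in.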
## Why we publish the losses
"Rust is always faster" benchmarks are marketing, not engineering. FFI overhead is real. The interesting question for any acceleration layer is which specific workloads benefit and which don't. Showing only the wins would tell you nothing useful for capacity planning. The honest table above is what you'd get from running the benchmarks yourself.
## Reproducing locally
```bash
git clone https://github.com/neul-labs/fast-litellm
cd fast-litellm
uv venv && source .venv/bin/activate
uv add --dev maturin
uv run maturin develop
python scripts/run_benchmarks.py --iterations 200
```