Rust acceleration for LiteLLM.
Zero code changes.
Fast LiteLLM is a drop-in acceleration layer for the LiteLLM Python library —
3.2× faster connection pooling,
1.6× faster rate limiting,
42× less memory on high-cardinality workloads.
Just `import fast_litellm` first.
```shell
# Using uv (recommended)
uv add fast-litellm

# Or using pip
pip install fast-litellm
```

```python
import fast_litellm  # must come first: accelerates LiteLLM
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi"}],
)
```

Honest benchmarks
Production-grade Python (with thread safety) vs Rust acceleration.
| Component | Speedup | Memory | Best for |
|---|---|---|---|
| Connection Pool | 3.2× faster | Same | HTTP connection management |
| Rate Limiting | 1.6× faster | Same | Request throttling, quota management |
| Large Text Tokenization | 1.5–1.7× faster | Same | Long documents |
| High-Cardinality Rate Limits | 1.2× faster | 42× less memory | Many unique API keys/users |
⚠️ Rust is slower than Python on small-text tokenization and routing — FFI overhead dominates. We document this honestly on the benchmarks page.
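The crossover comes from a fixed per-call FFI cost added to Rust's lower per-byte processing cost. A toy linear cost model (the constants below are illustrative assumptions, not measurements from Fast LiteLLM) shows why short inputs lose and long inputs win:

```python
def call_time_us(n_bytes: float, per_call_overhead: float, per_byte_cost: float) -> float:
    """Total time for one tokenization call under a linear cost model."""
    return per_call_overhead + n_bytes * per_byte_cost

# Illustrative constants (assumptions, not benchmark data):
# the Rust path pays a fixed FFI-boundary cost but processes bytes faster.
PY_OVERHEAD, PY_PER_BYTE = 1.0, 0.010   # microseconds
RS_OVERHEAD, RS_PER_BYTE = 5.0, 0.006   # microseconds

def crossover_bytes() -> float:
    """Input size where both paths cost the same:
    RS_OVERHEAD + n * RS_PER_BYTE == PY_OVERHEAD + n * PY_PER_BYTE
    """
    return (RS_OVERHEAD - PY_OVERHEAD) / (PY_PER_BYTE - RS_PER_BYTE)

if __name__ == "__main__":
    n = crossover_bytes()  # ≈ 1000 bytes with these constants
    print(f"crossover at about {n:.0f} bytes")
    # Below the crossover Python wins, above it Rust wins:
    assert call_time_us(100, RS_OVERHEAD, RS_PER_BYTE) > call_time_us(100, PY_OVERHEAD, PY_PER_BYTE)
    assert call_time_us(10_000, RS_OVERHEAD, RS_PER_BYTE) < call_time_us(10_000, PY_OVERHEAD, PY_PER_BYTE)
```

The same shape explains the table: tokenizing long documents amortizes the boundary cost; tokenizing short strings does not.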
Performance you can ship
Drop-in acceleration where it matters: connection pooling, rate limiting, token counting at scale. Production-safe with feature flags and automatic fallback.
Read about the architecture →

Deep LiteLLM expertise
Guides, articles, and technical analyses of long-standing LiteLLM issues — root causes, real fixes, broader lessons. Independent, no vendor agenda.
Browse the knowledge base →

Consulting from Neul Labs
Half-day audits, perf tuning, multi-provider routing design, incident response. For teams running LiteLLM in production.
See services →

From the knowledge base

All knowledge →

LiteLLM issue analyses
- When PyPI maintainer accounts get hijacked: the LiteLLM 1.82.7/1.82.8 supply-chain compromise
#24518 · security
- Why `import litellm` takes a second, and what it would take to fix it
#7605 · performance
- Bisecting the LiteLLM 1.80 → 1.81 performance regression
#19921 · performance
- The aiohttp `Unclosed client session` warnings in LiteLLM, explained
#13251 · concurrency
Guides
- Accelerating the LiteLLM proxy with Fast LiteLLM
A production-ready guide to running the LiteLLM proxy server with Fast LiteLLM under gunicorn, Docker, and systemd — including the import-order trap that catches most teams.
- Installing Fast LiteLLM
How to install Fast LiteLLM, verify the Rust acceleration is active, and what to do when it isn't.
- Rate limiting LiteLLM at high cardinality
Why per-user rate limiting in pure Python eats memory at scale, and how Fast LiteLLM gets to 42× less RSS without changing your config.
Articles
- Why LiteLLM needs Rust (in three specific places, not everywhere)
A measured argument for hybrid Python+Rust in LiteLLM's hot path — and the places where Python is still the right answer.
- A deep dive into the Fast LiteLLM token counting benchmark
Why tokenization with tiktoken-rs is 1.5–1.7× faster on long inputs and 0.5× as fast on short ones — the FFI overhead curve, fully explained.
FAQ
What is fast-litellm?
Fast LiteLLM is a drop-in Rust acceleration layer for the LiteLLM Python library. It replaces hot-path components — connection pooling, rate limiting, token counting — with Rust implementations compiled via PyO3, giving 1.5–3.2× speedups with zero code changes.
How do I install fast-litellm?
Run `uv add fast-litellm` or `pip install fast-litellm`. Prebuilt wheels are available for Linux (x86_64, aarch64), macOS (x86_64, ARM64), and Windows (x86_64). Rust is not required to install.
How do I enable acceleration?
Add `import fast_litellm` before `import litellm`. The package monkey-patches LiteLLM at import time, so no other code changes are required.
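Because the patching happens at import time, the one failure mode is importing `litellm` first. A defensive startup check can catch the wrong order; this is a hypothetical helper sketched here, not part of the fast-litellm API:

```python
import sys
import warnings

def check_import_order(accelerator: str = "fast_litellm", target: str = "litellm") -> bool:
    """Return False (and warn) if `target` was imported before `accelerator`.

    Monkey-patching at import time only affects code loaded afterwards,
    so an accelerator imported second may silently do nothing.
    """
    if target in sys.modules and accelerator not in sys.modules:
        warnings.warn(
            f"{target} was imported before {accelerator}; "
            "Rust acceleration may be inactive"
        )
        return False
    return True
```

Calling this right after your imports turns a silent performance regression into a visible warning at startup.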
Is it safe for production?
Yes. Fast LiteLLM ships with feature flags, automatic fallback to the Python implementation on any error, and a circuit breaker that disables a Rust component after 10 errors. Performance metrics are exposed for monitoring.
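The fallback behaviour described above can be approximated with a small wrapper: try the fast path, fall back to the safe path on any error, and disable the fast path once an error threshold is hit. A minimal sketch of the pattern follows; the 10-error threshold comes from the FAQ, but the internals of Fast LiteLLM's actual circuit breaker are assumptions:

```python
from typing import Callable, TypeVar

T = TypeVar("T")

class FallbackCircuitBreaker:
    """Route calls to a fast implementation, falling back to a safe one.

    After `max_errors` failures the fast path is disabled
    (the circuit "opens") and only the safe path is used.
    """

    def __init__(self, fast: Callable[..., T], safe: Callable[..., T], max_errors: int = 10):
        self.fast = fast
        self.safe = safe
        self.max_errors = max_errors
        self.errors = 0
        self.open = False  # True once the fast path is disabled

    def __call__(self, *args, **kwargs) -> T:
        if not self.open:
            try:
                return self.fast(*args, **kwargs)
            except Exception:
                self.errors += 1
                if self.errors >= self.max_errors:
                    self.open = True  # stop trying the fast path
        return self.safe(*args, **kwargs)

if __name__ == "__main__":
    def rust_count(text: str) -> int:
        raise RuntimeError("simulated native failure")

    def py_count(text: str) -> int:
        return len(text.split())

    count = FallbackCircuitBreaker(rust_count, py_count)
    print(count("hello world"))  # falls back transparently and prints 2
```

Every failed fast-path call still returns a correct result via the safe path, which is what makes this pattern production-safe.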
Which Python and LiteLLM versions are supported?
Python 3.8 through 3.13 on Linux, macOS, and Windows. The latest stable LiteLLM release is supported. CI runs a compatibility matrix on every commit.
Running LiteLLM at scale?
Whether you need a perf audit, a tricky multi-provider routing design, or hands-on incident response — Neul Labs helps teams ship LiteLLM reliably.