Guide

Installing Fast LiteLLM

How to install Fast LiteLLM, verify that the Rust acceleration is active, and fix it when it isn't.

Dipankar Sarkar · April 12, 2026 · getting-started, installation, pyo3

Fast LiteLLM ships as a regular Python package with prebuilt wheels for every major platform. You should not need a Rust toolchain to install it.

Install

The recommended workflow uses uv:

uv add fast-litellm

pip works too:

pip install fast-litellm

Both commands pull a prebuilt wheel for your platform. The package depends on litellm, so you don’t need to install LiteLLM separately. If you already pin LiteLLM, keep in mind that Fast LiteLLM is compatible with the latest stable LiteLLM release.

Use

Add one import line before you import litellm:

import fast_litellm  # must come first
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

That’s the entire integration. The fast_litellm import patches LiteLLM’s hot-path components in place. Anything you would normally do with LiteLLM still works exactly the same.

Verify acceleration is active

Fast LiteLLM exposes a small introspection API:

import fast_litellm
print(fast_litellm.is_active())             # True if Rust is loaded
print(fast_litellm.get_active_components()) # ['connection_pool', 'rate_limiter', ...]

If is_active() returns False, the package fell back to pure-Python mode at import time. The most common reasons:

No prebuilt wheel for your platform. Run pip show -f fast-litellm to list the installed files and confirm that a compiled extension (.so on Linux and macOS, .pyd on Windows) is among them. Linux musl (Alpine) and less common architectures sometimes need a source build.
Python version mismatch. Wheels are built for Python 3.8–3.13. Anything outside that range will install the pure-Python fallback.
A previous import disabled the Rust path. Fast LiteLLM has a circuit breaker that disables a component after 10 errors per process. If something else in your stack hit those errors first, the Rust path is off.
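The introspection calls above can be wrapped in a small startup check so that a silent fallback shows up in your logs. This is a minimal sketch: acceleration_report is a hypothetical helper name, and it assumes get_active_components() returns a list of strings, as the example output suggests. A stand-in object replaces the real module so the snippet runs without the package installed.

```python
from types import SimpleNamespace


def acceleration_report(mod):
    """Summarize the acceleration state of a fast_litellm-like module.

    `mod` only needs the two calls shown in this guide:
    is_active() and get_active_components().
    """
    if not mod.is_active():
        return "fast_litellm: running in pure-Python fallback mode"
    components = ", ".join(sorted(mod.get_active_components()))
    return f"fast_litellm: Rust acceleration active ({components})"


# Stand-in for the real module (an assumption for illustration only).
fake = SimpleNamespace(
    is_active=lambda: True,
    get_active_components=lambda: ["rate_limiter", "connection_pool"],
)
print(acceleration_report(fake))
```

In an application you would pass the imported fast_litellm module instead of the stand-in and send the result to your logger at startup.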

Supported platforms

OS	Architectures
Linux	x86_64, aarch64
macOS	x86_64, arm64 (Apple Silicon)
Windows	x86_64

Wheels cover Python 3.8 through 3.13. CI runs the full matrix on every commit.
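To check a machine against the table before installing, something like the following may help. It is a sketch, not part of Fast LiteLLM: matches_support_matrix is a hypothetical name, the table is copied by hand from this page, and the check does not detect musl-based Linux (Alpine), which may still need a source build even on a listed architecture.

```python
import platform
import sys

# Copied from the table above; keys are platform.system() values.
SUPPORTED = {
    "Linux": {"x86_64", "aarch64"},
    "Darwin": {"x86_64", "arm64"},
    "Windows": {"x86_64"},
}
# Windows reports x86_64 as "AMD64"; normalize to the table's spelling.
ALIASES = {"amd64": "x86_64"}


def matches_support_matrix(system, machine, version):
    """True if OS, architecture, and Python version are all listed above."""
    arch = ALIASES.get(machine.lower(), machine.lower())
    in_range = (3, 8) <= tuple(version[:2]) <= (3, 13)
    return in_range and arch in SUPPORTED.get(system, set())


# Check the interpreter that runs this snippet.
print(matches_support_matrix(platform.system(), platform.machine(), sys.version_info))
```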

Troubleshooting

ImportError: No module named _fast_litellm_native — The compiled extension isn’t bundled with your wheel. Most often this is a wheel-resolution issue with pip on an unusual platform. Force a clean install: pip install --force-reinstall --no-cache-dir fast-litellm.
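If the reinstall does not help, you can confirm whether any compiled extension was installed at all. The sketch below scans a directory for files that carry one of the current interpreter's extension-module suffixes. find_native_extensions is a hypothetical helper, and where Fast LiteLLM places its compiled module (inside the package directory or at the top level of site-packages) is an assumption you should verify for your install.

```python
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path


def find_native_extensions(package_dir):
    """Return compiled extension files found under package_dir, sorted by path."""
    root = Path(package_dir)
    suffixes = tuple(EXTENSION_SUFFIXES)  # e.g. ".so" or ".pyd" variants
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.name.endswith(suffixes)
    )
```

Point it at the Location directory that pip show prints, or at the package directory reported by importlib.util.find_spec. An empty result suggests that the wheel pip resolved does not contain the native module.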

Acceleration silently disabled in production — Look for warnings in your logs at startup:

fast_litellm: Rust component 'rate_limiter' disabled after 10 errors

That means the circuit breaker tripped. The next step is to capture the underlying exception — set FAST_LITELLM_LOG_LEVEL=debug and reproduce in a non-production environment.
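When you have a lot of startup logs to go through, a short script can pull out which components tripped. This sketch assumes the warning has exactly the shape shown above; the regular expression, the tripped_components helper, and the sample log lines (including their timestamps) are illustrative, not taken from Fast LiteLLM.

```python
import re

# Pattern based on the example warning above. Real log lines may carry extra
# prefixes (timestamps, logger names), so search instead of matching from the start.
DISABLED = re.compile(
    r"fast_litellm: Rust component '([^']+)' disabled after (\d+) errors"
)


def tripped_components(log_lines):
    """Return {component: error_count} for every circuit-breaker warning found."""
    tripped = {}
    for line in log_lines:
        match = DISABLED.search(line)
        if match:
            tripped[match.group(1)] = int(match.group(2))
    return tripped


# Made-up sample lines for illustration.
sample = [
    "2026-04-12 10:00:01 INFO app started",
    "2026-04-12 10:00:07 WARNING fast_litellm: Rust component 'rate_limiter' disabled after 10 errors",
]
print(tripped_components(sample))
```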

Conflict with another LiteLLM monkeypatch — If something else in your stack also patches LiteLLM, import order matters. Fast LiteLLM should be imported first, before anything that touches litellm. Pay particular attention to test fixtures and Django/FastAPI startup hooks.
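One way to catch an import-order problem early is to check, at the top of your entry point, whether litellm has already been loaded. The helper below is a hypothetical sketch that relies only on the standard sys.modules mapping. It must run before import fast_litellm, since that package depends on litellm and may load it itself.

```python
import sys


def litellm_already_imported(modules=None):
    """True if litellm or any of its submodules is already loaded."""
    modules = sys.modules if modules is None else modules
    return any(
        name == "litellm" or name.startswith("litellm.") for name in modules
    )
```

If it returns True at that point, some earlier import (a test fixture or a framework startup hook, for example) touched litellm first, and that import is the one to move.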

Building from source (rarely needed)

If you really need to build from source — say, for a target with no prebuilt wheel — you’ll need the Rust toolchain and maturin:

git clone https://github.com/neul-labs/fast-litellm
cd fast-litellm
uv venv && source .venv/bin/activate
uv add --dev maturin
uv run maturin develop --release

For everyday use, prefer the prebuilt wheels.

What’s next

Accelerating the LiteLLM proxy — gunicorn, Docker, systemd.
Rate limiting at scale — the high-cardinality memory story.
Benchmarks — what you should actually expect to see.