Installing Fast LiteLLM
How to install Fast LiteLLM, verify that Rust acceleration is active, and troubleshoot when it isn’t.
Fast LiteLLM ships as a regular Python package with prebuilt wheels for every major platform. You should not need a Rust toolchain to install it.
Install
The recommended workflow uses uv:
uv add fast-litellm
pip works too:
pip install fast-litellm
Both pull a prebuilt wheel for your platform. The package depends on litellm, so you don’t need to install LiteLLM separately. If you already pin a LiteLLM version, note that Fast LiteLLM is compatible with the latest stable LiteLLM release.
Use
Add one import line before you import litellm:
import fast_litellm # must come first
import litellm
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
That’s the entire integration. The fast_litellm import patches LiteLLM’s hot-path components in place. Anything you would normally do with LiteLLM still works exactly the same.
Verify acceleration is active
Fast LiteLLM exposes a small introspection API:
import fast_litellm
print(fast_litellm.is_active()) # True if Rust is loaded
print(fast_litellm.get_active_components()) # ['connection_pool', 'rate_limiter', ...]
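The two calls above can be wrapped in a small startup check. The sketch below is a hypothetical helper, not part of Fast LiteLLM. It assumes only that the module exposes is_active() and get_active_components() as shown, and it uses a stand-in object so the example runs without the package installed.

```python
from types import SimpleNamespace


def acceleration_report(module):
    """Summarize acceleration state via the two introspection calls.

    `module` must expose is_active() and get_active_components(),
    as fast_litellm does. This helper itself is hypothetical.
    """
    active = bool(module.is_active())
    # Only ask for components when acceleration is on; what the call
    # returns in fallback mode is not documented here.
    components = list(module.get_active_components()) if active else []
    return {"active": active, "components": components}


# Stand-in so the sketch runs anywhere. In an application, pass the
# real module instead: acceleration_report(fast_litellm)
_stub = SimpleNamespace(
    is_active=lambda: True,
    get_active_components=lambda: ["connection_pool", "rate_limiter"],
)
report = acceleration_report(_stub)
print(report)
```

Logging this report once at startup makes a silent fallback visible without changing application behavior.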
If is_active() returns False, the package fell back to pure-Python mode at import time. The most common reasons:
- No prebuilt wheel for your platform. Check pip show fast_litellm and confirm a .so/.pyd extension is present in the install path. Linux musl (Alpine) and obscure architectures sometimes need a source build.
- Python version mismatch. Wheels are built for Python 3.8–3.13. Anything outside that range will install the pure-Python fallback.
- A previous import disabled the Rust path. Fast LiteLLM has a circuit breaker that disables a component after 10 errors per process. If something else in your stack hit those errors first, the Rust path is off.
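To make the circuit-breaker behavior concrete, here is a generic sketch of the pattern: count errors from an accelerated path and switch permanently to a fallback once a threshold is reached. This is an illustration only, not Fast LiteLLM’s implementation. The class name, method names, and the two sample functions are hypothetical, and only the threshold of 10 comes from the text above.

```python
class ComponentBreaker:
    """Disable a component after `threshold` errors (illustrative only)."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.errors = 0
        self.enabled = True

    def call(self, fast_path, fallback, *args, **kwargs):
        # Once tripped, use the fallback for the rest of the process.
        if not self.enabled:
            return fallback(*args, **kwargs)
        try:
            return fast_path(*args, **kwargs)
        except Exception:
            self.errors += 1
            if self.errors >= self.threshold:
                self.enabled = False
            # The caller still gets an answer from the fallback.
            return fallback(*args, **kwargs)


def flaky(x):
    raise RuntimeError("simulated failure in the accelerated path")


def pure_python(x):
    return x * 2


breaker = ComponentBreaker(threshold=10)
results = [breaker.call(flaky, pure_python, i) for i in range(12)]
print(breaker.enabled, breaker.errors)
```

Because the counter in this sketch lives for the whole process, errors triggered early (for example by another library at import time) count toward the same threshold as errors from your own code.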
Supported platforms
| OS | Architectures |
|---|---|
| Linux | x86_64, aarch64 |
| macOS | x86_64, arm64 (Apple Silicon) |
| Windows | x86_64 |
Supported Python versions are 3.8 through 3.13. CI runs the full matrix on every commit.
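A quick way to compare the current interpreter against this table is sketched below using only the standard library. The function name is hypothetical, and the alias map is an assumption about how platform.machine() spells architectures (Windows typically reports AMD64 for x86_64). The check does not distinguish glibc from musl, so an Alpine system can match the table and still lack a wheel.

```python
import platform
import sys

# OS names as returned by platform.system(); architectures from the table.
SUPPORTED = {
    "Linux": {"x86_64", "aarch64"},
    "Darwin": {"x86_64", "arm64"},
    "Windows": {"x86_64"},
}
# Assumed spellings that mean x86_64 on some platforms.
ALIASES = {"amd64": "x86_64", "x64": "x86_64"}


def matches_support_table(system, machine, version):
    """Return True if OS, architecture, and Python version are all listed."""
    arch = ALIASES.get(machine.lower(), machine.lower())
    in_range = (3, 8) <= tuple(version[:2]) <= (3, 13)
    return in_range and arch in SUPPORTED.get(system, set())


print(matches_support_table(platform.system(), platform.machine(), sys.version_info))
```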
Troubleshooting
ImportError: No module named _fast_litellm_native — The compiled extension isn’t bundled with your wheel. Most often this is a wheel-resolution issue with pip on an unusual platform. Force a clean install: pip install --force-reinstall --no-cache-dir fast-litellm.
Acceleration silently disabled in production — Look for warnings in your logs at startup:
fast_litellm: Rust component 'rate_limiter' disabled after 10 errors
That means the circuit breaker tripped. The next step is to capture the underlying exception — set FAST_LITELLM_LOG_LEVEL=debug and reproduce in a non-production environment.
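If your logs are large, a short script can pull out which components were disabled. The pattern below is modeled on the warning line shown above. The exact format in a given release is an assumption, and the function name is hypothetical.

```python
import re

# Modeled on the sample warning; adjust if your log format differs.
TRIPPED = re.compile(
    r"fast_litellm: Rust component '(?P<name>[^']+)' "
    r"disabled after (?P<count>\d+) errors"
)


def tripped_components(lines):
    """Map each disabled component name to the error count in its warning."""
    found = {}
    for line in lines:
        match = TRIPPED.search(line)
        if match:
            found[match.group("name")] = int(match.group("count"))
    return found


sample = [
    "INFO app started",
    "WARNING fast_litellm: Rust component 'rate_limiter' disabled after 10 errors",
]
print(tripped_components(sample))
```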
Conflict with another LiteLLM monkeypatch — If something else in your stack also patches LiteLLM, import order matters. Fast LiteLLM should be imported first, before anything that touches litellm. Pay particular attention to test fixtures and Django/FastAPI startup hooks.
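One way to catch an import-order problem early is to inspect sys.modules before importing fast_litellm. The sketch below is a hypothetical guard, not part of Fast LiteLLM. It only detects that litellm was loaded first, not whether another library has patched it.

```python
import sys
import warnings


def litellm_imported_early(modules=None):
    """Return True if litellm is already loaded but fast_litellm is not."""
    modules = sys.modules if modules is None else modules
    return "litellm" in modules and "fast_litellm" not in modules


def warn_on_bad_import_order(modules=None):
    """Emit a RuntimeWarning when litellm was imported first."""
    if litellm_imported_early(modules):
        warnings.warn(
            "litellm was imported before fast_litellm; patches may not apply",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False


# Call this at the top of your entry point, before `import fast_litellm`.
print(warn_on_bad_import_order())
```

Running the guard in a test fixture or application startup hook points to the module that pulled in litellm too early.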
Building from source (rarely needed)
If you really need to build from source — say, for a target with no prebuilt wheel — you’ll need the Rust toolchain and maturin:
git clone https://github.com/neul-labs/fast-litellm
cd fast-litellm
uv venv && source .venv/bin/activate
uv add --dev maturin
uv run maturin develop --release
For everyday use, prefer the prebuilt wheels.
What’s next
- Accelerating the LiteLLM proxy — gunicorn, Docker, systemd.
- Rate limiting at scale — the high-cardinality memory story.
- Benchmarks — what you should actually expect to see.