# Fast LiteLLM

> Fast LiteLLM is a drop-in Rust acceleration layer for LiteLLM — 3.2× faster connection pooling, 1.6× faster rate limiting, and 42× memory savings on high-cardinality workloads. Plus deep technical analysis of LiteLLM internals and consulting from Neul Labs.

## Project pages

- [Home](https://fast-litellm.neullabs.com/): Overview, benchmarks, install snippet.
- [Product](https://fast-litellm.neullabs.com/product): How Fast LiteLLM works — architecture and components.
- [Benchmarks](https://fast-litellm.neullabs.com/benchmarks): Honest performance numbers, including the cases where Python wins.
- [Why Fast LiteLLM](https://fast-litellm.neullabs.com/why-fast-litellm): The bottleneck profile and why Rust, specifically.
- [Knowledge](https://fast-litellm.neullabs.com/knowledge): Guides, articles, and LiteLLM issue analyses.
- [Consulting](https://fast-litellm.neullabs.com/consulting): LiteLLM consulting services.
- [About](https://fast-litellm.neullabs.com/about): About Fast LiteLLM and Neul Labs.

## Guides

- [Accelerating the LiteLLM proxy with Fast LiteLLM](https://fast-litellm.neullabs.com/knowledge/accelerating-litellm-proxy): A production-ready guide to running the LiteLLM proxy server with Fast LiteLLM under gunicorn, Docker, and systemd — including the import-order trap that catches most teams.
- [Installing Fast LiteLLM](https://fast-litellm.neullabs.com/knowledge/installing-fast-litellm): How to install Fast LiteLLM, verify the Rust acceleration is active, and what to do when it isn't.
- [Rate limiting LiteLLM at high cardinality](https://fast-litellm.neullabs.com/knowledge/rate-limiting-at-scale): Why per-user rate limiting in pure Python eats memory at scale, and how Fast LiteLLM gets to 42× less RSS without changing your config.
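The rate-limiting guide turns on a simple observation: a pure-Python per-user limiter keeps one live bucket object per distinct key, so resident memory grows linearly with user cardinality. A minimal stdlib sketch of that pattern (names invented for illustration; this is not Fast LiteLLM's implementation):

```python
import time

class TokenBucket:
    """Naive per-key token bucket: one Python object per distinct user key."""
    __slots__ = ("capacity", "tokens", "refill_rate", "last_refill")

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens replenished per second
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per user key: memory is O(number of distinct keys ever seen).
buckets: dict[str, TokenBucket] = {}

def check(user: str) -> bool:
    bucket = buckets.setdefault(user, TokenBucket(capacity=10, refill_rate=5))
    return bucket.allow()

for i in range(100_000):
    check(f"user-{i}")

print(len(buckets))  # -> 100000
```

At 100 k distinct keys this is already 100 k heap-resident Python objects for the limiter alone — the kind of growth the guide's 42× RSS figure is measured against.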
## Articles

- [Why LiteLLM needs Rust (in three specific places, not everywhere)](https://fast-litellm.neullabs.com/knowledge/why-litellm-needs-rust): A measured argument for hybrid Python+Rust in LiteLLM's hot path — and the places where Python is still the right answer.
- [A deep dive into the Fast LiteLLM token counting benchmark](https://fast-litellm.neullabs.com/knowledge/token-counting-benchmark-deep-dive): Why tokenization with tiktoken-rs is 1.5–1.7× faster on long inputs and 0.5× as fast on short ones — the FFI overhead curve, fully explained.

## LiteLLM Issue Analyses

- [When PyPI maintainer accounts get hijacked: the LiteLLM 1.82.7/1.82.8 supply-chain compromise](https://fast-litellm.neullabs.com/knowledge/litellm-pypi-supply-chain-compromise): A timeline and technical analysis of the March 2026 LiteLLM PyPI compromise, what the malicious payload did, and the defenses every Python team should adopt today. (Upstream issue #24518, security.)
- [Why `import litellm` takes a second, and what it would take to fix it](https://fast-litellm.neullabs.com/knowledge/litellm-import-speed): A breakdown of LiteLLM's slow import path, the eager-registration anti-pattern that causes it, and the lazy-import refactor that would actually solve it. (Upstream issue #7605, performance.)
- [Bisecting the LiteLLM 1.80 → 1.81 performance regression](https://fast-litellm.neullabs.com/knowledge/litellm-1-81-perf-regression): A walkthrough of how to bisect a performance regression in a release like LiteLLM 1.81.x, the likely culprits in this specific case, and how to verify a fix. (Upstream issue #19921, performance.)
- [The aiohttp `Unclosed client session` warnings in LiteLLM, explained](https://fast-litellm.neullabs.com/knowledge/litellm-aiohttp-unclosed-sessions): Why LiteLLM's concurrent acompletion calls leak aiohttp sessions, what the warning actually means, why some warnings are false alarms, and how to fix the real ones. (Upstream issue #13251, concurrency.)
- [LiteLLM proxy on Python 3.14: an uvloop ABI break post-mortem](https://fast-litellm.neullabs.com/knowledge/litellm-python-314-uvloop): Why `litellm[proxy]` crashes on import with Python 3.14, the asyncio API removal that caused it, and what dependency-pinning patterns would have prevented it. (Upstream issue #20933, dependencies.)
- [The LiteLLM Router silently drops async callbacks — here's where](https://fast-litellm.neullabs.com/knowledge/litellm-router-async-callbacks): A trace through Router.acompletion explaining why CustomLogger async success/failure hooks aren't called, what the right fix looks like, and what to use until then. (Upstream issue #8842, correctness.)
- [Why LiteLLM burns extra GitHub Copilot premium requests on agent flows](https://fast-litellm.neullabs.com/knowledge/litellm-copilot-premium-requests): A deep dive into the X-Initiator header semantics, how Copilot's premium request accounting works, and why LiteLLM's transformation layer over-bills compared to Copilot CLI and OpenCode. (Upstream issue #18155, provider.)
- [Why your LiteLLM Prometheus metrics flicker under multiple workers](https://fast-litellm.neullabs.com/knowledge/litellm-prometheus-multiproc): The Prometheus multiprocess problem in a nutshell — what `prometheus_client` actually does across forked workers, why LiteLLM's metrics are unusable with --num_workers, and how to fix it cleanly. (Upstream issue #10595, ops.)

## External

- [GitHub repository](https://github.com/neul-labs/fast-litellm)
- [PyPI package](https://pypi.org/project/fast-litellm/)
- [Upstream LiteLLM (BerriAI)](https://github.com/BerriAI/litellm)
- [Neul Labs](https://www.neul.uk)
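The import-speed analysis above argues for replacing eager registration with lazy imports. The mechanism it points at can be sketched with PEP 562's module-level `__getattr__`, where a submodule is imported on first attribute access rather than when the package itself is imported. A generic, self-contained illustration (module and attribute names are invented; this is not LiteLLM's code):

```python
import importlib
import sys
import types

# A stand-in package module whose heavy dependencies load on first
# attribute access (the PEP 562 module-__getattr__ pattern), instead of
# eagerly when the package is imported.
lazy = types.ModuleType("lazy_pkg")
_LAZY_ATTRS = {"json_tools": "json", "regex_tools": "re"}  # attr -> real module

def _module_getattr(name: str):
    target = _LAZY_ATTRS.get(name)
    if target is None:
        raise AttributeError(f"module 'lazy_pkg' has no attribute {name!r}")
    module = importlib.import_module(target)
    setattr(lazy, name, module)  # cache: __getattr__ fires only once per name
    return module

lazy.__getattr__ = _module_getattr
sys.modules["lazy_pkg"] = lazy

# The first attribute access triggers (and then caches) the real import.
print(lazy.json_tools.dumps({"ok": True}))  # -> {"ok": true}
```

Applied across a large registration surface, this is the general shape of the refactor the article describes: import cost moves from every `import litellm` to the first use of each lazily bound name.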
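The Prometheus analysis above describes a failure mode that reproduces without LiteLLM at all: after fork, each worker process owns a private copy of any in-memory counter, so a scrape that lands on one worker reports only that worker's share. A stdlib sketch of the divergence (plain dicts stand in for `prometheus_client` metrics; the `fork` start method is assumed, so this is POSIX-only):

```python
import multiprocessing as mp

def worker(n_requests: int, out) -> None:
    # After fork, this counter is private to the child process — just like a
    # default in-process metric registry would be.
    counter = {"requests_total": 0}
    for _ in range(n_requests):
        counter["requests_total"] += 1
    out.put(counter["requests_total"])

ctx = mp.get_context("fork")  # assumption: POSIX host
out = ctx.Queue()
workers = [ctx.Process(target=worker, args=(100, out)) for _ in range(4)]
for p in workers:
    p.start()
per_worker = [out.get() for _ in range(4)]  # drain results before joining
for p in workers:
    p.join()

# Any single worker "scraped" alone reports 100; the true total is 400.
print(per_worker, sum(per_worker))
```

`prometheus_client`'s documented answer is its multiprocess mode — a shared `PROMETHEUS_MULTIPROC_DIR` of per-process metric files aggregated by a collector at scrape time — which is presumably the direction the article's clean fix takes.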