Knowledge
Long-form technical writing on LiteLLM, Rust, Python performance, and production LLM operations. Three flavors: guides (hands-on walkthroughs), articles (opinions and deep dives), and issue analyses (independent technical writeups of long-standing open issues in BerriAI/LiteLLM).
Issue analyses are commentary on a project we don't maintain. Neul Labs is not affiliated with BerriAI.
Guides
Hands-on, opinionated walkthroughs you can act on.
- Guide tutorial · Apr 12, 2026
Accelerating the LiteLLM proxy with Fast LiteLLM
A production-ready guide to running the LiteLLM proxy server with Fast LiteLLM under gunicorn, Docker, and systemd — including the import-order trap that catches most teams.
- Guide getting-started · Apr 12, 2026
Installing Fast LiteLLM
How to install Fast LiteLLM, verify the Rust acceleration is active, and what to do when it isn't.
- Guide performance · Apr 12, 2026
Rate limiting LiteLLM at high cardinality
Why per-user rate limiting in pure Python eats memory at scale, and how Fast LiteLLM cuts RSS by 42× without changing your config.
Articles
Long-form thinking and arguments.
- Article opinion · Apr 10, 2026
Why LiteLLM needs Rust (in three specific places, not everywhere)
A measured argument for hybrid Python+Rust in LiteLLM's hot path — and the places where Python is still the right answer.
- Article benchmark · Apr 9, 2026
A deep dive into the Fast LiteLLM token counting benchmark
Why tokenization with tiktoken-rs is 1.5–1.7× faster on long inputs but only half as fast on short ones — the FFI overhead curve, fully explained.
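The shape of that overhead curve can be sketched with a toy cost model: a fixed per-call FFI fee plus a cheaper per-token rate for Rust, versus a pricier per-token rate for pure Python. Every constant below is invented for illustration — none are measured from Fast LiteLLM or tiktoken-rs.

```python
def python_time(n_tokens, per_token=100e-9):
    """Pure-Python tokenization cost: linear in input length."""
    return n_tokens * per_token

def rust_time(n_tokens, ffi_cost=5e-6, per_token=60e-9):
    """Rust tokenization cost: fixed FFI boundary fee plus a cheaper rate."""
    return ffi_cost + n_tokens * per_token

# Short inputs lose (the FFI fee dominates); long inputs approach the
# per-token ratio, here 100/60 ≈ 1.67x.
for n in (32, 1000, 8000):
    speedup = python_time(n) / rust_time(n)
    print(f"{n:>5} tokens: {speedup:.2f}x")
```

With these made-up constants the model reproduces the article's qualitative claim: below ~1× for tiny inputs, climbing toward a flat asymptote as input length amortizes the boundary cost.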
LiteLLM Issue Analyses
Deep technical analysis of long-standing upstream LiteLLM issues. Sorted by upstream reactions.
- LiteLLM Issue security · #24518 · 164 👍 · 116 comments
When PyPI maintainer accounts get hijacked: the LiteLLM 1.82.7/1.82.8 supply-chain compromise
A timeline and technical analysis of the March 2026 LiteLLM PyPI compromise, what the malicious payload did, and the defenses every Python team should adopt today.
- LiteLLM Issue performance · #7605 · 41 👍 · 30 comments
Why `import litellm` takes a second, and what it would take to fix it
A breakdown of LiteLLM's slow import path, the eager-registration anti-pattern that causes it, and the lazy-import refactor that would actually solve it.
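The lazy-import refactor described in that analysis is essentially PEP 562's module-level `__getattr__`: submodules resolve on first attribute access instead of eagerly at import time. A self-contained sketch using stand-in modules (the `lazypkg`/`heavy` names are invented, not LiteLLM's actual layout):

```python
import importlib
import sys
import types

# Build a stand-in package whose submodule loads only on demand.
pkg = types.ModuleType("lazypkg")
_SUBMODULES = {"heavy"}

def _pkg_getattr(name):
    """PEP 562 hook: called when normal module attribute lookup fails."""
    if name in _SUBMODULES:
        module = importlib.import_module(f"lazypkg.{name}")
        setattr(pkg, name, module)  # cache so the hook runs only once
        return module
    raise AttributeError(name)

pkg.__getattr__ = _pkg_getattr
sys.modules["lazypkg"] = pkg

# Register the heavy submodule (here just a stub standing in for an
# expensive provider module).
heavy = types.ModuleType("lazypkg.heavy")
heavy.TOKEN = "loaded"
sys.modules["lazypkg.heavy"] = heavy

# First access triggers the import; `import lazypkg` alone pays nothing.
print(pkg.heavy.TOKEN)
```

In a real package the hook lives in `__init__.py` as a top-level `__getattr__` function, and the heavy submodules stay out of `__init__`'s eager imports entirely.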
- LiteLLM Issue performance · #19921 · 15 👍 · 41 comments
Bisecting the LiteLLM 1.80 → 1.81 performance regression
A walkthrough of how to bisect a performance regression in a release like LiteLLM 1.81.x, the likely culprits in this specific case, and how to verify a fix.
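The workflow in that walkthrough rests on standard `git bisect run` mechanics. A command sketch — the tag names match the releases discussed, but `bench.sh` is a hypothetical benchmark script you would write yourself (exit 0 if performance is acceptable, non-zero if regressed):

```shell
git bisect start
git bisect bad v1.81.0       # first release where the regression appears
git bisect good v1.80.0      # last release with acceptable latency
# git bisect run drives the binary search automatically, checking out each
# candidate commit and classifying it by bench.sh's exit code.
git bisect run ./bench.sh
git bisect reset             # return to the original checkout when done
```

The benchmark script must be deterministic enough that a good commit never exits non-zero, or the bisection converges on the wrong commit.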
- LiteLLM Issue concurrency · #13251 · 14 👍 · 12 comments
The aiohttp `Unclosed client session` warnings in LiteLLM, explained
Why LiteLLM's concurrent `acompletion` calls leak aiohttp sessions, what the warning actually means, why some warnings are false alarms, and how to fix the real ones.
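The leak-and-fix pattern at the heart of that analysis can be shown without aiohttp at all. In this sketch the `Session` class is a stand-in for `aiohttp.ClientSession` (the URL is also made up): creating one per call and never awaiting `.close()` is the leak pattern behind the warning; sharing one session under `async with` is the usual fix.

```python
import asyncio

class Session:
    """Stand-in for aiohttp.ClientSession: must be closed exactly once."""
    def __init__(self):
        self.closed = False

    async def get(self, url):
        return f"response from {url}"

    async def close(self):
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        await self.close()

async def main():
    # Leak pattern (don't do this): Session() inside each call, with
    # .close() never awaited -> "Unclosed client session" in real aiohttp.
    # Fix: one shared session for all concurrent calls, closed once on exit.
    async with Session() as session:
        return await asyncio.gather(
            *(session.get(f"https://api.example/{i}") for i in range(3))
        )

results = asyncio.run(main())
print(len(results))
```

With real aiohttp the shape is identical: hold one `ClientSession` per event loop, pass it to every request, and let the `async with` block guarantee the close.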
- LiteLLM Issue dependencies · #20933 · 13 👍 · 3 comments
LiteLLM proxy on Python 3.14: an uvloop ABI break post-mortem
Why `litellm[proxy]` crashes on import with Python 3.14, the asyncio API removal that caused it, and what dependency-pinning patterns would have prevented it.
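The dependency-pinning pattern that post-mortem argues for can be sketched in `pyproject.toml` terms. The version numbers below are illustrative only, not a recommendation for any specific uvloop release:

```toml
[project]
dependencies = [
    # Cap compiled extensions to interpreter versions they have wheels and
    # test coverage for; on newer interpreters the marker simply excludes
    # the dependency and the app falls back to plain asyncio.
    'uvloop>=0.21,<0.22; python_version < "3.14"',
]
```

An environment marker like this fails soft: instead of crashing at import on an untested interpreter, the package is never installed there in the first place.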
- LiteLLM Issue correctness · #8842 · 13 👍 · 20 comments
The LiteLLM Router silently drops async callbacks — here's where
A trace through `Router.acompletion` explaining why `CustomLogger` async success/failure hooks aren't called, what the right fix looks like, and what to use until then.
- LiteLLM Issue provider · #18155 · 9 👍 · 4 comments
Why LiteLLM burns extra GitHub Copilot premium requests on agent flows
A deep dive into the X-Initiator header semantics, how Copilot's premium request accounting works, and why LiteLLM's transformation layer over-bills compared to Copilot CLI and OpenCode.
- LiteLLM Issue ops · #10595 · 0 👍 · 4 comments
Why your LiteLLM Prometheus metrics flicker under multiple workers
The Prometheus multiprocess problem in a nutshell — what `prometheus_client` actually does across forked workers, why LiteLLM's metrics are unusable with `--num_workers`, and how to fix it cleanly.
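The standard `prometheus_client` multiprocess setup, sketched for a multi-worker deployment — the directory path and worker count are illustrative. Each worker writes its samples to mmap-backed files in a shared directory, and a multiprocess collector aggregates them at scrape time instead of each worker answering with only its own counters:

```shell
# prometheus_client switches into multiprocess mode when this env var is
# set before the first metrics import in any worker.
export PROMETHEUS_MULTIPROC_DIR=/var/run/litellm-metrics
mkdir -p "$PROMETHEUS_MULTIPROC_DIR"
# Clear stale sample files left over from previous runs, or dead workers'
# counters keep contributing to every scrape.
rm -f "$PROMETHEUS_MULTIPROC_DIR"/*.db
litellm --num_workers 4
```

On the scrape side, the registry must use `prometheus_client.multiprocess.MultiProcessCollector` rather than the default per-process registry; whether LiteLLM wires that up for you is exactly what the analysis examines.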