<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Fast LiteLLM — Knowledge</title><description>Fast LiteLLM is a drop-in Rust acceleration layer for LiteLLM — 3.2× faster connection pooling, 1.6× rate limiting, 42× memory savings on high-cardinality workloads. Plus deep technical analysis of LiteLLM internals and consulting from Neul Labs.</description><link>https://fast-litellm.neullabs.com/</link><language>en-us</language>
<item><title>[Guide] Accelerating the LiteLLM proxy with Fast LiteLLM</title><link>https://fast-litellm.neullabs.com/knowledge/accelerating-litellm-proxy/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/accelerating-litellm-proxy/</guid><description>A production-ready guide to running the LiteLLM proxy server with Fast LiteLLM under gunicorn, Docker, and systemd — including the import-order trap that catches most teams.</description><pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate><category>guide</category><category>tutorial</category><category>proxy</category><category>gunicorn</category><category>uvicorn</category><category>deployment</category></item>
<item><title>[Guide] Installing Fast LiteLLM</title><link>https://fast-litellm.neullabs.com/knowledge/installing-fast-litellm/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/installing-fast-litellm/</guid><description>How to install Fast LiteLLM, verify the Rust acceleration is active, and what to do when it isn&apos;t.</description><pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate><category>guide</category><category>getting-started</category><category>installation</category><category>pyo3</category></item>
<item><title>[LiteLLM Issue] Bisecting the LiteLLM 1.80 → 1.81 performance regression</title><link>https://fast-litellm.neullabs.com/knowledge/litellm-1-81-perf-regression/</link><guid 
isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/litellm-1-81-perf-regression/</guid><description>A walkthrough of how to bisect a performance regression in a release like LiteLLM 1.81.x, the likely culprits in this specific case, and how to verify a fix.</description><pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate><category>issue</category><category>performance</category><category>regression</category><category>bisect</category><category>postgres</category></item>
<item><title>[LiteLLM Issue] Why `import litellm` takes a second, and what it would take to fix it</title><link>https://fast-litellm.neullabs.com/knowledge/litellm-import-speed/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/litellm-import-speed/</guid><description>A breakdown of LiteLLM&apos;s slow import path, the eager-registration anti-pattern that causes it, and the lazy-import refactor that would actually solve it.</description><pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate><category>issue</category><category>performance</category><category>import-speed</category><category>lazy-loading</category></item>
<item><title>[LiteLLM Issue] When PyPI maintainer accounts get hijacked: the LiteLLM 1.82.7/1.82.8 supply-chain compromise</title><link>https://fast-litellm.neullabs.com/knowledge/litellm-pypi-supply-chain-compromise/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/litellm-pypi-supply-chain-compromise/</guid><description>A timeline and technical analysis of the March 2026 LiteLLM PyPI compromise, what the malicious payload did, and the defenses every Python team should adopt today.</description><pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate><category>issue</category><category>security</category><category>supply-chain</category><category>pypi</category><category>sigstore</category></item>
<item><title>[Guide] Rate limiting LiteLLM at high 
cardinality</title><link>https://fast-litellm.neullabs.com/knowledge/rate-limiting-at-scale/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/rate-limiting-at-scale/</guid><description>Why per-user rate limiting in pure Python eats memory at scale, and how Fast LiteLLM gets to 42× less RSS without changing your config.</description><pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate><category>guide</category><category>performance</category><category>rate-limiting</category><category>memory</category><category>dashmap</category><category>multi-tenant</category></item>
<item><title>[LiteLLM Issue] The aiohttp `Unclosed client session` warnings in LiteLLM, explained</title><link>https://fast-litellm.neullabs.com/knowledge/litellm-aiohttp-unclosed-sessions/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/litellm-aiohttp-unclosed-sessions/</guid><description>Why LiteLLM&apos;s concurrent acompletion calls leak aiohttp sessions, what the warning actually means, why some warnings are false alarms, and how to fix the real ones.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><category>issue</category><category>concurrency</category><category>aiohttp</category><category>asyncio</category><category>lifecycle</category></item>
<item><title>[LiteLLM Issue] Why LiteLLM burns extra GitHub Copilot premium requests on agent flows</title><link>https://fast-litellm.neullabs.com/knowledge/litellm-copilot-premium-requests/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/litellm-copilot-premium-requests/</guid><description>A deep dive into the X-Initiator header semantics, how Copilot&apos;s premium request accounting works, and why LiteLLM&apos;s transformation layer over-bills compared to Copilot CLI and OpenCode.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><category>issue</category><category>provider</category><category>github-copilot</category><category>billing</category><category>agents</category></item>
<item><title>[LiteLLM Issue] Why your LiteLLM Prometheus metrics flicker under multiple workers</title><link>https://fast-litellm.neullabs.com/knowledge/litellm-prometheus-multiproc/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/litellm-prometheus-multiproc/</guid><description>The Prometheus multiprocess problem in a nutshell — what `prometheus_client` actually does across forked workers, why LiteLLM&apos;s metrics are unusable with --num_workers, and how to fix it cleanly.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><category>issue</category><category>ops</category><category>prometheus</category><category>observability</category><category>gunicorn</category><category>uvicorn</category></item>
<item><title>[LiteLLM Issue] LiteLLM proxy on Python 3.14: an uvloop ABI break post-mortem</title><link>https://fast-litellm.neullabs.com/knowledge/litellm-python-314-uvloop/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/litellm-python-314-uvloop/</guid><description>Why `litellm[proxy]` crashes on import with Python 3.14, the asyncio API removal that caused it, and what dependency-pinning patterns would have prevented it.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><category>issue</category><category>dependencies</category><category>python-3-14</category><category>uvloop</category><category>asyncio</category></item>
<item><title>[LiteLLM Issue] The LiteLLM Router silently drops async callbacks — here&apos;s where</title><link>https://fast-litellm.neullabs.com/knowledge/litellm-router-async-callbacks/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/litellm-router-async-callbacks/</guid><description>A trace through Router.acompletion explaining why CustomLogger async success/failure hooks aren&apos;t called, what the right fix looks like, and what to use until then.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><category>issue</category><category>correctness</category><category>router</category><category>callbacks</category><category>async</category><category>observability</category></item>
<item><title>[Article] Why LiteLLM needs Rust (in three specific places, not everywhere)</title><link>https://fast-litellm.neullabs.com/knowledge/why-litellm-needs-rust/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/why-litellm-needs-rust/</guid><description>A measured argument for hybrid Python+Rust in LiteLLM&apos;s hot path — and the places where Python is still the right answer.</description><pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate><category>article</category><category>opinion</category><category>rust</category><category>python</category><category>performance</category><category>ffi</category><category>gil</category></item>
<item><title>[Article] A deep dive into the Fast LiteLLM token counting benchmark</title><link>https://fast-litellm.neullabs.com/knowledge/token-counting-benchmark-deep-dive/</link><guid isPermaLink="true">https://fast-litellm.neullabs.com/knowledge/token-counting-benchmark-deep-dive/</guid><description>Why tokenization with tiktoken-rs is 1.5–1.7× faster on long inputs and 0.5× as fast on short ones — the FFI overhead curve, fully explained.</description><pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate><category>article</category><category>benchmark</category><category>tiktoken</category><category>ffi</category><category>tokenization</category></item></channel></rss>