LiteLLM Issue

Why `import litellm` takes a second, and what it would take to fix it

A breakdown of LiteLLM's slow import path, the eager-registration anti-pattern that causes it, and the lazy-import refactor that would actually solve it.

Dipankar Sarkar · performance · import-speed · lazy-loading
Upstream issue
#7605 — [Feature]: Improve import speed
Opened January 7, 2025 · status: open · 41 👍 · 30 comments
$ python -X importtime -c "import litellm" 2>&1 | tail -1
import time:    353694 |    1098960 | litellm

That’s roughly one second to import a Python library that hasn’t done any actual work yet. The microbenchmark is from issue #7605, which is over a year old, has 41 reactions, and is labelled performance. It hasn’t moved.

A second sounds trivial. It is not, for three classes of users:

  • Library authors whose package depends on LiteLLM. Their users now pay the LiteLLM tax even if they never call a model.
  • Serverless functions where every cold start pays the import cost on the request path. At a second per cold start, you’ve burned a noticeable fraction of your invocation budget before any work happens.
  • Test suites that re-import LiteLLM in fresh processes. Tens of tests times one second each adds up to a measurable slowdown in CI.

This post is about why the import is slow, what the structural cause is, and what a real fix would look like.

What’s actually happening in that second

The first step in any import-speed investigation is python -X importtime. The output of running it on import litellm shows hundreds of submodules being eagerly loaded. The biggest contributors:

Module                                      Approximate cumulative time
litellm.utils                               ~250 ms
litellm.llms (provider modules)             ~200 ms
litellm.integrations (callbacks)            ~150 ms
tokenizers and tiktoken loading             ~100 ms
Pydantic model definitions                  ~80 ms
Various dataclasses, enums, and type stubs  the rest
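
A per-package breakdown like the table above can be produced by aggregating the importtime output yourself. A minimal sketch, relying on CPython's importtime line format ("import time: self_us | cumulative_us | name", written to stderr, with a non-numeric header line):

```python
# Sketch: aggregate `python -X importtime` self-times by top-level package.
import subprocess
import sys
from collections import defaultdict

def importtime_by_package(module: str) -> list[tuple[str, int]]:
    # importtime writes its report to stderr, one line per imported module
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True,
    )
    totals: dict[str, int] = defaultdict(int)
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line.split(":", 1)[1].split("|")
        try:
            self_us = int(fields[0])  # the header line fails here and is skipped
        except ValueError:
            continue
        top_level = fields[2].strip().split(".")[0]  # "litellm.utils" -> "litellm"
        totals[top_level] += self_us
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Running `importtime_by_package("litellm")` reproduces the rough ranking above; exact numbers vary by machine, Python version, and installed extras.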

The interesting thing is not any single one of these — it’s the pattern. LiteLLM imports nearly every module it might ever need, eagerly, at the top of litellm/__init__.py. This includes provider integrations for models you’ll never use, callback handlers for observability backends you’re not running, and tokenizer setup for tokenizers you’ll never call.

Why? Because LiteLLM is built around a global registry pattern: provider classes register themselves into module-level dicts on import, so that when a user calls litellm.completion(model="anthropic/claude-3-haiku"), the routing logic can look up “anthropic” in the registry. For that lookup to work, the provider has to have been imported. So everything gets imported.
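
In miniature, and with hypothetical names rather than LiteLLM's actual internals, the pattern looks like this:

```python
# Sketch of eager registration (hypothetical names, not LiteLLM's real code).
# Each provider class registers itself into a module-level dict at import
# time, so the package must import every provider for routing to work.
PROVIDER_REGISTRY: dict[str, type] = {}

def register_provider(name: str):
    def decorator(cls: type) -> type:
        PROVIDER_REGISTRY[name] = cls  # side effect runs at import time
        return cls
    return decorator

@register_provider("anthropic")
class AnthropicProvider:
    pass

def resolve_provider(model: str) -> type:
    # "anthropic/claude-3-haiku" -> look up "anthropic" in the registry
    prefix = model.partition("/")[0]
    return PROVIDER_REGISTRY[prefix]
```

The lookup only succeeds if the module defining `AnthropicProvider` has already been imported, which is exactly why the package imports everything up front.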

This is the eager-registration anti-pattern, and it’s by no means unique to LiteLLM. Django’s app loading, SQLAlchemy’s mapper registration, and dozens of other Python projects use the same pattern. It’s convenient for the framework author and expensive for everyone downstream.

Why this is hard to fix incrementally

The straightforward fix — “just import providers lazily on first use” — runs into three real problems.

1. The registry is consulted during validation, not just dispatch. LiteLLM does a lot of work at import time to figure out what models exist, their context windows, their pricing, and their parameter compatibility. If a user types litellm.completion(model="claude-3.5-sonnet") and the registry doesn’t know about “claude-3.5-sonnet” yet, you can’t give them a good error message — you’d have to import every provider just to confirm the model is unknown. That’s the original eager-loading rationale.

2. The provider modules have side effects beyond registration. Several provider modules don’t just register themselves — they monkey-patch global state, configure logging, set environment variable defaults, and import their own SDKs (which themselves take hundreds of milliseconds to import). Lazy-loading the provider module also lazy-loads all of those side effects, which can cause subtle ordering bugs.

3. The public API exposes implementation details. Many users do from litellm import completion and from litellm.utils import token_counter. The litellm.utils module is currently the heaviest single contributor to import time. Splitting it requires either accepting a breaking API change or maintaining shim modules that re-export the new locations. Both have costs.

What a real fix looks like

The LiteLLM team has a hard problem here, but it’s tractable. The pattern that has worked for similar projects (notably transformers, which had the same problem and largely solved it) is:

1. Move provider registration to a static manifest. Instead of importing each provider module to discover its supported models, ship a JSON or Python data file that lists models and providers. The registry is built from the manifest at import time. The provider modules themselves are imported only when a model from that provider is actually called.
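
A sketch of the manifest idea, with hypothetical manifest contents: discovery and validation read cheap data, and the expensive import happens only when a provider is actually used.

```python
# Sketch: manifest-driven discovery (hypothetical manifest contents).
# Validation and error messages come from the data file; provider code
# is imported only when a model from that provider is actually called.
import importlib

PROVIDER_MANIFEST = {
    "anthropic": {"module": "litellm.llms.anthropic", "models": {"claude-3-haiku"}},
    "openai": {"module": "litellm.llms.openai", "models": {"gpt-4o"}},
}

def validate_model(model: str) -> bool:
    # cheap: no provider import needed to answer "does this model exist?"
    provider, _, name = model.partition("/")
    entry = PROVIDER_MANIFEST.get(provider)
    return entry is not None and name in entry["models"]

def get_provider(provider: str):
    # expensive: the real import, deferred until a call actually needs it
    if provider not in PROVIDER_MANIFEST:
        known = ", ".join(sorted(PROVIDER_MANIFEST))
        raise ValueError(f"unknown provider {provider!r}; known providers: {known}")
    return importlib.import_module(PROVIDER_MANIFEST[provider]["module"])
```

Note that the good error message survives: the manifest knows every provider and model without importing any of them.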

2. Lazy-load provider modules via __getattr__ on litellm.llms. Python supports module-level __getattr__ (since 3.7). This lets you defer the actual import of litellm.llms.anthropic until the first time someone accesses it, while still presenting the same API surface.

3. Split litellm.utils into focused modules. The current litellm.utils is a god-module that imports tokenizers, pricing logic, request transformations, and a dozen other things. Splitting it into litellm.tokenization, litellm.pricing, litellm.transformations, etc., lets users import only what they need.

4. Defer Pydantic model instantiation. Pydantic v2 model class creation is fast but still costs time at import. Some of LiteLLM’s Pydantic models are only used for response parsing and don’t need to exist until a response arrives. They can be moved into local imports inside response parsing functions.

A realistic target for these four changes together is a 5–10× reduction in import time — plausibly bringing import litellm from ~1 second to ~150 ms. The transformers library went from a similar place to a similar improvement using exactly this approach.

What users do today

In the absence of a fix, these are the workarounds people in #7605 and adjacent discussions are using:

  • Import inside the function that uses LiteLLM, not at the top of the module. This delays the import cost to first call, which is often what you want — at the cost of a slow first response.
  • Pre-warm in a background thread at process startup. Spin up a thread that imports LiteLLM while your app continues setting up. By the time the first request arrives, the import is done. Works for long-running servers, useless for serverless.
  • Use the LiteLLM proxy as a separate process instead of importing the SDK. The proxy pays the import cost once, at startup, and then serves requests over HTTP. For serverless callers this often nets out cheaper than importing locally.
  • Lazy-import inside if TYPE_CHECKING: blocks for type annotations only. This is a partial mitigation when you need LiteLLM types but not the runtime.
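
The pre-warm workaround is only a few lines. A sketch, with the module name parameterized so any heavy import can be warmed the same way:

```python
# Sketch of the background-thread pre-warm workaround. Start the heavy
# import during app startup; join the thread before the first real use.
import importlib
import threading

def start_prewarm(module_name: str = "litellm") -> threading.Thread:
    thread = threading.Thread(
        target=importlib.import_module, args=(module_name,), daemon=True
    )
    thread.start()
    return thread

# At process startup:
#   warmup = start_prewarm()
#   ... continue app setup ...
# In the first request handler:
#   warmup.join()       # usually already finished by now
#   import litellm      # instant: the module is already in sys.modules
```

Imports hold the import lock internally, so this is safe; a later `import litellm` on the main thread simply finds the module in `sys.modules`.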

None of these are real fixes. They are coping strategies for a structural problem.

The broader lesson

Eager-registration is convenient when you write the framework and expensive when other people use it. Every Python project that lets components register themselves at import time inherits the same problem at the same scale. The fix is to separate discovery (what providers exist? what do they support?) from execution (load this specific provider’s code), and make the discovery step cheap. Static manifests, lazy module loading, and split utility modules are the ingredients. None of them are research; they all exist in production Python projects today.

LiteLLM is in a tougher spot than most because its registry is partly user-facing — error messages depend on knowing what’s registered. But the same lazy-discovery pattern that worked for transformers should work here. Issue #7605 has been open for over a year, has 41 reactions, and is one of the highest-leverage refactors in the repo. It’s worth doing.
