LiteLLM Issue

The aiohttp `Unclosed client session` warnings in LiteLLM, explained

Why LiteLLM's concurrent acompletion calls leak aiohttp sessions, what the warning actually means, why some warnings are false alarms, and how to fix the real ones.

Dipankar Sarkar · aiohttp · asyncio · concurrency · lifecycle
Upstream issue
#13251 — [Bug]: Unclosed aiohttp client session when using acompletion with concurrent requests
Opened August 4, 2025 · status: stale · 14 👍 · 12 comments

If you have ever run LiteLLM under concurrent load with acompletion, you have probably seen this in your logs:

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x128a4cbd0>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x139580360>, ...)])']
connector: <aiohttp.connector.TCPConnector object at 0x128e69150>

There are two open issues about this in the LiteLLM repo:

  • #13251 — concurrent acompletion with Vertex AI Gemini, marked stale.
  • #11657 — Ollama embedding calls.

They look similar, but they have different root causes and different fixes. This post separates them, explains the aiohttp lifecycle that’s actually being violated, and walks through what’s a real leak versus what’s a noisy false alarm.

What the warning actually means

aiohttp.ClientSession.__del__ checks whether close() was called on the session before the garbage collector got to it. If not, it logs a warning. The warning is emitted from __del__, which means it fires during garbage collection — usually at process exit, but potentially mid-run if a session goes out of scope and the GC happens to run.

Two important things follow from this:

  1. The warning is a hint, not always a leak. A session that wasn’t explicitly closed but had no active connections at GC time isn’t actually leaking anything — it’s just impolite. The warning is real but the practical impact is zero.
  2. The warning is also sometimes a real leak. A session with active connections in its connector at GC time is leaking those connections. The remote server’s TCP stack will eventually time them out, but in the meantime they’re consuming file descriptors and provider-side connection slots.
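The mechanics can be reproduced without aiohttp at all. Here is a minimal stand-in (the `FakeSession` class is hypothetical, mimicking the `__del__` check described above):

```python
import warnings

class FakeSession:
    """Hypothetical stand-in mimicking aiohttp.ClientSession's __del__ check."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        if not self.closed:
            warnings.warn("Unclosed client session", ResourceWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    FakeSession()        # discarded without close() -> __del__ warns
    s = FakeSession()
    s.close()            # closed explicitly -> silent
    del s

print([str(w.message) for w in caught])   # ['Unclosed client session']
```

The closed session stays silent; only the abandoned one warns, which is exactly the "hint, not always a leak" behavior described above.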

The two issues conflate these. Let’s separate them.

Issue #13251: Vertex AI Gemini, concurrent acompletion

The reporter spawns 1,000 concurrent requests through asyncio.Semaphore(20), all complete successfully, and then sees warnings about unclosed sessions with 18 connections still in the connector deque. This is the real leak flavor.

The root cause: LiteLLM’s Vertex AI provider creates a fresh aiohttp.ClientSession per request rather than reusing a single application-wide session. Each session is local to one call. When the call completes, the response is returned and the session goes out of scope — but close() is never called. The session sits in memory until GC, with idle connections still in its TCPConnector deque.

The reason this happens specifically under concurrent load is GC timing. Under sequential load, sessions go out of scope one at a time, and CPython’s reference-counting collector reaps them quickly enough that you don’t notice. Under concurrent load, many sessions exist simultaneously, the cyclic collector kicks in periodically, and you get a burst of warnings all at once when it runs.
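That timing difference can be demonstrated with a stdlib-only sketch (`FakeSession` is hypothetical; the reference cycle stands in for a response object holding a reference back to its session):

```python
import gc

warnings_fired = []

class FakeSession:
    """Hypothetical stand-in: records a 'warning' when collected unclosed."""
    def __del__(self):
        warnings_fired.append("Unclosed client session")

gc.disable()  # make collection timing explicit for the demo

# Sequential load: each session's refcount hits zero as the call returns,
# so CPython reaps it (and fires the warning) immediately, one at a time.
def sequential_call():
    FakeSession()

sequential_call()

# Concurrent load: sessions caught in reference cycles (session <-> response)
# survive refcounting and wait for the cyclic collector.
def concurrent_call():
    s, r = FakeSession(), {}
    s.response = r
    r["session"] = s          # cycle: s -> r -> s

for _ in range(20):
    concurrent_call()
# Still only the sequential warning so far; the 20 cyclic sessions linger.

gc.collect()                  # the cyclic collector runs: a burst of 20
print(len(warnings_fired))    # 21

gc.enable()
```

The single `gc.collect()` releasing all twenty at once is the "burst of warnings" pattern the reporter sees.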

The real fix is to use a shared session per provider, not a per-call session. This is also the recommended pattern in the aiohttp client docs: “Performance-critical code should reuse a single ClientSession across the lifetime of the application.” LiteLLM does this for some providers (notably the OpenAI client path) and doesn’t for others (Vertex AI being one). The asymmetry is the bug.

A correct shared-session pattern looks like:

class VertexProvider:
    def __init__(self):
        self._session = None

    async def _get_session(self):
        if self._session is None or self._session.closed:
            self._session = aiohttp.ClientSession(
                connector=aiohttp.TCPConnector(limit=100, limit_per_host=30),
                timeout=aiohttp.ClientTimeout(total=600),
            )
        return self._session

    async def close(self):
        if self._session and not self._session.closed:
            await self._session.close()

The session is created on first use and reused for subsequent calls. The connector pools connections across calls, which is faster and doesn’t leak. Cleanup happens once, at provider shutdown.
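Whoever hosts the provider then owns exactly one close() at shutdown. A stdlib-only sketch of that ownership (`FakeProvider` is a hypothetical stand-in, since the real session lives in aiohttp):

```python
import asyncio

class FakeProvider:
    """Hypothetical stand-in for the shared-session provider above."""
    def __init__(self):
        self.closed = False
        self.calls = 0

    async def acompletion(self, prompt: str) -> str:
        # A real provider would fetch the shared session here and reuse
        # its connection pool across calls.
        self.calls += 1
        return f"echo: {prompt}"

    async def close(self):
        # A real provider would `await self._session.close()` here.
        self.closed = True

async def main():
    provider = FakeProvider()
    try:
        results = await asyncio.gather(
            *(provider.acompletion(p) for p in ("a", "b", "c"))
        )
    finally:
        await provider.close()   # exactly one close, at shutdown
    return provider, results

provider, results = asyncio.run(main())
print(provider.calls, provider.closed)   # 3 True
```

The try/finally guarantees the single close() runs even if a call fails, which is the lifecycle the per-call-session code never establishes.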

Issue #11657: Ollama embeddings

The reporter calls embedding() (synchronous) for each prompt against ollama/nomic-embed-text and gets one warning per call. Setting litellm.disable_aiohttp_transport = True doesn’t suppress the warnings.

This one is more interesting because it’s the noisy false alarm flavor mixed with a real underlying issue.

The Ollama provider in LiteLLM uses aiohttp internally even for the synchronous embedding() entry point — under the hood, the sync call wraps an async call via asyncio.run(). Each asyncio.run() creates a new event loop, runs the call, then tears down the loop. During teardown, any aiohttp sessions that were created during that loop’s lifetime are GC’d, and their __del__ runs the warning.
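The loop-per-call behavior is easy to observe directly:

```python
import asyncio

loops = []

async def grab_loop():
    # Keep real references so distinct loops can't share a recycled id().
    loops.append(asyncio.get_running_loop())

for _ in range(3):
    asyncio.run(grab_loop())   # builds a fresh loop, runs, tears it down

print(len({id(loop) for loop in loops}))   # 3 -- three distinct loops
```

Three calls, three loops, three teardowns: anything left unclosed inside a loop's lifetime gets GC'd at that teardown, once per call.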

The session in this case has typically already returned its connections to the connector by the time GC runs — there are no active in-flight connections. So strictly speaking it’s not leaking anything that matters. But aiohttp doesn’t know that; the session wasn’t close()d, so it warns.

The fix here is the same shape as for #13251 (use a shared session), but with an additional wrinkle: embedding() is a sync API, so there’s no obvious lifecycle to bind the session to. The most pragmatic fix is to wrap the entire sync entry point in a single asyncio.run() that explicitly creates and closes the session inside the loop, rather than relying on GC. Something like:

def embedding(model: str, input: list[str]):
    async def _run():
        async with aiohttp.ClientSession() as session:
            return await _async_embedding(session, model, input)
    return asyncio.run(_run())

The async with ensures close() is called before the loop exits, so no warning ever fires.

The reason disable_aiohttp_transport = True doesn’t help is that the flag controls whether LiteLLM uses aiohttp for the main completion path. The Ollama embedding path has its own internal session management and isn’t covered by the flag — that’s an inconsistency in LiteLLM that should probably be fixed by routing all aiohttp usage through the same toggle.

Why both issues are stale

Both issues have been open for many months. #13251 is explicitly marked stale by the bot. The reason isn’t that they’re unimportant — it’s that they’re subtle. The fix isn’t a one-line patch; it requires auditing every provider integration that uses aiohttp, identifying which ones create per-call sessions, and refactoring them to share. That’s a multi-PR effort spanning provider modules maintained by different contributors.

Stale doesn’t mean “wrong” or “not a bug.” It means “nobody picked it up.” For users hitting these warnings, the issue being stale is itself a signal: don’t wait for the upstream fix. Apply a workaround or patch your local copy.

What users actually do

From the comments on both issues:

  1. Suppress the warning at the logging level. Add a filter for aiohttp.client and aiohttp.connector log messages. This is the dirtiest workaround and the most popular one. It works because in most cases the underlying connections aren’t actually leaking — they’re just being closed via GC instead of explicitly. Suppressing the warning doesn’t fix anything but it doesn’t break anything either.
  2. Force GC at strategic points with gc.collect(). This is wishful thinking. Forcing GC just makes the warnings appear earlier and more predictably, not less often.
  3. Wrap the LiteLLM call in your own session-managed context. Some users have moved to using the underlying provider SDK directly (Vertex AI’s aiplatform, Ollama’s REST API via httpx) rather than going through LiteLLM, specifically to control session lifecycle themselves. This works but defeats the purpose of using LiteLLM.
  4. Use the LiteLLM proxy in a separate process so the warnings show up there instead of in your application. This doesn’t fix anything but it isolates the problem to a process you don’t have to read logs for.
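For workaround 1, it helps to know where the messages actually surface: the `__del__` routes through the event loop's exception handler, which logs to the `asyncio` logger, and it also emits a `ResourceWarning`. A sketch that silences both (the filter class name is mine):

```python
import logging
import warnings

class DropUnclosed(logging.Filter):
    """Drop only the 'Unclosed client session' / 'Unclosed connector' records."""
    def filter(self, record: logging.LogRecord) -> bool:
        return "Unclosed" not in record.getMessage()

# The loop's default exception handler logs these at ERROR on "asyncio".
logging.getLogger("asyncio").addFilter(DropUnclosed())

# __del__ also emits ResourceWarnings (visible with -W default or dev mode).
warnings.filterwarnings("ignore", message="Unclosed client session")
warnings.filterwarnings("ignore", message="Unclosed connector")
```

Matching on the message text rather than silencing the whole `asyncio` logger keeps genuinely useful loop errors visible.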

The broader lesson

aiohttp session lifecycle is a recurring source of bugs in async Python. The library is correct to warn — orphaned sessions are a real problem at scale — but the warnings are loud and the fixes require thinking carefully about object ownership in an async context.

The pattern that consistently works: own the session at the same scope as the work it does. A long-running provider class owns one session for its lifetime. A short-lived async context manager owns its own session. Sessions should never be created inside a function and returned implicitly via “the response object holds a reference to it” — that’s the per-call-session pattern, and that’s what causes the warnings.

For any Python project that wraps aiohttp, the audit checklist is:

  • Find every aiohttp.ClientSession() call.
  • For each one, identify the scope that owns the session.
  • If the scope is “the duration of one HTTP call,” that’s a bug. Move the session up a level.
  • If the scope is “the application lifetime,” verify that close() is called at shutdown.
  • Run a load test and check the logs for warnings. If any appear, the audit isn’t complete.
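The first checklist step can even be mechanized with a stdlib-only AST pass (a rough sketch; it only catches direct constructor calls, not aliased or dynamically constructed ones):

```python
import ast

def find_session_calls(source: str) -> list:
    """Return line numbers of every ClientSession(...) constructor call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
            if name == "ClientSession":
                hits.append(node.lineno)
    return sorted(hits)

sample = """\
import aiohttp

async def per_call_request(url):
    async with aiohttp.ClientSession() as session:   # scoped to one call
        async with session.get(url) as resp:
            return await resp.text()
"""
print(find_session_calls(sample))   # [4]
```

Each hit then gets the manual scope question from steps two and three: who owns this session, and for how long?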

LiteLLM is large enough that this audit hasn’t been done end-to-end yet. Both #13251 and #11657 are symptoms of the same missing audit.
