The LiteLLM Router silently drops async callbacks — here's where
A trace through Router.acompletion explaining why CustomLogger async success/failure hooks aren't called, what the right fix looks like, and what to use until then.
Issue #8842 is a small, surgical bug, the kind that’s hard to notice until something downstream breaks because of it. The reporter set up a CustomLogger with both sync and async hooks, then ran:
```python
sync_send_request()                # prints "On Success"
asyncio.run(async_send_request())  # does NOT print "On Async Success"
```
The synchronous router call dispatches the log_success_event hook correctly. The async call dispatches log_pre_api_call and log_post_api_call, but never calls async_log_success_event — even though the request itself succeeds and the response object comes back fine.
This is a correctness bug, not a performance one. If you depend on async hooks for production observability — billing events, audit logs, OTEL spans, anything — you have a silent gap in your data wherever the router’s async path is involved. And nothing in your logs tells you the data is missing, because the call itself succeeded.
This post traces the dispatch path, identifies where the await drops, and explains what a real fix looks like.
Reproducing it (you should)
The MRE in the issue is small enough to run locally in five minutes. Set up a CustomLogger subclass, add an async hook, register it via litellm.callbacks.append(handler), and call Router.acompletion. The async hook will not fire. The sync log_success_event might fire, depending on which version you’re on; the async one consistently doesn’t.
If you want to confirm the bug is still present in your version, the reproducer is:
```python
import asyncio, litellm
from litellm.integrations.custom_logger import CustomLogger

class H(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        print("ASYNC HOOK FIRED")

litellm.callbacks.append(H())
router = litellm.Router(model_list=[{
    "model_name": "default",
    "litellm_params": {"model": "gpt-3.5-turbo"},
}])

async def go():
    await router.acompletion(model="default", messages=[{"role": "user", "content": "hi"}])

asyncio.run(go())
# Expected: ASYNC HOOK FIRED
# Actual:   (nothing)
```
Where the await actually drops
The dispatch chain for Router.acompletion is roughly:
```
Router.acompletion(...)
  → Router._acompletion(...)                  # internal wrapper with retry logic
    → Router._async_function_with_retries(...)
      → litellm.acompletion(...)              # the underlying SDK call
        → completion handler for the provider
          → returns response object
          → callback dispatch path here
```
The interesting question is: who is responsible for firing async_log_success_event?
In the sync path (Router.completion → litellm.completion), the answer is the SDK. After litellm.completion produces a response, the SDK iterates litellm.callbacks, sees the registered handlers, and calls the appropriate hook for each one. For sync calls this is straightforward — every hook is called inline.
In the async path, two things change. First, async_log_success_event is an async def method, so dispatching it requires being inside an event loop (which we are) and awaiting the coroutine. Second, the dispatch happens inside the SDK’s litellm.acompletion, not inside the router. So the question becomes: does the SDK’s acompletion await the callback, or fire it as a background task and forget about it?
The answer, based on the dispatch code in litellm/utils.py, is the second one: the SDK schedules async callbacks via asyncio.ensure_future() (or similar) without awaiting the resulting task. The callback runs eventually, but only as long as the event loop keeps running. The reporter’s MRE uses asyncio.run(go()), which terminates the loop the moment go() returns. Any callback task that hasn’t completed by that moment is silently cancelled.
The result: async_log_success_event was scheduled but never got CPU time before the loop shut down, so the user sees no output. In a long-running server (like a proxy with a persistent loop), the same callback usually does fire — eventually — but with no ordering guarantee relative to subsequent requests. That’s a different correctness problem hiding behind the same bug.
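The shutdown race is easy to demonstrate without LiteLLM at all. This standalone sketch schedules a coroutine the way a fire-and-forget dispatcher would, keeps no reference to the task, and shows it never completing once `asyncio.run` tears the loop down:

```python
import asyncio

fired = []

async def callback():
    await asyncio.sleep(0)   # yield once, as any real logging I/O would
    fired.append(True)

async def main():
    # Fire-and-forget: schedule the coroutine, keep no reference, never await.
    asyncio.ensure_future(callback())
    return "response"

asyncio.run(main())   # the loop shuts down as soon as main() returns
print(fired)          # [] : the pending task was cancelled at shutdown
```

The `await asyncio.sleep(0)` matters: it is the suspension point where the cancellation lands, exactly as a real callback is suspended on its first network write when the loop dies.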
The real fix
There are two correct ways to dispatch async callbacks, and they have different semantics.
Option A: await each callback inline. The dispatch code in the response path becomes:
```python
for cb in litellm.callbacks:
    if hasattr(cb, "async_log_success_event"):
        await cb.async_log_success_event(kwargs, response, start, end)
```
This guarantees the callback completes before the request returns. It’s the right choice for callbacks that need to happen before the user sees a response (audit logs, billing events). The downside is that the user-perceived latency now includes the callback time. A slow Langfuse or OTEL endpoint slows every request.
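If inline awaiting is the chosen semantics, a per-callback time budget bounds that latency cost. A minimal sketch, assuming an illustrative `dispatch_inline` helper (not LiteLLM's actual dispatcher) and an arbitrary 2-second budget:

```python
import asyncio

async def dispatch_inline(callbacks, kwargs, response, start, end, budget=2.0):
    """Await each async success hook inline, capped by a per-callback budget."""
    for cb in callbacks:
        hook = getattr(cb, "async_log_success_event", None)
        if hook is None:
            continue
        try:
            # One slow sink can now add at most `budget` seconds per request.
            await asyncio.wait_for(hook(kwargs, response, start, end), timeout=budget)
        except asyncio.TimeoutError:
            pass  # a real implementation would log the timeout somewhere

class PrintingLogger:
    async def async_log_success_event(self, kwargs, response, start, end):
        print(f"success: {response!r}")

asyncio.run(dispatch_inline([PrintingLogger()], {}, "resp", 0.0, 1.0))
```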
Option B: track scheduled tasks at the loop level. Schedule the callback with `asyncio.ensure_future`, but also stash the resulting task in a set that gets drained at shutdown. This is what `asyncio.TaskGroup` (3.11+) is designed for; on older Pythons, the manual pattern is `pending = set()`, `pending.add(task)`, `task.add_done_callback(pending.discard)`. This decouples the callback from the request latency but ensures it eventually runs.
The ideal fix is both, configurable per callback. Some callbacks (audit) are blocking; some (analytics) are fire-and-forget. The current LiteLLM dispatch path doesn’t distinguish, and the bug is that the fire-and-forget path doesn’t keep references to its own tasks, leading to silent cancellation.
A patch for just the silent-cancellation half of this — keeping references to scheduled tasks — is small and would fix the reporter’s symptom without changing semantics for users who like the current behavior.
What users do in the meantime
Workarounds from the issue and adjacent discussions:
- Use sync callbacks instead of async ones, where possible. The sync `log_success_event` does fire correctly. This is the most common workaround, but synchronous callbacks block the event loop, so a slow callback will tank async throughput.
- Register callbacks at the proxy level instead of the SDK level. The LiteLLM proxy server has its own callback dispatch path and is more reliable for async hooks, because the server keeps the loop running indefinitely. If you control the proxy, dispatch from there.
- Wrap the LiteLLM call in your own observability layer. Catch the response, await your own logging coroutine inline, return the response. This is what most teams end up doing in practice: give up on `CustomLogger` for the async path and handle it at the application level.
- For OTEL specifically, use the OTEL Python SDK directly with `start_as_current_span` around the LiteLLM call. The SDK manages span context propagation itself and doesn't depend on LiteLLM's callback dispatch at all.
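The wrap-it-yourself workaround can be as small as this sketch. `traced_call` and `on_success` are illustrative names, and `call` stands in for `router.acompletion` or any other awaitable LiteLLM entry point:

```python
import asyncio
import time

async def traced_call(call, on_success, **kwargs):
    """Run an async LLM call and await your own logging coroutine inline."""
    start = time.monotonic()
    response = await call(**kwargs)          # e.g. router.acompletion
    # Awaited inline: logging is guaranteed to finish before the caller
    # sees the response, independent of LiteLLM's callback dispatch.
    await on_success(kwargs, response, start, time.monotonic())
    return response
```

Swap `on_success` for your billing, audit, or OTEL coroutine; because it is awaited before `traced_call` returns, the shutdown race in the MRE cannot drop it.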
The broader lesson
Async callback dispatch is one of the places where Python's coroutine model fights you. A task that loses its last strong reference can be garbage-collected mid-flight, because the event loop itself holds only weak references to its tasks; a task still pending at loop shutdown is cancelled outright. If you want fire-and-forget, you have to keep references to the tasks until they complete. This footgun is common enough that asyncio logs "Task was destroyed but it is pending!" when it happens, and the `asyncio.create_task` documentation explicitly tells you to save a reference to the returned task.
For any project that exposes a callback API where some hooks are async, the implementation needs to:
- Distinguish between blocking and fire-and-forget callbacks at the API level. Either by inspecting the coroutine signature or by letting users register the dispatch mode.
- Track scheduled tasks with strong references until they complete, never leaving them as orphans.
- Document the semantics. Users have a right to know whether their callback is guaranteed to run before the response returns or merely scheduled to run.
- Have a test for the exact MRE in #8842. A unit test that registers an async callback, calls `acompletion`, and asserts the callback was invoked before `acompletion` returned. The bug exists because that test doesn't exist.
LiteLLM has an extensive test suite, but the async-callback dispatch path doesn’t appear to have a test that verifies callback invocation, only one that verifies callback registration. The bug is the gap between those two.
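For concreteness, the missing invocation test could look like this sketch, written against a toy dispatcher that awaits hooks inline rather than LiteLLM's real internals:

```python
import asyncio

class Handler:
    def __init__(self):
        self.fired = False

    async def async_log_success_event(self, kwargs, response, start, end):
        self.fired = True

# Toy stand-in for the fixed dispatch path: awaits hooks inline.
async def acompletion_with_dispatch(callbacks):
    response = "response"
    for cb in callbacks:
        hook = getattr(cb, "async_log_success_event", None)
        if hook is not None:
            await hook({}, response, None, None)
    return response

def test_async_callback_fires_before_return():
    handler = Handler()
    asyncio.run(acompletion_with_dispatch([handler]))
    # The contract under test: the hook has run by the time acompletion returns.
    assert handler.fired

test_async_callback_fires_before_return()
```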
References
- Upstream issue: #8842
- `asyncio.TaskGroup` (3.11+): docs.python.org
- The "fire and forget" task pattern: discuss.python.org