Unbalanced tool lifecycle callbacks for hallucinated tools cause TraceManager stack corruption in plugins #4775

@evekhm

Description

🔴 Required Information

Is your feature request related to a specific problem?

Yes, this is related to a bug in both the ADK framework and plugin architecture that leads to corrupted OpenTelemetry span traces and incorrect plugin state when an LLM hallucinates a tool.

When the LLM calls a tool that does not exist, src/google/adk/flows/llm_flows/functions.py handles the resulting ValueError by skipping before_tool_callback entirely and jumping straight to on_tool_error_callback. This error path also runs outside the standard tracer.start_as_current_span context.

Because BigQueryAgentAnalyticsPlugin assumes standard balanced lifecycle hooks (a call to before_tool_callback matched with after_tool_callback or on_tool_error_callback), it blindly pops an item off its internal TraceManager stack during on_tool_error_callback via TraceManager.pop_span(). Since before_tool_callback never fired to push the tool's span onto the stack, the plugin inadvertently pops the parent's span (usually the Agent's span) and calls .end() on it prematurely. This corrupts the observability trace stack and records the error against the agent's span directly instead of the tool's span.
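To make the failure mode concrete, here is a minimal sketch of an unvalidated span stack. `FakeSpan` and `NaiveTraceManager` are hypothetical stand-ins written for this illustration, not the plugin's actual classes; only the `push_span()`/`pop_span()` names come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSpan:
    """Hypothetical stand-in for an OpenTelemetry span."""
    name: str
    ended: bool = False

    def end(self) -> None:
        self.ended = True


@dataclass
class NaiveTraceManager:
    """Simplified model (assumption): pops whatever is on top, no validation."""
    stack: list = field(default_factory=list)

    def push_span(self, name: str) -> FakeSpan:
        span = FakeSpan(name)
        self.stack.append(span)
        return span

    def pop_span(self):
        if not self.stack:
            return None
        span = self.stack.pop()
        span.end()
        return span


tm = NaiveTraceManager()
agent_span = tm.push_span("agent")  # pushed when the agent starts

# Hallucinated tool: before_tool_callback never fires, so no tool span is pushed.
popped = tm.pop_span()              # what on_tool_error_callback effectively does

print(popped.name, agent_span.ended, len(tm.stack))
```

In this model the error callback ends the agent's span and leaves the stack empty, so anything the agent does afterwards has no span of its own to attach to.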

Describe the Solution You'd Like

  1. Framework Fix: The ADK runner should invoke before_tool_callback with the dummy/uninitialized BaseTool (which it already creates for the error callback), enter the OTel span context manager, and then invoke on_tool_error_callback, so that the lifecycle is balanced.
  2. Plugin Fix: The BigQueryAgentAnalyticsPlugin.TraceManager should be more resilient. push_span() and pop_span() should ideally store or validate the span_type (e.g., agent vs. tool) or confirm that the popped span actually belongs to the tool that errored, rather than blindly popping off the stack.
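The plugin-side fix could look roughly like the sketch below. `SpanRecord` and `CheckedTraceManager` are hypothetical names, and the `span_type`/`name` arguments to `push_span()` and `pop_span()` are an assumed signature, not the plugin's current API. The idea is simply to refuse the pop when the top of the stack does not belong to the caller.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanRecord:
    """Hypothetical stack entry that remembers who owns the span."""
    span_type: str  # e.g. "agent" or "tool"
    name: str
    ended: bool = False


@dataclass
class CheckedTraceManager:
    """Sketch (assumption): pop only when the top entry matches the caller."""
    stack: list = field(default_factory=list)

    def push_span(self, span_type: str, name: str) -> SpanRecord:
        record = SpanRecord(span_type, name)
        self.stack.append(record)
        return record

    def pop_span(self, span_type: str, name: Optional[str] = None) -> Optional[SpanRecord]:
        if not self.stack:
            return None
        top = self.stack[-1]
        if top.span_type != span_type or (name is not None and top.name != name):
            return None  # mismatch: leave the parent's span untouched
        self.stack.pop()
        top.ended = True
        return top


tm = CheckedTraceManager()
agent = tm.push_span("agent", "root_agent")

# Error callback for a hallucinated tool: nothing was pushed for it.
orphan = tm.pop_span("tool", "missing_tool")

# Balanced case: the tool span is pushed first, then popped by its owner.
tool = tm.push_span("tool", "real_tool")
closed = tm.pop_span("tool", "real_tool")
```

With the check in place, the unmatched pop returns `None` and the agent's record stays open on the stack. A real implementation would still need to decide how to log the orphaned error event, which this sketch leaves out.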

Impact on your work

This corruption cascades through the observability trace hierarchy whenever hallucinated tool calls occur. For instance, the TOOL_ERROR logs appear with the Agent's span_id, and subsequent agent steps may log under the wrong parent span ID. This breaks our observability pipelines and makes root cause analysis using tool_events_view highly convoluted.

Proposed API / Implementation

In src/google/adk/flows/llm_flows/functions.py, rather than catching the ValueError and bypassing the main logic flow, the error should be handled identically to standard tool runtime exceptions inside the _run_with_trace logic. That way the before_tool_callback is executed consistently.
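As a rough illustration of the intended ordering, the sketch below routes the unknown-tool lookup through the same traced path as a normal tool call. It does not reproduce the real `functions.py` or `_run_with_trace` code: `fake_span`, `run_tool`, the `events` list, and returning an error dict are all assumptions made for this example.

```python
from contextlib import contextmanager


@contextmanager
def fake_span(events, name):
    """Hypothetical stand-in for tracer.start_as_current_span; records start and end."""
    events.append(f"span_start:{name}")
    try:
        yield
    finally:
        events.append(f"span_end:{name}")


def run_tool(name, args, tools, events):
    """Hypothetical dispatcher: every call gets a span and a balanced callback pair."""
    with fake_span(events, name):
        events.append(f"before_tool:{name}")  # fires even for unknown tools
        try:
            if name not in tools:
                # The lookup failure is raised inside the traced path.
                raise ValueError(f"Tool '{name}' not found")
            result = tools[name](**args)
        except Exception as exc:
            events.append(f"on_tool_error:{name}")
            return {"error": str(exc)}
        events.append(f"after_tool:{name}")
        return result


events = []
response = run_tool("missing_tool", {}, tools={}, events=events)
print(events)
```

For the hallucinated tool, the recorded order is span start, before_tool, on_tool_error, span end, which is the same shape a plugin already sees when a real tool raises at runtime.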

Metadata

Labels: tracing ([Component] This issue is related to OpenTelemetry tracing)
