Tool calling with LiteLLM and thinking models fail #765

Open
@gal-checksum

Description

Describe the bug

When running the Agents SDK with tool calling and a thinking model through LiteLLM (e.g. Sonnet 4), I get this error:

litellm.exceptions.BadRequestError: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"messages.1.content.0.type: Expected `thinking` or `redacted_thinking`, but found `text`. When `thinking` is enabled, a final `assistant` message must start with a thinking block (preceeding the lastmost set of `tool_use` and `tool_result` blocks). We recommend you include thinking blocks from previous turns. To avoid this requirement, disable `thinking`. Please consult our documentation at https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking"}}

Debug information

  • Agents SDK version: 0.0.16
  • Python version: 3.13

Repro steps

  1. Run the Agents SDK with Sonnet 4
  2. Produce a scenario that requires two or more tool calls
  3. Observe the failure above

Expected behavior

Everything works :)

Activity

rm-openai (Collaborator) commented on May 27, 2025

Can you please provide a full working script? Happy to take a look!

gal-checksum (Author) commented on May 28, 2025

@rm-openai see below

from agents import (
    Agent,
    function_tool,
    RunContextWrapper,
    Runner,
    ModelSettings,
)
from dataclasses import dataclass
import asyncio
from agents.extensions.models.litellm_model import LitellmModel
import os
from openai.types import Reasoning


from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()


@dataclass
class Count:
    count: int


@function_tool
def count(ctx: RunContextWrapper[Count]) -> str:
    """
    Increments the count by 1 and returns the count

    Returns:
        A string with the count
    """
    ctx.context.count += 1
    return f"Counted to {ctx.context.count}"


count_ctx = Count(count=0)

agent = Agent[Count](
    name="Counter Agent",
    instructions="Count until the number the user tells you to stop using count tool",
    tools=[count],
    model=LitellmModel(
        model="anthropic/claude-sonnet-4-20250514",
        api_key=os.getenv("ANTHROPIC_API_KEY"),
    ),
    model_settings=ModelSettings(
        reasoning=Reasoning(effort="high", summary="detailed")
    ),
)


async def main():
    results = await Runner.run(
        agent, input="Count to 10", context=count_ctx, max_turns=30
    )
    print(results)


if __name__ == "__main__":
    asyncio.run(main())

knowsuchagency commented on Jun 4, 2025

Root Cause Analysis and Current Status

TLDR: Thinking with tool calling for Anthropic is broken in LiteLLM.


I've investigated this issue thoroughly and determined the root cause is in LiteLLM, not the openai-agents-python SDK.

What's Actually Happening

  1. LiteLLM doesn't preserve thinking blocks when reconstructing conversation history for Anthropic's API
  2. When reasoning is enabled, Anthropic requires assistant messages to start with thinking blocks (see the message sketch after this list)
  3. LiteLLM converts tool_calls to tool_use content blocks but loses the thinking context
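
To make that concrete, this is roughly the message history Anthropic expects when thinking is enabled, based on the error above and Anthropic's extended-thinking docs. All values are placeholders, not actual API output:

# Sketch of the history Anthropic expects: the replayed assistant turn must
# START with a thinking block, followed by the tool_use block. LiteLLM currently
# reconstructs it with a plain text block instead, which triggers the
# invalid_request_error quoted above. Values below are placeholders.
messages = [
    {"role": "user", "content": "Count to 10"},
    {
        "role": "assistant",
        "content": [
            {"type": "thinking", "thinking": "I should call the count tool...", "signature": "..."},
            {"type": "tool_use", "id": "toolu_01", "name": "count", "input": {}},
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_01", "content": "Counted to 1"},
        ],
    },
]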

Current Workarounds

Until LiteLLM fixes this upstream:

  1. Use OpenAI o4-mini - Full thinking + tools support works perfectly
  2. Disable reasoning for Anthropic when using tools (see the sketch after this list)
  3. Use reasoning without tools for Anthropic
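
As an illustration of workaround 2, here is a minimal sketch that reuses the repro script above; the only change is that reasoning is not enabled:

# Sketch of workaround 2: same agent as the repro above, but without
# ModelSettings(reasoning=...), so Anthropic never requires thinking blocks
# in the replayed history.
agent = Agent[Count](
    name="Counter Agent",
    instructions="Count until the number the user tells you to stop using count tool",
    tools=[count],
    model=LitellmModel(
        model="anthropic/claude-sonnet-4-20250514",
        api_key=os.getenv("ANTHROPIC_API_KEY"),
    ),
    # no model_settings / reasoning here
)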

Related Issues

Why No Fix in This SDK

I initially created a PR with a workaround, but decided against it because:

  • It's a dependency issue that should be fixed upstream
  • The workaround only partially works (single tool call only)
  • The agents SDK shouldn't need to know about provider-specific API quirks

gal-checksum (Author) commented on Jun 4, 2025

Thanks! Another workaround could also be to use Anthropic through the OpenAI Responses API compatibility, no?

Haven't tried it, but it should work.

ukayani commented on Jun 8, 2025

@knowsuchagency The bug you've linked is in the conversion between the Responses API and the Completions API on the LiteLLM side. Although that is a legitimate issue, it isn't the reason this issue occurs. The LiteLLM abstraction in this SDK only uses LiteLLM's acompletion API. I've tested LiteLLM directly with Claude thinking models, and the acompletion API preserves thinking blocks. The problem is that this SDK tries to convert completions to the Responses API format and, in doing so, drops the properties on the LiteLLM models which hold the thinking block details.
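
For reference, a minimal sketch of that direct LiteLLM test. The thinking parameter and the reasoning_content / thinking_blocks fields follow LiteLLM's Anthropic support and may differ by version, so treat the exact names as assumptions:

# Sketch: call LiteLLM's acompletion directly with thinking enabled and inspect
# the fields that carry the thinking blocks. Requires ANTHROPIC_API_KEY to be set.
import asyncio
import litellm

async def check_thinking():
    resp = await litellm.acompletion(
        model="anthropic/claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": "Count to 3 using the count tool"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "count",
                "description": "Increments the count by 1",
                "parameters": {"type": "object", "properties": {}},
            },
        }],
        thinking={"type": "enabled", "budget_tokens": 1024},
        max_tokens=2048,
    )
    msg = resp.choices[0].message
    # These are the properties that get dropped when the agents SDK converts the
    # completion into Responses-API items (see #678).
    print(getattr(msg, "reasoning_content", None))
    print(getattr(msg, "thinking_blocks", None))

asyncio.run(check_thinking())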

See the issue I filed about this: #678

I think the issue you pointed out on the LiteLLM side may help if the OpenAI SDK switches to using the Responses API on LiteLLM, but currently it's using acompletion directly.

Ultimately, I think the Responses API types need some additional flexibility to be able to preserve non-OpenAI model provider details. OpenAI's Responses API has a reasoning summary of sorts, but it doesn't expose the full reasoning blocks via the API, so the Responses API doesn't really account for them properly. I believe newer Claude models are also moving towards reasoning summaries, so maybe some sort of consolidation could happen with the types.

The agents SDK shouldn't need to know about provider-specific API quirks

While I agree with this somewhat, the counterargument is that if the SDK claims to support third-party providers, and LiteLLM supports enabling thinking + tools with acompletion for most model providers, then this SDK should at minimum support such a common scenario. You can't claim to support third-party providers and then not work with reasoning + tool calls.

Naamsukh commented on Jun 10, 2025

Thanks! Another workaround could also be to use Anthropic through the OpenAI Responses API compatibility, no?

Haven't tried it, but it should work.

Yup, tried it via the OpenAI Responses API, it works well:

from agents import OpenAIChatCompletionsModel
from openai import AsyncOpenAI

agent.model = OpenAIChatCompletionsModel(
    model="claude-sonnet-4-20250514",
    openai_client=AsyncOpenAI(base_url="https://api.anthropic.com/v1/", api_key=settings.anthropic_api_key),
)

ukayani commented on Jun 11, 2025

Thanks! Another workaround could also be to use Anthropic through the OpenAI Responses API compatibility, no?
Haven't tried it, but it should work.

Yup, tried it via the OpenAI Responses API, it works well: agent.model = OpenAIChatCompletionsModel(model="claude-sonnet-4-20250514", openai_client=AsyncOpenAI(base_url="https://api.anthropic.com/v1/", api_key=settings.anthropic_api_key))

Do you mind giving a full example with tool calls? From the looks of your snippet, you're using the OpenAI Completions API compatibility provided by Anthropic rather than anything to do with the Responses API.
