Can tools ingest images attached to image_url? #875

Open
@polaon

Hello,

I'd like to create a tool that can ingest images passed via the {"type": "input_image", "image_url": "..."} schema, adjust them, and then return either the adjusted image (which I think is not currently possible: Issue 341) or an HTTPS URL.

Example:

import base64
import io
from pathlib import Path
from typing import Literal

from agents import Agent, function_tool, Runner


def to_base64_encoded_str(file_path: Path) -> str:
    with open(file_path, mode="rb") as image_file:
        return base64.b64encode(image_file.read()).decode(encoding="utf-8")


@function_tool(docstring_style="google")
def change_brightness(image_url: str, change: Literal["increase", "decrease"]) -> str:
    """Changes brightness of the input image.

    Args:
        image_url: Data URI of the image as defined by [IETF RFC 2397 document](https://datatracker.ietf.org/doc/html/rfc2397). The URI is of the form: `data:[<mediatype>][;base64],<data>`. The `<mediatype>` is an Internet media type specification. The appearance of `;base64` means that the data is encoded as base64. The image data is expected to be encoded as base64.
        change: Increase or decrease the image brightness.

    Returns:
        Download URL to the adjusted image.
    """
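    # Strip the "data:<mediatype>;base64," prefix; only the part after the comma is base64.
    _, _, encoded = image_url.partition(",")
    with io.BytesIO(base64.b64decode(encoded, validate=True)) as image_io: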
        ...


agent = Agent(name="My Agent", tools=[change_brightness])

result = await Runner.run(
    starting_agent=agent,
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "I'd like you to increase brightness of the following image.",
                },
                {
                    "type": "input_image",
                    "image_url": "data:image/png;base64," + to_base64_encoded_str(Path("./image.png")),
                },
            ],
        }
    ],
)

Is it possible for tools to ingest images passed via the {"type": "input_image", "image_url": "..."} schema? If so, how? I couldn't find any mention of this in the documentation. When I tried it, the agent never passed the expected image_url to the function tool; it hallucinated a random URL instead.
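For reference, this is roughly how the tool would decode the data URI if it did receive the full value. It is a minimal sketch using only the standard library. The helper name `decode_data_uri` is hypothetical and not part of the agents package, and it assumes the URI is base64-encoded as described in the docstring above.

```python
import base64
import binascii


def decode_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split an RFC 2397 data URI into (media type, raw bytes).

    Hypothetical helper for illustration; only base64-encoded URIs are handled.
    """
    if not data_uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is the payload.
    header, sep, payload = data_uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no ',' separator")
    # The header looks like "<mediatype>;base64"; the media type may be empty.
    media_type, _, encoding = header.rpartition(";")
    if encoding != "base64":
        raise ValueError("only base64-encoded data URIs are supported")
    try:
        raw = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    # RFC 2397 defaults the media type to text/plain;charset=US-ASCII when omitted.
    return media_type or "text/plain;charset=US-ASCII", raw
```

Inside change_brightness, the returned bytes could then be wrapped in io.BytesIO. This only helps if the model passes the real data URI through, which is the part that does not seem to happen.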

Thank you very much for any advice and have a nice day.
