Skip to content

How to integrate audio input in agent sdk? (not realtime or voice pipeline, e.g., gpt-4o audio) #738

Not planned
@101scholar

Description

@101scholar

Currently, input types include text, file and image. Is it possible to add audio input support, so that we can directly process audio input with potential function calling? Especially for models such as gpt-4o audio, and gemini-2.5-flash-preview.

Activity

github-actions

github-actions commented on May 29, 2025

@github-actions

This issue is stale because it has been open for 7 days with no activity.

github-actions

github-actions commented on Jun 1, 2025

@github-actions

This issue was closed because it has been inactive for 3 days since being marked as stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionQuestion about using the SDKstale

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @101scholar

        Issue actions

          How to integrate audio input in agent sdk? (not realtime or voice pipeline, e.g., gpt-4o audio) · Issue #738 · openai/openai-agents-python