How to integrate audio input in agent sdk? (not realtime or voice pipeline, e.g., gpt-4o audio) #738

Not planned

Not planned

How to integrate audio input in agent sdk? (not realtime or voice pipeline, e.g., gpt-4o audio)#738

Labels

Currently, input types include text, file and image. Is it possible to add audio input support, so that we can directly process audio input with potential function calling? Especially for models such as gpt-4o audio, and gemini-2.5-flash-preview.

added

on May 22, 2025

github-actionsbot

This issue is stale because it has been open for 7 days with no activity.

added

on May 29, 2025

github-actionsbot

This issue was closed because it has been inactive for 3 days since being marked as stale.

closed this as not planned

to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Participants