Skip to content

Outline proposed architecture based on requirements #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: add/requirements-definition
Choose a base branch
from

Conversation

felixarntz
Copy link
Member

@felixarntz felixarntz commented Jun 18, 2025

This is a draft, based on the current requirements (see #1), and it depends on that PR being approved and merged first.

This draft is based on the requirements proposed in #1. For any changes that may be made prior to approval and merge, this PR will need to be updated accordingly.

For reference: An older architectural outline and related ideas were discussed in felixarntz/ai-services#22, and to some degree in felixarntz/ai-services#21.

Comment on lines +209 to +227
+generateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) string$
+streamGenerateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< string >$
+generateImage(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+textToSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+generateEmbeddings(Message[] $input, AiModel $model) Embedding[]$
+generateResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+streamGenerateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< GenerativeAiResult >$
+generateImageResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+textToSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+generateEmbeddingsResult(string[]|Message[] $input, AiModel $model) EmbeddingResult$
+generateTextOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateImageOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+textToSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+generateEmbeddingsOperation(string[]|Message[] $input, AiModel $model) EmbeddingOperation$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd probably consolidate these into fewer methods to reduce the public API surface and make them more composable. For example, you can get a result easily via generateOperation(), no need for generateResult().

Then, if starting with GenerativeAiOperation or GenerativeAiResult, there could be some toText, toImage, or stream() methods or so to transform the result into the desired shape.

Just thinking out loud though. Best to get some feedback out in the wild from developers building with it :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts on this, I'm partially in agreement, partially not, partially not sure.

  • The generate*Result() vs generate*Operation() need to remain separate because they can fundamentally how they invoke an operation. Technically, you can wrap everything in an operation of course, but the implication of triggering an operation is very different from wanting a result right away - it's clear that an operation may take longer than you can wait for in this request, whereas wanting a result is explicitly waiting to get it right away.
  • The same applies for streaming vs not streaming, it triggers a fundamentally different kind of request handling chain, so that needs to remain separate at the root too.
  • For the methods generateText(), generateImage() etc., which technically would simply wrap generateTextResult(), generateImageResult() etc., I can see what you're saying would make sense. At this point the question is what is more intuitive and/or convenient for developers: generateText() or generateTextResult()->text()?
  • Overall, most of these methods will be very brief wrappers of other methods. Pretty much all the heavy lifting will happen in generateResult() and generateOperation(), given the SDK is built with a multimodal first mindset. For example, generateTextResult() is basically just forwarding to generateResult() with an outputModalities: [ 'text' ] config arg injected. But passing that manually in would be very verbose, and while the API needs to be multimodal-first to be flexible, this would make usage unnecessary complex if you always have to think in that way - so calling generateTextResult() or generateText() feels way more intuitive if you only want to generate text for example.
  • One exception is streamGenerateTextResult() which lives separately (not using generateResult() or generateOperation()), although we could even think there about how this could be abstracted to support multimodal streaming. That part goes a bit above my head right now, so it's not in here, but it certainly could be, if we want to support streaming beyond just text output.

TL;DR: For all the wrapper methods (which almost all of these are), we could consider handling them in another way. But I wouldn't say the current approach isn't ideal just because it's a large list of methods on the entrypoint object - it depends on what API developers consider more intuitive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously very limited due to character limit, but I just created https://x.com/felixarntz/status/1936116496658579717, maybe it'll give us at least a rough idea.

@felixarntz felixarntz added [Type] Developer Documentation Documentation for developers and removed documentation labels Jun 18, 2025
@felixarntz felixarntz marked this pull request as ready for review June 20, 2025 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Type] Developer Documentation Documentation for developers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants