Outline proposed architecture based on requirements #2

felixarntz · 2025-06-18T03:30:09Z

This draft is based on the requirements proposed in #1 and depends on that PR being approved and merged first. If the requirements change before then, this PR will need to be updated accordingly.

For reference: An older architectural outline and related ideas were discussed in felixarntz/ai-services#22, and to some degree in felixarntz/ai-services#21.

swissspidy · 2025-06-18T12:12:58Z

docs/ARCHITECTURE.md

+            +generateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) string$
+            +streamGenerateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< string >$
+            +generateImage(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+            +textToSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+            +generateSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$
+            +generateEmbeddings(Message[] $input, AiModel $model) Embedding[]$
+            +generateResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+            +generateOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+            +generateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+            +streamGenerateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< GenerativeAiResult >$
+            +generateImageResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+            +textToSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+            +generateSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$
+            +generateEmbeddingsResult(string[]|Message[] $input, AiModel $model) EmbeddingResult$
+            +generateTextOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+            +generateImageOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+            +textToSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+            +generateSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$
+            +generateEmbeddingsOperation(string[]|Message[] $input, AiModel $model) EmbeddingOperation$


Personally I'd probably consolidate these into fewer methods to reduce the public API surface and make them more composable. For example, you can get a result easily via generateOperation(), no need for generateResult().

Then, if starting with GenerativeAiOperation or GenerativeAiResult, there could be some toText, toImage, or stream() methods or so to transform the result into the desired shape.
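To make the suggestion concrete, here is a minimal sketch of a result object that callers transform into the shape they need. Everything in it is hypothetical: the constructor, the internal `$parts` array, and the members of `File` are assumptions for illustration, and only the names `GenerativeAiResult`, `File`, `toText` and `toImage` come from this thread.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: these class shapes are NOT taken from the PR.
final class File
{
    public string $mimeType;
    public string $data;

    public function __construct(string $mimeType, string $data)
    {
        $this->mimeType = $mimeType;
        $this->data = $data;
    }
}

final class GenerativeAiResult
{
    /** @var array<int, string|File> Ordered output parts (assumed shape). */
    private array $parts;

    public function __construct(array $parts)
    {
        $this->parts = $parts;
    }

    // Concatenates all text parts; fails if the result has none.
    public function toText(): string
    {
        $texts = array_filter($this->parts, 'is_string');
        if ($texts === []) {
            throw new RuntimeException('Result contains no text.');
        }
        return implode('', $texts);
    }

    // Returns the first image part; fails if there is none.
    public function toImage(): File
    {
        foreach ($this->parts as $part) {
            if ($part instanceof File && strpos($part->mimeType, 'image/') === 0) {
                return $part;
            }
        }
        throw new RuntimeException('Result contains no image.');
    }
}

// One multimodal result, two views of it.
$result = new GenerativeAiResult(['Hello, ', new File('image/png', 'raw-bytes'), 'world']);
echo $result->toText(), PHP_EOL;
echo $result->toImage()->mimeType, PHP_EOL;
```

With this shape the entrypoint would only need the multimodal methods, at the cost of one extra call at every call site.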

Just thinking out loud though. Best to get some feedback out in the wild from developers building with it :)

felixarntz · 2025-06-20

A few thoughts on this: I partly agree, partly disagree, and am partly unsure.

The generate*Result() and generate*Operation() methods need to remain separate because they differ fundamentally in how they invoke an operation. Technically, you can wrap everything in an operation of course, but the implication of triggering an operation is very different from wanting a result right away: an operation may take longer than you can wait for in this request, whereas asking for a result means explicitly waiting until it is available.
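A rough sketch of that difference, under stated assumptions: the members `id()`, `isDone()`, `result()` and `text()` are hypothetical, the model call is stubbed, and the `AiModel $model` parameter is left out for brevity. Only the method and class names come from the diagram.

```php
<?php
declare(strict_types=1);

// Hypothetical shapes for illustration; the PR does not define these members.
final class GenerativeAiResult
{
    private string $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    public function text(): string
    {
        return $this->text;
    }
}

final class GenerativeAiOperation
{
    private string $id;
    private ?GenerativeAiResult $result;

    public function __construct(string $id, ?GenerativeAiResult $result = null)
    {
        $this->id = $id;
        $this->result = $result;
    }

    public function id(): string
    {
        return $this->id;
    }

    public function isDone(): bool
    {
        return $this->result !== null;
    }

    // Only valid once the operation has finished.
    public function result(): GenerativeAiResult
    {
        if ($this->result === null) {
            throw new LogicException('Operation ' . $this->id . ' is still running.');
        }
        return $this->result;
    }
}

// Synchronous path: the caller waits and always receives a result.
function generateResult(string $prompt): GenerativeAiResult
{
    return new GenerativeAiResult('echo: ' . $prompt); // stub for a model call
}

// Asynchronous path: the caller receives a handle that may not be finished
// yet and has to be checked again later, possibly in another request.
function generateOperation(string $prompt): GenerativeAiOperation
{
    return new GenerativeAiOperation('op-1'); // stub: always still pending
}

echo generateResult('Hi')->text(), PHP_EOL; // usable immediately

$operation = generateOperation('Hi');
if (!$operation->isDone()) {
    echo 'Operation ', $operation->id(), ' is pending', PHP_EOL;
}
```

The return type tells the caller up front whether it has to handle a pending state, which is the argument for keeping both entry points.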

The same applies to streaming vs. not streaming: it triggers a fundamentally different kind of request handling chain, so that needs to remain separate at the root too.
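As an illustration of why the two paths differ for the caller, here is a small sketch using a PHP `Generator`. The function bodies are stand-ins (a real implementation would read chunks from a provider's streaming response rather than split a fixed string), and the `AiModel $model` parameter is omitted for brevity.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for the two entry points named in the diagram.
function generateText(string $prompt): string
{
    return 'Echo: ' . $prompt;
}

/** @return Generator<int, string> */
function streamGenerateText(string $prompt): Generator
{
    // Simulate chunked delivery by yielding the text four bytes at a time.
    foreach (str_split('Echo: ' . $prompt, 4) as $chunk) {
        yield $chunk;
    }
}

// Non-streaming: one call, one complete value.
$full = generateText('Hello');

// Streaming: the caller consumes chunks as they arrive.
$streamed = '';
foreach (streamGenerateText('Hello') as $chunk) {
    $streamed .= $chunk;
}

var_dump($full === $streamed);
```

The streaming caller has to be written as a loop from the start, so the choice cannot be hidden behind a single return type without changing how every caller consumes it.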

For the methods generateText(), generateImage() etc., which technically would simply wrap generateTextResult(), generateImageResult() etc., I can see what you're saying would make sense. At this point the question is what is more intuitive and/or convenient for developers: generateText() or generateTextResult()->text()?

Overall, most of these methods will be very brief wrappers around other methods. Pretty much all the heavy lifting will happen in generateResult() and generateOperation(), given the SDK is built with a multimodal-first mindset. For example, generateTextResult() is basically just forwarding to generateResult() with an outputModalities: [ 'text' ] config arg injected. Passing that in manually would be very verbose, and while the API needs to be multimodal-first to be flexible, it would make usage unnecessarily complex if you always had to think that way. So calling generateTextResult() or generateText() feels far more intuitive if you only want to generate text, for example.
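The wrapper layering described here could look roughly like the following. This is a sketch under assumptions, not the proposed implementation: the class name `AiClientSketch`, the `$config` array parameter, the `modalities()` accessor and the stubbed model call are all hypothetical, and the `AiModel $model` parameter is omitted. Only the three method names and the `outputModalities` idea come from the discussion.

```php
<?php
declare(strict_types=1);

// Hypothetical result shape for illustration only.
final class GenerativeAiResult
{
    private string $text;
    /** @var string[] */
    private array $modalities;

    public function __construct(string $text, array $modalities)
    {
        $this->text = $text;
        $this->modalities = $modalities;
    }

    public function text(): string
    {
        return $this->text;
    }

    /** @return string[] */
    public function modalities(): array
    {
        return $this->modalities;
    }
}

final class AiClientSketch
{
    // Multimodal core: every convenience method ends up here.
    public static function generateResult(string $prompt, array $config = []): GenerativeAiResult
    {
        $modalities = $config['outputModalities'] ?? ['text'];
        // Stub in place of a real model call.
        return new GenerativeAiResult('echo: ' . $prompt, $modalities);
    }

    // Thin wrapper: pins the output modality so callers need not pass it.
    public static function generateTextResult(string $prompt, array $config = []): GenerativeAiResult
    {
        $config['outputModalities'] = ['text'];
        return self::generateResult($prompt, $config);
    }

    // Thinner still: unwraps the text from the result.
    public static function generateText(string $prompt, array $config = []): string
    {
        return self::generateTextResult($prompt, $config)->text();
    }
}

// The two call styles under discussion return the same string.
echo AiClientSketch::generateText('Hi'), PHP_EOL;
echo AiClientSketch::generateTextResult('Hi')->text(), PHP_EOL;
```

Each wrapper is a one-liner, so the question is less about implementation cost and more about which call style reads better at the call site.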

One exception is streamGenerateTextResult(), which lives separately (not using generateResult() or generateOperation()), although even there we could think about how it could be abstracted to support multimodal streaming. That part is a bit over my head right now, so it's not in here, but it certainly could be if we want to support streaming beyond just text output.

TL;DR: For all the wrapper methods (which almost all of these are), we could consider handling them in another way. But I wouldn't say the current approach isn't ideal just because it's a large list of methods on the entrypoint object. It depends on which API developers find more intuitive.

felixarntz · 2025-06-20

Obviously very limited due to the character limit, but I just created https://x.com/felixarntz/status/1936116496658579717. Maybe it'll give us at least a rough idea.

Outline proposed architecture based on requirements.

aac973e

felixarntz added the documentation label Jun 18, 2025

swissspidy reviewed Jun 18, 2025

felixarntz added the [Type] Developer Documentation label and removed the documentation label Jun 18, 2025

felixarntz marked this pull request as ready for review June 20, 2025 17:13
