-
Notifications
You must be signed in to change notification settings - Fork 1
Outline proposed architecture based on requirements #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: add/requirements-definition
Are you sure you want to change the base?
Outline proposed architecture based on requirements #2
Conversation
+generateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) string$ | ||
+streamGenerateText(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< string >$ | ||
+generateImage(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$ | ||
+textToSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$ | ||
+generateSpeech(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) File$ | ||
+generateEmbeddings(Message[] $input, AiModel $model) Embedding[]$ | ||
+generateResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$ | ||
+generateOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$ | ||
+generateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$ | ||
+streamGenerateTextResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) Generator< GenerativeAiResult >$ | ||
+generateImageResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$ | ||
+textToSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$ | ||
+generateSpeechResult(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiResult$ | ||
+generateEmbeddingsResult(string[]|Message[] $input, AiModel $model) EmbeddingResult$ | ||
+generateTextOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$ | ||
+generateImageOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$ | ||
+textToSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$ | ||
+generateSpeechOperation(string|MessagePart|MessagePart[]|Message|Message[] $prompt, AiModel $model) GenerativeAiOperation$ | ||
+generateEmbeddingsOperation(string[]|Message[] $input, AiModel $model) EmbeddingOperation$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd probably consolidate these into fewer methods to reduce the public API surface and make them more composable. For example, you can get a result easily via generateOperation()
, no need for generateResult()
.
Then, if starting with GenerativeAiOperation
or GenerativeAiResult
, there could be some toText
, toImage
, or stream()
methods or so to transform the result into the desired shape.
Just thinking out loud though. Best to get some feedback out in the wild from developers building with it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts on this, I'm partially in agreement, partially not, partially not sure.
- The
generate*Result()
vsgenerate*Operation()
need to remain separate because they can fundamentally how they invoke an operation. Technically, you can wrap everything in an operation of course, but the implication of triggering an operation is very different from wanting a result right away - it's clear that an operation may take longer than you can wait for in this request, whereas wanting a result is explicitly waiting to get it right away. - The same applies for streaming vs not streaming, it triggers a fundamentally different kind of request handling chain, so that needs to remain separate at the root too.
- For the methods
generateText()
,generateImage()
etc., which technically would simply wrapgenerateTextResult()
,generateImageResult()
etc., I can see what you're saying would make sense. At this point the question is what is more intuitive and/or convenient for developers:generateText()
orgenerateTextResult()->text()
? - Overall, most of these methods will be very brief wrappers of other methods. Pretty much all the heavy lifting will happen in
generateResult()
andgenerateOperation()
, given the SDK is built with a multimodal first mindset. For example,generateTextResult()
is basically just forwarding togenerateResult()
with anoutputModalities: [ 'text' ]
config arg injected. But passing that manually in would be very verbose, and while the API needs to be multimodal-first to be flexible, this would make usage unnecessary complex if you always have to think in that way - so callinggenerateTextResult()
orgenerateText()
feels way more intuitive if you only want to generate text for example. - One exception is
streamGenerateTextResult()
which lives separately (not usinggenerateResult()
orgenerateOperation()
), although we could even think there about how this could be abstracted to support multimodal streaming. That part goes a bit above my head right now, so it's not in here, but it certainly could be, if we want to support streaming beyond just text output.
TL;DR: For all the wrapper methods (which almost all of these are), we could consider handling them in another way. But I wouldn't say the current approach isn't ideal just because it's a large list of methods on the entrypoint object - it depends on what API developers consider more intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously very limited due to character limit, but I just created https://x.com/felixarntz/status/1936116496658579717, maybe it'll give us at least a rough idea.
This is a draft, based on the current requirements (see #1), and it depends on that PR being approved and merged first.
This draft is based on the requirements proposed in #1. For any changes that may be made prior to approval and merge, this PR will need to be updated accordingly.
For reference: An older architectural outline and related ideas were discussed in felixarntz/ai-services#22, and to some degree in felixarntz/ai-services#21.