-
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(metric-outcomes): Add initial topic and schema #231
Conversation
@@ -0,0 +1,16 @@ | |||
topic: outcomes-metrics | |||
pipeline: outcomes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is used primarily to identify a pipeline logically. @lynnagara and @untitaker can provide more context.
But it should be correct as we call outcomes
everything that populates the outcomes and billing data,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the metrics outcomes eventually makes it's way to the shared outcomes table in clickhouse with all the other outcomes? So this pipeline is probably fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be parallel to the current outcomes with their own schema and from the looks of it in a separate clickhouse cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If everything is fully separate, I think you can change this to outcomes-metrics
or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I don't have context on how these schemas are used so you may want to get another approval.
}, | ||
"outcome": { | ||
"description": "Outcome ID, metric outcomes share the same numeric outcome ID with regular outcomes.", | ||
"$ref": "#/definitions/U64" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should limit this to existing outcome IDs, or deliberately leave it open. I tend towards leaving it open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to follow the outcomes schema and leave it open.
@@ -0,0 +1,16 @@ | |||
topic: outcomes-metrics | |||
pipeline: outcomes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is used primarily to identify a pipeline logically. @lynnagara and @untitaker can provide more context.
But it should be correct as we call outcomes
everything that populates the outcomes and billing data,
producers: | ||
- getsentry/relay | ||
consumers: | ||
- getsentry/snuba | ||
- getsentry/super-big-consumers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this pipeline meant to bypass the current metrics outcomes pipeline that takes data in after the indexer ? If yes can we get rid of the old one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with any (other) metrics outcomes pipeline, so I assume the answer is yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the billing metrics consumer https://github.com/getsentry/sentry/blob/0730d277c3ad16c4c8aaebd7b2edd51f08819dea/src/sentry/ingest/billing_metrics_consumer.py#L41
I cannot find the design docs anymore.
"cardinality": { | ||
"description": "Maximum observed cardinality of the metric.", | ||
"$ref": "#/definitions/U64" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd assume for "max cardinality" we mean the number of distinct combinations of tags keys:values observed in the bucket ? OR over a period of time ?
This would be useful information in the schema for who decides to consume from this topic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is over time, current plan for custom metrics is over the span of an hour (because that's what people are interested and and what Relay will be enforcing).
I'll try to improve the description, but I was a bit vague on purpose because the time frame may change and may be different per usecase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to improve the description, but I was a bit vague on purpose because the time frame may change and may be different per usecase.
I think this may highlight a concern. If the time frame changes, that likely concerns both the producer and the consumers which can easily end up in incidents where the billing code miscounts metrics for assuming a wrong time frame. max_cardinality over a day vs over a minute is likely going to mean different things for the billing code.
I would suggest an alternative approach to make this more robust: add a time_frame
field in the message that indicates what the time frame is.
This puts the responsibility to manage the time frame on the consumer in an explicit way and it allows you to change the time frame in a safe and backwards compatible way as the consumer immediately knows how to interpret the message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduced cardinality_window
which is required when a cardinality
is emitted.
"quantity": { | ||
"description": "Amount of metric buckets accepted by Relay (volume).", | ||
"$ref": "#/definitions/U64" | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as below. What's the time range we are talking about here? Is it fixed or is it dependent on relay bucketing time ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is independent of time, this is the amount of statsd elements Relay receives and parses out of envelopes without any aggregation.
@@ -0,0 +1,50 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To what extent is the outcomes-metrics
schema actually different from outcomes
or is it mostly the same?
In Snuba, all of outcomes
, outcomes-billing
and loadbalancer-outcomes
all share the exact same consumer code, and we just re-use the same schema in those scenarios. In the past we generally focused on having schemas reflect the minimum set of requirements for a consumer to work correctly. If the consumer is the same one, it might be simpler to use the existing outcomes schema for the new topic rather than maintain a separate one. On the other hand if the consumer code is different, I think a different schema here makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The biggest differences to the existing outcomes are:
- Metric outcomes are tracked by MRI (consumer is expected to parse type and namespace from the MRI and materialize it in the database)
- Metric outcomes have quantity and cardinality tracked, where latter is aggregated max by hour instead of sum
We considered trying to make this work with the current outcomes but came to the conclusion that the metric outcomes are too different and we'd rather have a separate concept just for metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems good to me assuming that there will be a fully separate snuba consumer (and possibly clickhouse schema) and it's not shared with existing outcomes. If this ends up not being the case, I'd suggest later merging this together with outcomes so there would be one less schema to maintain.
We've decided to use generic metrics for now, I'll be cleaning up the Kafka topic. |
Preliminary schema for the
outcomes-metrics
topic, work described and tracked in getsentry/relay#3147.The cardinality description is vague on purpose, custom metrics will most likely have cardinality tracked per hour, but this is subject to change and may be different for other metric types (or even completely missing).