
[EPIC] Metric Stats #3147

Closed
40 of 43 tasks
Dav1dde opened this issue Feb 20, 2024 · 4 comments

Dav1dde commented Feb 20, 2024

Description

We need outcomes for DDM / custom metrics.

Outcomes for metrics are a tricky problem for several reasons. Metrics are aggregated in multiple stages (SDKs, customer Relays, PoP Relays, and processing Relays), which makes reporting an accurate volume hard. On top of that, volume is only a small scaling factor we have to consider; a much bigger factor is the cardinality of a metric.

Ideally, outcomes capture both the volume and the cardinality of a metric.

Outcomes should tell us the volume of a single metric (defined by its MRI) and its cardinality per hour.

Our current outcomes cannot capture this information; we need a new mechanism to collect metric outcomes.

Requirements

  • Indefinite (?) retention
  • Fast enough to query for billing purposes
  • Volume and cardinality need to be represented
  • Sentry UI needs access to show the data to the user
  • Billing needs access for billing
  • Bizops needs access to process the data with their own pipelines (can they query Snuba instead of reading the topic?)

Why not use Outcomes?

  • Outcomes currently don't provide a way to group by metric name (incl. metric type and namespace).
  • Cardinality needs to be max()-aggregated, not sum()'ed.
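The second point can be illustrated with a short sketch (hypothetical numbers): two Relays report the hourly cardinality of the same metric, and their observed series largely overlap, so summing the reports double-counts while taking the maximum gives a usable aggregate.

```python
# Sketch: why cardinality must be max()-aggregated, not sum()'ed.
# Hypothetical per-Relay cardinality reports for the same metric and hour;
# both Relays saw mostly the same series.
reports = [950, 1000]

summed = sum(reports)       # 1950 -- overcounts the overlapping series
aggregated = max(reports)   # 1000 -- the meaningful cardinality aggregate
```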

Quantity / Volume

We want to determine the volume of metrics received by the first layer of our infrastructure (PoP Relays). Client-side aggregated metrics are counted with a quantity of 1.

For example: if Relay receives 500 statsd items for a single metric in an hour, this metric is considered to have a volume/quantity of 500 metrics per hour.
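The counting rule above can be sketched as follows. This is an illustration, not Relay's implementation; the `count_volume` helper and the sample MRI are made up for the example.

```python
from collections import Counter

# Sketch: counting volume per MRI at the first infrastructure layer.
# Every incoming statsd item counts as 1, regardless of how many data
# points the SDK aggregated into it client-side.
def count_volume(items):
    """items: iterable of (mri, payload) pairs received within one hour."""
    volume = Counter()
    for mri, _payload in items:
        volume[mri] += 1
    return volume

# 500 statsd items for a single metric -> volume 500 for that hour.
hourly = count_volume([("d:custom/foo@none", "42|d")] * 500)
```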

Cardinality

We are interested in the cardinality of a single metric (MRI) per hour.

Cardinality can be queried from storage or collected through the Relay cardinality limiter.
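To make "cardinality of a single metric per hour" concrete, here is a sketch that counts distinct series (tag combinations) per MRI with in-memory sets. Relay's actual cardinality limiter works differently (and the alternative is querying storage); the helper name and data shapes are assumptions for illustration only.

```python
# Sketch: per-MRI cardinality for one hour = number of distinct series,
# where a series is a unique combination of tags for that metric.
def hourly_cardinality(buckets):
    """buckets: iterable of (mri, tags) pairs, tags as a frozenset of items."""
    seen = {}  # mri -> set of distinct tag combinations (series)
    for mri, tags in buckets:
        seen.setdefault(mri, set()).add(tags)
    return {mri: len(series) for mri, series in seen.items()}

buckets = [
    ("d:custom/foo@none", frozenset({("env", "prod")})),
    ("d:custom/foo@none", frozenset({("env", "dev")})),
    ("d:custom/foo@none", frozenset({("env", "prod")})),  # duplicate series
]
# -> cardinality 2 for d:custom/foo@none
```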

Metric Stats Namespaces

Volume: c:metric_stats/volume@none
Tags:

  • mri: Metric Name/MRI: <type>:<namespace>/<name>[@<unit>]
  • mri.type: Metric Type
  • mri.namespace: Metric namespace (extracted from the MRI)
  • outcome.id: Outcome ID; metric outcomes share the same numeric outcome IDs as regular outcomes.
  • outcome.reason: Optional, machine-readable, free-form reason code.
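Deriving the mri.type and mri.namespace tags from an MRI of the form <type>:<namespace>/<name>[@<unit>] can be sketched as below. The regex is an assumption based on the format shown in this issue, not Relay's actual MRI parser.

```python
import re

# Assumed MRI shape: <type>:<namespace>/<name>[@<unit>]
MRI_RE = re.compile(
    r"^(?P<type>[a-z]):(?P<namespace>[a-z_]+)/(?P<name>[^@]+)(?:@(?P<unit>.+))?$"
)

def mri_tags(mri):
    """Split an MRI into the mri.* tag values used by metric stats."""
    m = MRI_RE.match(mri)
    if m is None:
        raise ValueError(f"invalid MRI: {mri}")
    return {
        "mri": mri,
        "mri.type": m["type"],
        "mri.namespace": m["namespace"],
    }

tags = mri_tags("c:metric_stats/volume@none")  # type "c", namespace "metric_stats"
```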

Cardinality: g:metric_stats/cardinality@none

  • mri: Metric Name/MRI: <type>:<namespace>/<name>[@<unit>]
  • mri.type: Metric Type
  • mri.namespace: Metric namespace (extracted from the MRI)
  • cardinality.limit: The ID of the cardinality limit that generated this report.
  • cardinality.window: Cardinality window size in seconds.
  • cardinality.scope: Cardinality scope (name, project, organization).

If cardinality is tracked per project or organization, the mri* tags will not be present.

Examples

Payloads/Metrics emitted from Relay into the generic metrics topic.

Volume - Accepted
{
   "org_id": 0,
   "project_id": 42,
   "name": "c:metric_stats/volume@none",
   "type": "c",
   "value": 2.0,
   "timestamp": 1712223931,
   "tags": {
      "mri": "d:custom/foo@none",
      "mri.type": "d",
      "mri.namespace": "custom",
      "outcome.id": "0"
   },
   "retention_days": 90
}
Cardinality by Name
{
   "org_id": 0,
   "project_id": 42,
   "name": "g:metric_stats/cardinality@none",
   "type": "g",
   "value": {
      "last": 2.0,
      "min": 2.0,
      "max": 2.0,
      "sum": 2.0,
      "count": 1
   },
   "timestamp": 1712223931,
   "tags": {
      "mri": "s:custom/bar@none",
      "mri.type": "s",
      "mri.namespace": "custom",
      "cardinality.limit": "custom-limit-with-some-id",
      "cardinality.scope": "name",
      "cardinality.window": "3600"
   },
   "retention_days": 90
}
Cardinality by Type
{
   "org_id": 0,
   "project_id": 42,
   "name": "g:metric_stats/cardinality@none",
   "type": "g",
   "value": {
      "last": 2.0,
      "min": 2.0,
      "max": 2.0,
      "sum": 2.0,
      "count": 1
   },
   "timestamp": 1712223931,
   "tags": {
      "mri.type": "s",
      "mri.namespace": "custom",
      "cardinality.limit": "custom-limit-with-some-id",
      "cardinality.scope": "type",
      "cardinality.window": "3600"
   },
   "retention_days": 90
}

Milestone 1 ✅ - Volume / Happy Path

Implement the happy path ("accepted") for volume metric stats: c:metric_stats/volume@none.

Milestone 2 ✅ - Cardinality / Happy Path

Implement the happy path ("accepted") for cardinality metric stats: g:metric_stats/cardinality@none.


Milestone 2.5 ✅ - Cardinality by Minute and Project


Milestone 3 ✅ - Negative Outcomes

Milestone 4 - Finishing Touches

mcannizz commented

@Dav1dde I am wondering if this work is still in flight, and whether we consider that work to be P0. It seems like the remaining tasks are lower priority. If you agree, could you please move the remaining stuff to a lower priority epic and close this one?


Dav1dde commented May 16, 2024

I would like to keep the Epic open. We are at a stage where we provide all the data that is needed now, but not at a stage where this data will stay correct with upcoming features.
E.g. currently metric_stats does not interact well with user-defined cardinality limits, which the product is already planning.

Basically, at this point we have implemented the minimum set everyone else needs, but the remaining legwork in Relay is just as important; it's simply not visible to anyone outside of ingest.

That being said, if moving the remaining tasks to a new Epic makes things easier for you, I am not opposed to it.

mcannizz commented

@Dav1dde makes sense, thanks. I'm going to leave this epic as is but reduce its priority, since it sounds like we've completed the P0 work. Feel free to adjust as you see fit.


Dav1dde commented Jun 3, 2024

Closing this; we got all the important bits in place, and with the current restructuring of the metrics product, the remaining pieces are not as important anymore.
