feat(profiles): Emit "accepted" outcomes for profiles filtered by sampling #2054
Conversation
LGTM
.items()
.filter(|item| item.ty() == &ItemType::Profile)
.map(|item| item.quantity())
.sum();
Technically, the SDKs will only send 1 profile per envelope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, but Relay does not enforce that AFAIK, so it's safer to handle the case where there are multiple profiles.
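For illustration, a minimal, self-contained sketch (with simplified `Item`/`ItemType` types, not Relay's actual envelope API) of counting profiles by summing item quantities, so that envelopes carrying more than one profile are still handled:

```rust
#[derive(PartialEq)]
enum ItemType {
    Transaction,
    Profile,
}

struct Item {
    ty: ItemType,
    quantity: usize,
}

/// Counts profiles by summing quantities over *all* profile items,
/// instead of assuming at most one profile item per envelope.
fn profile_quantity(items: &[Item]) -> usize {
    items
        .iter()
        .filter(|item| item.ty == ItemType::Profile)
        .map(|item| item.quantity)
        .sum()
}

fn main() {
    let items = vec![
        Item { ty: ItemType::Transaction, quantity: 1 },
        Item { ty: ItemType::Profile, quantity: 1 },
        Item { ty: ItemType::Profile, quantity: 1 },
    ];
    // Two profile items -> quantity 2, even though SDKs usually send only one.
    assert_eq!(profile_quantity(&items), 2);
}
```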
It'd be good to deploy getsentry/sentry#47703 before this PR is deployed, to make sure we don't double-count indexed profiles as processed.
lgtm. Let's test it in prod 😄
///
/// In the future, we might actually extract metrics from profiles before dynamic sampling,
/// like we do for transactions. At that point, this code should be removed, and we should
/// enforce rate limits and emit outcomes based on the collect profile metric, as we do for
The sentence ends here, kind of leaving you hanging at the end.
/// enforce rate limits and emit outcomes based on the collect profile metric, as we do for
/// enforce rate limits and emit outcomes based on the collect profile metric, as we do for transactions
Thanks!
@olksdr I do hope the integration test covers the most important use case; let me know if you think we need more.
Improvement to #2054: Before a profile reaches the `process_profile` stage, we count it as `Profile:Accepted`. So if `process_profile` _fails_ with an `Invalid` outcome, it is incorrect to log it in the same data category, because that would double-count within that category. Log it as `ProfileIndexed` instead.
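As a rough illustration of that rule (types and names below are invented, not Relay's actual outcome API): once a profile has been counted as accepted in the "processed" category, a later processing failure may only be charged to the "indexed" category.

```rust
#[derive(Debug, PartialEq)]
enum DataCategory {
    Profile,        // "processed" profiles
    ProfileIndexed, // "indexed" profiles
}

/// Picks the category for an `Invalid` outcome emitted when profile
/// processing fails. If the profile was already counted as accepted in the
/// "processed" category, only the "indexed" category may be charged again.
fn invalid_outcome_category(already_counted_as_processed: bool) -> DataCategory {
    if already_counted_as_processed {
        DataCategory::ProfileIndexed
    } else {
        DataCategory::Profile
    }
}

fn main() {
    assert_eq!(invalid_outcome_category(true), DataCategory::ProfileIndexed);
    assert_eq!(invalid_outcome_category(false), DataCategory::Profile);
}
```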
@phacops yes, let's coordinate when to do the deploy!
Improvement to #2054: Since #2054 already introduced a translation from `Profile` to `ProfileIndexed` for sampling outcomes, this PR does not change anything functionally. It is merely an attempt to be consistent with #2056 and #2060, such that we consider the `metrics_extracted` flag any time we generate a profiling outcome.
…2060) Improvement to #2054: Same as #2056, but for rate-limited outcomes: _Before_ metrics extraction, rate-limited profiles should count as `Profile` ("processed" profile), because they have not been counted towards accepted profiles yet. _After_ metrics extraction, they should count as `ProfileIndexed`.
…2071) In #2056, #2060 and #2061, I wrongly assumed that dropped profiles should always be counted as `ProfileIndexed` after metrics extraction and dynamic sampling, in order not to double-count towards `Profile`. This is wrong because, as the PR description of #2054 clearly states, profiles that are not dropped by dynamic sampling are only counted as accepted in processing Relays, _after_ dynamic sampling and rate limiting:

> In processing Relays, if an envelope still contains profiles after dynamic sampling, log an Accepted outcome for the "processed" category. By restricting this to processing Relays, we can be sure that every profile is only counted once.

Instead of checking the `metrics_extracted` flag, introduce a new flag that explicitly states whether a profile was already counted towards `DataCategory::Profile`, and evaluate that flag instead.
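A hedged sketch of that flag-based approach (the struct, field, and function names here are illustrative only, not Relay's actual implementation): the category charged for a dropped profile is derived from an explicit "already counted as processed" flag rather than from `metrics_extracted`.

```rust
#[derive(Debug, PartialEq)]
enum DataCategory {
    Profile,
    ProfileIndexed,
}

/// Per-profile bookkeeping carried with the envelope (illustrative only).
#[allow(dead_code)]
struct ProfileFlags {
    /// Whether metrics were already extracted for the enclosing transaction.
    metrics_extracted: bool,
    /// Whether this profile was already counted towards `DataCategory::Profile`
    /// (only set in processing Relays, after dynamic sampling and rate limiting).
    counted_as_processed: bool,
}

/// Category to charge when a profile is dropped: consult the explicit flag,
/// not `metrics_extracted`, so profiles that were never counted as processed
/// are still charged to the "processed" category.
fn drop_category(flags: &ProfileFlags) -> DataCategory {
    if flags.counted_as_processed {
        DataCategory::ProfileIndexed
    } else {
        DataCategory::Profile
    }
}

fn main() {
    let flags = ProfileFlags { metrics_extracted: true, counted_as_processed: false };
    // Metrics extraction alone does not mean the profile was counted as processed.
    assert_eq!(drop_category(&flags), DataCategory::Profile);
}
```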
Instead of counting processed profiles in two different places (see #2054), add a `has_profile` tag to the transaction counter metric, and define

```
processed_profiles := count(processed_transactions) WHERE has_profile = true
```

## Changes

This PR contains the following changes:

* Revert PRs around counting profiles.
* Add `has_profile(s)` tag on the `transaction.duration` metric. #2165
* Update the metrics billing consumer to emit an `accepted` outcome for profiles. getsentry/sentry#50047

## Caveats

With this PR, rate limits will be applied consistently if they are issued for `DataCategory.Transaction`. However, if a rate limit (e.g. spike protection) is issued _only_ for `DataCategory.Profile`, it will not be enforced.

## Rollout Order

1. Merge and deploy getsentry/sentry#50047. The billing metrics consumer will now listen for profiles, but not receive any.
2. Deploy processing Relays. Profile counts will go down, because PoP Relays are not sending the `has_profile` tag yet.
3. Deploy PoP Relays. Profile counts should go back to normal.

## In case of rollback

1. First revert the sentry change (getsentry/sentry#50047) and deploy it to stop counting profiles.
2. Then revert the Relay change (this PR) and deploy it to processing Relays and PoPs.

ref: #2158

---------

Co-authored-by: Iker Barriocanal <[email protected]>
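A minimal sketch of the tagging idea, assuming a simplified bucket type (Relay's real metric buckets differ, and `transaction.duration` is a distribution rather than a plain counter): the transaction metric carries a `has_profile` tag, and processed profiles are derived downstream as the count of transactions where that tag is `true`.

```rust
use std::collections::BTreeMap;

/// Simplified metric bucket (illustrative only).
struct TransactionBucket {
    name: String,
    value: u64,
    tags: BTreeMap<String, String>,
}

/// Tags the transaction metric with whether the envelope also carried a
/// profile, so a downstream consumer can derive
/// `processed_profiles = count(processed_transactions) WHERE has_profile = true`.
fn transaction_bucket(count: u64, has_profile: bool) -> TransactionBucket {
    let mut tags = BTreeMap::new();
    tags.insert("has_profile".to_owned(), has_profile.to_string());
    TransactionBucket {
        name: "transaction.duration".to_owned(),
        value: count,
        tags,
    }
}

fn main() {
    let bucket = transaction_bucket(1, true);
    assert_eq!(bucket.name, "transaction.duration");
    assert_eq!(bucket.value, 1);
    assert_eq!(bucket.tags["has_profile"], "true");
}
```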
To facilitate a billing model that is consistent between transactions and profiles, #2051 introduced a new data category for profiles, mirroring the distinction between "processed" and "indexed" data that already exists for transactions. In other words, "processed profiles" should count all valid, non-rate-limited profiles seen by Relays, even if they were dropped by dynamic sampling.
## Difference between transactions and profiles

For transactions, we extract metrics before dynamic sampling, and those metrics are what we rate limit and eventually log as "accepted" for the "processed" transaction data category (`DataCategory.Transaction`). For profiles, we do not extract metrics (yet), so the outcomes for the "processed" profile data category have to be calculated in a different way.
## How this PR achieves this goal

1. In processing Relays, if an envelope still contains profiles after dynamic sampling, log an `Accepted` outcome for the "processed" category. By restricting this to processing Relays, we can be sure that every profile is only counted once.
2. For profiles dropped by dynamic sampling:
   2.1. Translate `Filtered` outcomes with reason `Sampled:` to an `Accepted` outcome. This counts all profiles dropped by dynamic sampling, regardless of where the dynamic sampling took place (external relay, PoP relay, processing relay).
   2.2. Also send the original `Filtered` outcome, but with data category `DataCategory.ProfileIndexed`.

By adding up the counts of these two disjoint sets, we should correctly count all profiles regardless of whether they were sampled or not.
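To make steps 2.1 and 2.2 concrete, here is a hedged sketch with stand-in types (not Relay's actual outcome machinery) of splitting a dynamic-sampling `Filtered` outcome into the two outcomes described above:

```rust
/// Stand-ins for Relay's outcome types; only the translation logic matters here.
#[derive(Debug, PartialEq)]
enum DataCategory {
    Profile,        // "processed" profiles
    ProfileIndexed, // "indexed" profiles
}

#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Accepted,
    Filtered { reason: String },
}

/// Expands the outcome for a profile dropped by dynamic sampling:
/// 2.1: the profile still counts as `Accepted` in the "processed" category.
/// 2.2: the original `Filtered` outcome is kept, but charged to the "indexed" category.
/// Anything else is passed through unchanged.
fn expand_profile_outcome(
    category: DataCategory,
    outcome: Outcome,
) -> Vec<(DataCategory, Outcome)> {
    if let Outcome::Filtered { reason } = &outcome {
        if reason.starts_with("Sampled:") {
            return vec![
                (DataCategory::Profile, Outcome::Accepted),
                (DataCategory::ProfileIndexed, outcome.clone()),
            ];
        }
    }
    vec![(category, outcome)]
}

fn main() {
    // "Sampled:<rule id>" is a placeholder reason string for illustration.
    let sampled = Outcome::Filtered { reason: "Sampled:<rule id>".to_owned() };
    let emitted = expand_profile_outcome(DataCategory::ProfileIndexed, sampled);
    assert_eq!(emitted.len(), 2);
    assert_eq!(emitted[0], (DataCategory::Profile, Outcome::Accepted));
}
```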
## Alternative proposal (rejected for now)

In order to actually line up the behavior of transactions and profiles, we could start extracting a simple counter metric for profiles before dynamic sampling, and let that metric represent `DataCategory.Profile`. This would mean that rate limits are applied to the metric, and the accepted outcome would be emitted from the `billing_metrics_consumer`, just like we do for `DataCategory.Transaction`. See this internal doc for more.

## Why the current approach was chosen
I personally believe that the alternative proposal described above would be more correct and easier to maintain, but: