feat(profiles): Emit "accepted" outcomes for profiles filtered by sampling #2054

jjbayer · 2023-04-20T15:08:11Z

To facilitate a billing model that is consistent between transactions and profiles, #2051 introduced a new data category for profiles, such that

processed profiles = indexed profiles + sampled profiles  # "sampled" as in dropped by dynamic sampling

as is already the case for transactions:

processed transactions = indexed transactions + sampled transactions

In other words, "processed profiles" should count all valid, non-rate-limited profiles seen by Relays, even if they were dropped by dynamic sampling.

Difference between transactions and profiles

For transactions, we extract metrics before dynamic sampling, and those metrics are what we rate limit and eventually log as "accepted" for the "processed" transaction data category (DataCategory.Transaction).

For profiles, we do not extract metrics (yet), so the outcomes for the "processed" profile data category have to be calculated in a different way.

How this PR achieves this goal

1. In processing Relays, if an envelope still contains profiles after dynamic sampling, log an Accepted outcome for the "processed" category. By restricting this to processing Relays, we can be sure that every profile is only counted once.
2. Also in processing Relays, shortly before reporting outcomes to Kafka:
2.1. Translate Filtered outcomes with reason Sampled to an Accepted outcome. This counts all profiles dropped by dynamic sampling, regardless of where the dynamic sampling took place (external relay, pop relay, processing relay).
2.2. Also send the original Filtered outcome, but with data category DataCategory.ProfileIndexed.

By adding up the counts of these two disjoint sets, we should correctly count all profiles regardless of whether they were sampled or not.
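The translation in step 2 can be sketched roughly as follows. This is a minimal illustration, not Relay's actual code: `TrackOutcome`, `FilterReason`, and `translate` are simplified, hypothetical stand-ins, and the sketch assumes the indexed category meant in step 2.2 is the profile one (`ProfileIndexed`), as the later follow-up commits describe it.

```rust
// Simplified, hypothetical stand-ins for Relay's outcome types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DataCategory {
    Profile,        // "processed" profiles
    ProfileIndexed, // "indexed" profiles
    Transaction,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FilterReason {
    Sampled, // dropped by dynamic sampling
    Other,   // any other filter reason
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Outcome {
    Accepted,
    Filtered(FilterReason),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TrackOutcome {
    outcome: Outcome,
    category: DataCategory,
    quantity: u32,
}

/// Expands one incoming outcome into the outcomes that get reported.
///
/// A profile dropped by dynamic sampling yields two outcomes:
/// an `Accepted` one for the "processed" category (step 2.1) and the
/// original `Filtered` one re-labelled as indexed (step 2.2).
/// Everything else passes through unchanged.
fn translate(o: TrackOutcome) -> Vec<TrackOutcome> {
    match (o.outcome, o.category) {
        (Outcome::Filtered(FilterReason::Sampled), DataCategory::Profile) => vec![
            TrackOutcome { outcome: Outcome::Accepted, ..o },
            TrackOutcome { category: DataCategory::ProfileIndexed, ..o },
        ],
        _ => vec![o],
    }
}
```

Because the expansion happens only once, right before reporting, a sampled profile contributes exactly one count to the "processed" category no matter which Relay dropped it.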

Alternative proposal (rejected for now)

In order to actually line up the behavior of transactions and profiles, we could start extracting a simple counter metric for profiles before dynamic sampling, and let that metric represent DataCategory.Profile -- this would mean that rate limits are applied to the metric, and the accepted outcome would be emitted from the billing_metrics_consumer, just like we do for DataCategory.Transaction. See this internal doc for more.

Why the current approach was chosen

I personally believe that the alternative proposal described above would be more correct and easier to maintain, but:

By confining all new logic to processing relays, we will correctly count "processed" profiles even if they were dropped by dynamic sampling in outdated external relays.
In a wider sense, by confining all new logic to processing relays, we can iterate faster (deploy hotfixes, etc.) without having to worry about the behavior of external relays (which we cannot update).
This change is easy to revert if needed. The alternative solution would be distributed across nodes, from external relays to sentry.

phacops

LGTM

phacops · 2023-04-21T00:51:29Z

relay-server/src/actors/processor.rs

+            .items()
+            .filter(|item| item.ty() == &ItemType::Profile)
+            .map(|item| item.quantity())
+            .sum();


Technically, the SDKs will only send 1 profile per envelope.

Makes sense, but Relay does not enforce that AFAIK, so it's safer to handle the case where there are multiple profiles.
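For illustration, the summing approach from the diff can be tried out against simplified types. The `Envelope`, `Item`, and `ItemType` definitions below are hypothetical stand-ins that only mirror the method names visible in the diff; they are not Relay's real types.

```rust
// Hypothetical, simplified stand-ins; only the method names follow the diff.
#[derive(Debug, PartialEq, Eq)]
enum ItemType {
    Transaction,
    Profile,
}

struct Item {
    ty: ItemType,
    quantity: usize,
}

impl Item {
    fn ty(&self) -> &ItemType {
        &self.ty
    }

    fn quantity(&self) -> usize {
        self.quantity
    }
}

struct Envelope {
    items: Vec<Item>,
}

impl Envelope {
    fn items(&self) -> std::slice::Iter<'_, Item> {
        self.items.iter()
    }
}

/// Sums the quantities of all profile items, so an envelope that carries
/// more than one profile is still counted correctly.
fn profile_count(envelope: &Envelope) -> usize {
    envelope
        .items()
        .filter(|item| item.ty() == &ItemType::Profile)
        .map(|item| item.quantity())
        .sum()
}
```

With one transaction item and two profile items of quantity 1 each, `profile_count` returns 2; an envelope without profiles yields 0.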

phacops · 2023-04-21T17:41:37Z

It'd be good to deploy getsentry/sentry#47703 before this PR is deployed to make sure we don't double count indexed profiles as processed.

olksdr

lgtm. Let's test it in prod 😄

olksdr · 2023-04-24T07:24:11Z

relay-server/src/actors/processor.rs

+    ///
+    /// In the future, we might actually extract metrics from profiles before dynamic sampling,
+    /// like we do for transactions. At that point, this code should be removed, and we should
+    /// enforce rate limits and emit outcomes based on the collect profile metric, as we do for


The sentence ends here kind of leaving you hanging at the end.

Suggested change

/// enforce rate limits and emit outcomes based on the collect profile metric, as we do for

/// enforce rate limits and emit outcomes based on the collect profile metric, as we do for transactions

jjbayer · 2023-04-24T07:50:51Z

> lgtm. Let's test it in prod 😄

@olksdr I do hope the integration test covers the most important use case; let me know if you think we need more.

Improvement to #2054: Before a profile reaches the `process_profile` stage, we count it as `Profile:Accepted`. So if `process_profile` _fails_ with an `Invalid` outcome, it is incorrect to log it in the same data category, because it would double count within that category. Log it as `ProfileIndexed` instead.

jjbayer · 2023-04-24T07:57:45Z

> It'd be good to deploy getsentry/sentry#47703 before this PR is deployed to make sure we don't double count indexed profiles as processed.

@phacops yes, let's coordinate when to do the deploy!

Improvement to #2054: Since #2054 already introduced a translation from `Profile` to `ProfileIndexed` for sampling outcomes, this PR does not change anything functionally. It is merely an attempt to be consistent with #2056 and #2060, such that we consider the `metrics_extracted` flag any time we generate a profiling outcome.

…2060) Improvement to #2054: Same as #2056, but for rate limited outcomes: _Before_ metrics extraction, rate limited profiles should count as `Profile` ("processed" profile), because they have not been counted towards accepted profiles yet. _After_ metrics extraction, they should count as `ProfileIndexed`.

…2071)

In #2056, #2060 and #2061, I wrongly assumed that dropped profiles should always be counted as `ProfileIndexed` after metrics extraction and dynamic sampling, in order to not double-count towards `Profile`.

This is wrong, because as the PR description of #2054 clearly states, profiles that are not dropped by dynamic sampling are only counted as accepted in processing relays, _after_ dynamic sampling and rate limiting:

> In processing Relays, if an envelope still contains profiles after dynamic sampling, log an Accepted outcome for the "processed" category. By restricting this to processing Relays, we can be sure that every profile is only counted once.

Instead of checking the `metrics_extracted` flag, introduce a new flag that explicitly states whether a profile was already counted towards `DataCategory::Profile` or not, and evaluate that flag instead of `metrics_extracted`.
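As a rough sketch of the flag-based decision described in this commit message: the `ProfileState` struct, its `profile_counted` field, and the function name below are assumptions made for illustration, since the commit message does not name the new flag.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DataCategory {
    Profile,        // "processed" profiles
    ProfileIndexed, // "indexed" profiles
}

/// Hypothetical per-envelope state; the field name is an assumption.
struct ProfileState {
    /// True once the profile has been counted towards `DataCategory::Profile`.
    profile_counted: bool,
}

/// Picks the data category for a dropped profile (invalid, rate limited, ...).
///
/// If the profile was already counted as "processed", logging the drop in the
/// same category would double count it, so the drop goes to the indexed
/// category instead. Otherwise it still belongs to the "processed" category.
fn dropped_profile_category(state: &ProfileState) -> DataCategory {
    if state.profile_counted {
        DataCategory::ProfileIndexed
    } else {
        DataCategory::Profile
    }
}
```

The point of a dedicated flag is that "metrics were extracted" and "profile was counted" are different events, so the decision no longer depends on a proxy.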

Instead of counting processed profiles in two different places (see #2054), add a `has_profile` tag to the transaction counter metric, and define

```
processed_profiles := count(processed_transactions) WHERE has_profile = true
```

## Changes

This PR contains the following changes:

* Revert PRs around counting profiles.
* Add `has_profile(s)` tag on `transaction.duration` metric. #2165
* Update metrics billing consumer to emit `accepted` outcome for profiles. getsentry/sentry#50047

## Caveats

With this PR, rate limits will be applied consistently if they are issued for `DataCategory.Transaction`. However, if a rate limit (e.g. spike protection) is issued _only_ for `DataCategory.Profile`, it will not be enforced.

## Rollout Order

1. Merge and deploy getsentry/sentry#50047. The billing metrics consumer will now listen for profiles, but not receive any.
3. Deploy processing Relays. Profile counts will go down, because PoP-Relays are not sending the `has_profile` tag yet.
4. Deploy PoP Relays. Profile counts should go back to normal.

## In case of rollback

1. First revert the sentry change (getsentry/sentry#50047) and deploy it to stop counting profiles.
2. Then revert the Relay change (this PR) and deploy to processing Relays and PoPs.

ref: #2158

---------

Co-authored-by: Iker Barriocanal <[email protected]>
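The `processed_profiles` definition above can be illustrated with a small sketch. The `CounterBucket` type and both function names are hypothetical; the real billing metrics consumer lives in Sentry and is not shown here. The sketch only assumes that each counter bucket carries string tags and a count.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified transaction counter bucket.
struct CounterBucket {
    tags: BTreeMap<String, String>,
    count: u64,
}

/// All processed transactions, regardless of tags.
fn processed_transactions(buckets: &[CounterBucket]) -> u64 {
    buckets.iter().map(|b| b.count).sum()
}

/// Processed profiles: transactions whose bucket is tagged `has_profile = true`.
fn processed_profiles(buckets: &[CounterBucket]) -> u64 {
    buckets
        .iter()
        .filter(|b| b.tags.get("has_profile").map(String::as_str) == Some("true"))
        .map(|b| b.count)
        .sum()
}
```

Since both counts derive from the same metric, any rate limit applied to the transaction counter automatically applies to the profile count as well, which matches the caveat above: a limit issued only for profiles has nothing to act on.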

jjbayer added 10 commits April 20, 2023 17:03

feat: Transform outcomes

eb990ae

feat: Log Accepted outcome for processed profiles

186fbde

test: stub

85607c8

wip

959446d

test

ea765b6

test

9572e48

fix: Envelope summary problem

5730320

test: passes for single relay

3b30ad3

test: pass

d11c6e3

self review

9bf35d6

phacops reviewed Apr 21, 2023

jjbayer marked this pull request as ready for review April 21, 2023 09:04

jjbayer requested a review from a team April 21, 2023 09:04

jjbayer mentioned this pull request Apr 21, 2023

fix(profiles): Log ProfileIndexed outcome for invalid profiles #2056

Merged

olksdr approved these changes Apr 24, 2023

olksdr reviewed Apr 24, 2023

doc: missing word

8ab5975

This was referenced Apr 24, 2023

chore(profiling): Switch data category for indexed profiles getsentry/sentry#47703

Merged

fix(profiles): Log ProfileIndexed outcome for rate limited profiles #2060

Merged

fix(profiles): Log sampling outcome as ProfileIndexed #2061

Merged

jjbayer added 2 commits April 24, 2023 13:16

phacops approved these changes Apr 24, 2023

doc: changelog

bdae22e

jjbayer merged commit 4801ec2 into master Apr 24, 2023

jjbayer deleted the feat/indexed-profile-outcomes branch April 24, 2023 11:58

This was referenced Apr 25, 2023

Rate limit inconsistencies between transactions and profiles #2064

Closed

fix(profiles): Only use ProfileIndexed after counting as Profile #2071

Merged

jjbayer mentioned this pull request May 26, 2023

[EPIC] Ensure processed transactions always exist for processed profiles #2158

Closed

jjbayer added a commit that referenced this pull request May 31, 2023

Revert "feat(profiles): Emit "accepted" outcomes for profiles filtere…

2e3c4cf

…d by sampling (#2054)" This reverts commit 4801ec2.

jjbayer mentioned this pull request May 31, 2023

ref(profiles): Counting profiles, V2 #2163

Merged

iker-barriocanal mentioned this pull request Jun 1, 2023

ref(profiles): Count processed profiles with metrics #2165

Merged
