Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(breakdowns): Schema change for span op breakdowns #1761

Merged
merged 3 commits into from
Mar 30, 2021

Conversation

dashed
Copy link
Member

@dashed dashed commented Mar 17, 2021

Based on #1233

Associated change in Relay: getsentry/relay#934

Add span op breakdowns column to the transactions dataset that has identical schema to the measurements column.

@dashed dashed requested a review from a team March 17, 2021 00:33
@dashed dashed self-assigned this Mar 17, 2021
@dashed dashed requested review from lynnagara and evanh March 17, 2021 18:58
@evanh
Copy link
Member

evanh commented Mar 18, 2021

Per the offline discussion in Slack, did you decide on whether the durations would fit into a UInt32? On the subject of UInt, are you absolutely sure that there can never be a negative duration? Are the start and end timestamps generated on the same device? Is there something that is doing abs(end - start)?

@dashed
Copy link
Member Author

dashed commented Mar 18, 2021

@evanh I'm going ahead with UInt64. The delta values will be non-negative and will be guaranteed by Relay.

@evanh
Copy link
Member

evanh commented Mar 18, 2021

@JTCunning DDL Changes:

ALTER TABLE transactions_local ADD COLUMN IF NOT EXISTS span_op_breakdowns Nested(key LowCardinality(String), value UInt64) AFTER measurements;

@evanh
Copy link
Member

evanh commented Mar 19, 2021

@dashed Is there documentation somewhere on the decision between nanoseconds/UInt64 and milliseconds/UInt32?

@dashed
Copy link
Member Author

dashed commented Mar 19, 2021

@evanh Kevan (@k-fish ) is documenting this decision here https://www.notion.so/sentry/Ops-Breakdown-c81469ee3be04f7eb9394c41e62a2dc2

@dashed
Copy link
Member Author

dashed commented Mar 22, 2021

I'm going to update this to be float64 based on feedback from Jan so that breakdowns are forward-compatible with the future metrics product. We expect to keep this column until breakdowns are ported to the metrics product. See: getsentry/relay#934 (comment) and getsentry/relay#958

@evanh
Copy link
Member

evanh commented Mar 23, 2021

I'm going to update this to be float64

@dashed Does this mean that the values in the breakdown field will be in seconds? What level of precision do you need for the data? I ask because Clickhouse also has a Float32 type, which would give some of the benefits we discussed with Int64 vs Int32. However there will be a loss of granularity with that choice. Assuming that the durations are in the range of a few hours at most (~3600 seconds) then the precision would be submillisecond but not microsecond.

@dashed
Copy link
Member Author

dashed commented Mar 23, 2021

@evanh For span op breakdowns, they're in milliseconds. We'd like to ensure we can read microsecond values.

For any general breakdowns, the values will be in float64 when transmitted from Relay. I'd like to keep the types consistent especially when values are crossing the network boundary.

Would there be a risk trying to fit a float64 value into a float32 column?

@evanh
Copy link
Member

evanh commented Mar 23, 2021

Ah if you're sending milliseconds and you want microsecond precision then I think we need to stick with Float64. I just wanted to double check.

@dashed dashed force-pushed the breakdowns-schema branch from 6adebce to 010838b Compare March 24, 2021 20:42
@dashed dashed requested a review from evanh March 25, 2021 11:29
Copy link
Member

@evanh evanh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Please do not merge this until James has had a chance to look at the DDL.

@JTCunning New DDL:

ALTER TABLE transactions_local ADD COLUMN IF NOT EXISTS span_op_breakdowns Nested(key LowCardinality(String), value Float64) AFTER measurements;

Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the after parameter.

]
),
),
after="measurements",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work.
the measurements column does not exist after the DDL operation that creates it.
After such ddl you have measurements.key and measurements.value columns.
So this should be
after="measurements.value"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fpacifici Good catch. I updated to after="measurements.value".

Interestingly, the span_op_breakdowns columns are added after the measurements columns in my local Clickhouse instance.

@dashed dashed requested a review from fpacifici March 30, 2021 00:06
@codecov-io
Copy link

Codecov Report

Merging #1761 (20f0442) into master (5109139) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1761      +/-   ##
==========================================
+ Coverage   90.84%   90.86%   +0.01%     
==========================================
  Files         486      487       +1     
  Lines       22007    22047      +40     
==========================================
+ Hits        19992    20032      +40     
  Misses       2015     2015              
Impacted Files Coverage Δ
snuba/datasets/storages/transactions.py 100.00% <ø> (ø)
snuba/migrations/groups.py 93.75% <ø> (ø)
...ctions/0011_transactions_add_span_op_breakdowns.py 100.00% <100.00%> (ø)
snuba/datasets/storages/discover.py 100.00% <0.00%> (ø)
snuba/datasets/storages/errors_common.py 100.00% <0.00%> (ø)
snuba/datasets/storages/events_common.py 100.00% <0.00%> (ø)
tests/query/processors/test_bool_context.py 100.00% <0.00%> (ø)
snuba/datasets/storages/events_bool_contexts.py 100.00% <0.00%> (ø)
tests/query/processors/test_mapping_promoter.py 100.00% <0.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5109139...20f0442. Read the comment docs.

Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration has been applied to prod. Please feel free to merge

@dashed
Copy link
Member Author

dashed commented Mar 30, 2021

/gcbrun

@dashed dashed merged commit c4b9669 into master Mar 30, 2021
@dashed dashed deleted the breakdowns-schema branch March 30, 2021 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants