[KYUUBI #6017] Support observe hint #6017
base: master
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

@@             Coverage Diff              @@
##             master    #6017      +/-   ##
============================================
- Coverage     61.16%   61.14%   -0.03%
  Complexity       23       23
============================================
  Files           623      623
  Lines         37060    37060
  Branches       5024     5024
============================================
- Hits          22669    22661        -8
- Misses        11956    11961        +5
- Partials       2435     2438        +3
How can end users get the metrics after adding the hint? Is there a pure SQL way to show the metrics?
I would have preferred to display the metrics in the SQL tab of the Spark UI, but that would require changes to the Spark code, like:
For Kyuubi, I want to record the metrics in Spark events first and then display them in the Kyuubi Statement tab. What do you think?
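A minimal sketch of the event-based flow proposed in that comment: the query posts its observed metrics as an event once it completes, and a listener stores them per statement so a UI tab could render them later. This is a toy model in plain Python, not Spark or Kyuubi code; every name here (`ObservedMetricsEvent`, `EventBus`, `StatementMetricsStore`, `run_statement`, the `observer1` metric names) is hypothetical. For reference, Spark itself exposes the metrics of a finished query to a `QueryExecutionListener` through `QueryExecution.observedMetrics`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class ObservedMetricsEvent:
    """Hypothetical event posted when a statement finishes.

    `metrics` maps an observer name to its named aggregate values.
    """
    statement_id: str
    metrics: Dict[str, Dict[str, float]]


class EventBus:
    """Toy stand-in for a listener bus: delivers each event to every listener."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[ObservedMetricsEvent], None]] = []

    def register(self, listener: Callable[[ObservedMetricsEvent], None]) -> None:
        self._listeners.append(listener)

    def post(self, event: ObservedMetricsEvent) -> None:
        for listener in self._listeners:
            listener(event)


class StatementMetricsStore:
    """Hypothetical backing store for a 'Statement' tab: metrics per statement."""

    def __init__(self) -> None:
        self.by_statement: Dict[str, Dict[str, Dict[str, float]]] = {}

    def on_event(self, event: ObservedMetricsEvent) -> None:
        self.by_statement[event.statement_id] = event.metrics


def run_statement(statement_id: str, rows: Iterable[float], bus: EventBus) -> List[float]:
    """Run a trivial 'query', then post its observed metrics after it completes."""
    result = list(rows)
    metrics = {"observer1": {"cnt": float(len(result)), "total": float(sum(result))}}
    bus.post(ObservedMetricsEvent(statement_id, metrics))
    return result


# Usage: register the store as a listener, run a statement, read its metrics.
bus = EventBus()
store = StatementMetricsStore()
bus.register(store.on_event)
result = run_statement("stmt-1", [1.0, 2.0, 3.0], bus)
print(store.by_statement["stmt-1"])  # {'observer1': {'cnt': 3.0, 'total': 6.0}}
```

Decoupling the producer (the query) from the consumer (the UI store) through events is what lets the metrics be shown in a Kyuubi tab without touching the Spark UI code.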
How about introducing a new syntax instead of a hint, something like
This seems to observe only the final result. I want to insert observers at all stages of the SQL, as in the test case.
Original SQL:
SQL with observers:
The behavior is up to you. The new syntax can also observe each operator, as long as you inject a CollectMetrics node for each one.
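A minimal sketch of "inject an observer for each operator", as a toy pipeline in plain Python rather than a Spark plan. The pass-through `observe` operator is loosely modeled on the CollectMetrics node mentioned above: it forwards every row unchanged and records its aggregate (here just a row count) only once its input is exhausted. The operators `filter_positive` and `double`, and the helpers `run` and `with_observers`, are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]
Operator = Callable[[Iterable[Row]], Iterator[Row]]
Metrics = Dict[str, Dict[str, int]]


def observe(name: str, metrics: Metrics) -> Operator:
    """Hypothetical pass-through observer: forwards rows unchanged, counts them."""
    def op(rows: Iterable[Row]) -> Iterator[Row]:
        count = 0
        for row in rows:
            count += 1
            yield row
        # The metric is only final after the upstream input is exhausted.
        metrics[name] = {"rows": count}
    return op


def filter_positive(rows: Iterable[Row]) -> Iterator[Row]:
    return (r for r in rows if r["v"] > 0)


def double(rows: Iterable[Row]) -> Iterator[Row]:
    return ({"v": r["v"] * 2} for r in rows)


def run(plan: List[Operator], source: Iterable[Row]) -> List[Row]:
    """Chain the operators lazily, then drain the final stream."""
    stream: Iterable[Row] = source
    for op in plan:
        stream = op(stream)
    return list(stream)


def with_observers(plan: List[Operator], metrics: Metrics) -> List[Operator]:
    """Inject one observer after the scan and one after every operator."""
    injected: List[Operator] = [observe("scan", metrics)]
    for i, op in enumerate(plan):
        injected.append(op)
        injected.append(observe(f"{op.__name__}#{i}", metrics))
    return injected


source = [{"v": -1}, {"v": 2}, {"v": 3}]
plan: List[Operator] = [filter_positive, double]
plain = run(plan, source)
metrics: Metrics = {}
observed = run(with_observers(plan, metrics), source)
print(metrics)  # {'scan': {'rows': 3}, 'filter_positive#0': {'rows': 2}, 'double#1': {'rows': 2}}
```

Because each observer only forwards rows, the observed and plain runs return the same result; a real implementation would evaluate arbitrary aggregate expressions instead of a fixed row count.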
How do we push the specified aggregate expressions down to a previous stage?
For example, in this SQL, how do we apply
Maybe we can make the syntax support nested statements, e.g.,
The difference from a hint is that we can change its output.
I might prefer to split it into two tasks, like:
I'm fine with it. It should be a kind of pure SQL for
Yes,
🔍 Description
Issue References 🔗
This pull request fixes #
Describe Your Solution 🔧
Provide an OBSERVE hint that creates an observer to collect aggregated metrics.
The OBSERVE hint syntax:
Example usage:
Types of changes 🔖
Test Plan 🧪
Behavior Without This Pull Request ⚰️
Behavior With This Pull Request 🎉
Related Unit Tests
org.apache.spark.sql.observe.ResolveObserveHintsSuite
Checklist 📝
Be nice. Be informative.