Spark35_glue5_upgrade #194
base: main
… use broadcast join on smaller df
Looks good to me. However, I don't fully understand broadcasts, so it would be helpful to discuss them to make sure I haven't missed anything.
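Since broadcasts came up in review, here is a minimal pure-Python sketch of what a broadcast hash join does, assuming an inner join on a single key. It is a simulation, not PySpark code: `broadcast_hash_join`, the partition lists and the `(key, payload)` row tuples are all hypothetical. The smaller table is hashed once and shared with every partition of the larger one, so each partition is joined locally and the larger table never has to be shuffled between partitions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, object]  # hypothetical row shape: (join_key, payload)


def broadcast_hash_join(
    large_partitions: List[List[Row]],
    small_table: List[Row],
) -> List[List[Tuple[Hashable, object, object]]]:
    """Simulate an inner broadcast hash join.

    The small table is turned into a hash map once and "broadcast"
    (here: shared by reference) to every partition of the large table.
    Each partition is then joined locally, so no rows of the large table
    move between partitions (no shuffle).
    """
    # Build side: hash the small table once; keys may repeat, so keep lists.
    lookup: Dict[Hashable, List[object]] = defaultdict(list)
    for key, payload in small_table:
        lookup[key].append(payload)

    joined_partitions = []
    for partition in large_partitions:
        # Probe side: each partition only needs its own rows plus the broadcast map.
        out = []
        for key, large_payload in partition:
            # .get avoids inserting unmatched keys into the defaultdict.
            for small_payload in lookup.get(key, []):
                out.append((key, large_payload, small_payload))
        joined_partitions.append(out)
    return joined_partitions


if __name__ == "__main__":
    large = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
    small = [(1, "x"), (2, "y")]
    print(broadcast_hash_join(large, small))
```

In PySpark the same idea is requested with a hint, `large_df.join(broadcast(small_df), on="key")`, where `broadcast` is imported from `pyspark.sql.functions`. Spark also broadcasts automatically when one side's estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). The trade-off is memory: every executor holds a full copy of the small DataFrame, so the hint only pays off when that side really is small.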
.github/workflows/ci-checks.yaml
@@ -8,20 +8,59 @@ permissions:
  contents: read

jobs:
  python37-test:
Is it worth maintaining backward-compatibility testing for Python 3.7, or could we drop 3.7 support?
I agree. Spark 3.5.x officially supports Python 3.8 and later. Support for Python 3.7 was deprecated in Spark 3.4.0.
However, pyproject.toml allows pyspark versions in the range '>=3.1.1 <3.5.3', and Spark 3.1.1 supports older Python versions (see https://archive.apache.org/dist/spark/docs/3.1.1/), so we still need to support at least Python 3.7.
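The range matters because any pyspark release from 3.1.1 up to, but excluding, 3.5.3 may be installed, so CI has to cover the Python versions needed at both ends. The stdlib-only sketch below checks whether a version falls inside such a range. `parse_version` and `satisfies` are hypothetical helpers that handle plain numeric releases only and do not implement full PEP 440 semantics; in a real project `packaging.specifiers.SpecifierSet` is the standard tool for this.

```python
from typing import Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '3.5.2' into (3, 5, 2). Plain numeric releases only (not PEP 440)."""
    return tuple(int(part) for part in text.split("."))


# Operators supported by this sketch. Two-character operators come first so
# that '>=' is matched before '>' and '<=' before '<' (dicts keep insertion order).
_OPS = {
    ">=": lambda v, bound: v >= bound,
    "<=": lambda v, bound: v <= bound,
    "==": lambda v, bound: v == bound,
    ">": lambda v, bound: v > bound,
    "<": lambda v, bound: v < bound,
}


def satisfies(version: str, spec: str) -> bool:
    """Check `version` against a space- or comma-separated spec like '>=3.1.1 <3.5.3'."""
    v = parse_version(version)
    for clause in spec.replace(",", " ").split():
        for op, check in _OPS.items():
            if clause.startswith(op):
                if not check(v, parse_version(clause[len(op):])):
                    return False
                break
        else:
            # No known operator matched this clause (e.g. '~=').
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    for candidate in ("3.1.1", "3.5.2", "3.5.3"):
        print(candidate, satisfies(candidate, ">=3.1.1 <3.5.3"))
```

For example, `satisfies("3.5.2", ">=3.1.1 <3.5.3")` is `True`, so the Spark version used with Glue 5 already falls inside the existing range, while `satisfies("3.1.1", ...)` is also `True`. That is why the oldest allowed pyspark still drives the minimum Python version under test.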
Comment from ID: This should be kept as a draft for now, IMO. We can't merge into main until MDQ have confirmed they have the resources to test it and we can deploy and test it as part of a release.
SPP-12455
Update the imputation engine and ratio_calculator
Synopsis
Upgrading Glue jobs to use AWS Glue 5, Spark 3.5.2, and Python 3.11.
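For reference, a Glue job is pinned to a runtime through its `GlueVersion`. The helper below is a hypothetical sketch that only builds the keyword arguments for boto3's Glue `create_job` call; the job name, IAM role, script path and worker settings are placeholders rather than this repository's actual job configuration, and whether these jobs are defined through boto3 at all is an assumption.

```python
from typing import Dict, Optional


def glue5_job_definition(
    name: str,
    role_arn: str,
    script_s3_path: str,
    worker_type: str = "G.1X",
    number_of_workers: int = 2,
    default_arguments: Optional[Dict[str, str]] = None,
) -> dict:
    """Build keyword arguments for boto3's Glue `create_job` call.

    Hypothetical helper: every value passed in is a placeholder,
    not this project's real configuration.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",              # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",           # the Glue version decides the exact Python (3.11 on Glue 5)
        },
        "GlueVersion": "5.0",               # selects the Glue 5 runtime (Spark 3.5.2, Python 3.11)
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
        # Copy so callers cannot mutate the returned definition through their dict.
        "DefaultArguments": dict(default_arguments or {}),
    }


if __name__ == "__main__":
    print(glue5_job_definition(
        "example-job",
        "arn:aws:iam::123456789012:role/example-glue-role",
        "s3://example-bucket/scripts/job.py",
    ))
```

The result would be passed as `boto3.client("glue").create_job(**glue5_job_definition(...))`. Keeping the dictionary separate from the API call makes the job definition easy to unit-test without AWS credentials.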
Checklist
Description
Add a more detailed description of the PR if necessary (can reference release notes if included).