-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark] Add Delta Connect Server Library #3136
[Spark] Add Delta Connect Server Library #3136
Conversation
2a878fe
to
ed9382b
Compare
@longvu-db do you have more details on what is Delta Connect? |
706624d
to
f988777
Compare
Hey @felipepessoto! Delta Connect adds Spark Connect support to Scala and Python APIs of Delta Lake for Apache Spark. Spark Connect is a new project released in Apache Spark 4.0 that adds a decoupled client-server infrastructure which allows remote connectivity from Spark from everywhere. Delta Connect makes the DeltaTable interfaces compatible with the new Spark Connect protocol. There are some issues regarding Delta Connect: |
spark-connect/server/src/main/scala-spark-master/delta/connect/DeltaRelationPlugin.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of comments nits mostly.
spark-connect/server/src/main/scala-spark-master/delta/connect/DeltaRelationPlugin.scala
Outdated
Show resolved
Hide resolved
spark-connect/server/src/test/scala-spark-master/delta/connect/DeltaConnectPlannerSuite.scala
Outdated
Show resolved
Hide resolved
spark-connect/common/src/main/scala-spark-master/delta/connect/ImplicitProtoConversions.scala
Outdated
Show resolved
Hide resolved
libraryDependencies ++= Seq( | ||
"com.google.protobuf" % "protobuf-java" % protoVersion % "protobuf", | ||
|
||
"org.apache.spark" %% "spark-hive" % sparkVersion.value % "provided", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the spark version here needs to be the spark master. Does sparkVersion.value
return the spark master version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we specify the sparkVersion
to be master (build/sbt -DsparkVersion=master
) then it will only compile files with prefix src/main/scala-spark-master
(https://github.com/delta-io/delta/blob/master/build.sbt#L163).
I'm not sure if there is a way to throw an error if we compile Spark Connect by mistake with an older Spark Version, I guess most softwares if we compile with the wrong jdk or python if we compile with the wrong dependencies then it will not tell us explicit but just simply fail with an error log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to @scottsand-db help on build.sbt
we managed to make the project only compiles, tests and publishes only against Spark Master now, and it does nothing for Spark Latest (other than saying that it is skipping for Delta Connect projects).
I ran the build.sbt -DsparkVersion={latest, master} {connectServer, connectCommon}/{compile, test, publish}
and build.sbt {compile/test/publish}
manually to verify, but I'm not sure if there is a way to automatically test it for Github Workflows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
libraryDependencies ++= Seq( | ||
"com.google.protobuf" % "protobuf-java" % protoVersion % "protobuf", | ||
|
||
"org.apache.spark" %% "spark-hive" % sparkVersion.value % "provided", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thank you.
@@ -48,7 +48,6 @@ def get_args(): | |||
|
|||
def run_sbt_tests(root_dir, test_group, coverage, scala_version=None): | |||
print("##### Running SBT tests #####") | |||
is_running_spark_tests = test_group is None or test_group == "spark" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkorukanti Seems like is_running_spark_tests
is actually not used anywhere? So I'm removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! LGTM
) ## Description [Delta Connect Server Lib PR](#3136) but merge into `branch-4.0-preview1` instead of `master`. ## How was this patch tested? Added UTs.
…3200) <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> #### Which Delta project/connector is this regarding? <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [X] Spark - [ ] Standalone - [ ] Flink - [ ] Kernel - [ ] Other (fill in here) ## Description [Delta Connect Scala Client](#3136) but merge into `branch-4.0-preview1` instead of `master`. <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> ## How was this patch tested? Added UTs. <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> ## Does this PR introduce _any_ user-facing changes? No. <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. --> --------- Signed-off-by: Shawn Chang <[email protected]> Co-authored-by: Scott Sandre <[email protected]> Co-authored-by: Tom van Bussel <[email protected]> Co-authored-by: Jiaheng Tang <[email protected]> Co-authored-by: Sumeet Varma <[email protected]> Co-authored-by: Venki Korukanti <[email protected]> Co-authored-by: Avril Aysha <[email protected]> Co-authored-by: Shawn Chang <[email protected]> Co-authored-by: Shawn Chang <[email protected]> Co-authored-by: Hao Jiang <[email protected]>
…3201) <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> #### Which Delta project/connector is this regarding? <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [X] Spark - [ ] Standalone - [ ] Flink - [ ] Kernel - [ ] Other (fill in here) ## Description <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> [Delta Connect Python Client PR](#3136) but merge into branch-4.0-preview1 instead of master. ## How was this patch tested? Added UTs. <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> ## Does this PR introduce _any_ user-facing changes? No. <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. --> --------- Signed-off-by: Tai Le Manh <[email protected]> Signed-off-by: Shawn Chang <[email protected]> Co-authored-by: Scott Sandre <[email protected]> Co-authored-by: Tai Le Manh <[email protected]> Co-authored-by: Johan Lasperas <[email protected]> Co-authored-by: Allison Portis <[email protected]> Co-authored-by: Tom van Bussel <[email protected]> Co-authored-by: Jiaheng Tang <[email protected]> Co-authored-by: Sumeet Varma <[email protected]> Co-authored-by: Venki Korukanti <[email protected]> Co-authored-by: Avril Aysha <[email protected]> Co-authored-by: Shawn Chang <[email protected]> Co-authored-by: Shawn Chang <[email protected]> Co-authored-by: Hao Jiang <[email protected]>
…elta-io#3201) <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [X] Spark - [ ] Standalone - [ ] Flink - [ ] Kernel - [ ] Other (fill in here) <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> [Delta Connect Python Client PR](delta-io#3136) but merge into branch-4.0-preview1 instead of master. Added UTs. <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> No. <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. --> --------- Signed-off-by: Tai Le Manh <[email protected]> Signed-off-by: Shawn Chang <[email protected]> Co-authored-by: Scott Sandre <[email protected]> Co-authored-by: Tai Le Manh <[email protected]> Co-authored-by: Johan Lasperas <[email protected]> Co-authored-by: Allison Portis <[email protected]> Co-authored-by: Tom van Bussel <[email protected]> Co-authored-by: Jiaheng Tang <[email protected]> Co-authored-by: Sumeet Varma <[email protected]> Co-authored-by: Venki Korukanti <[email protected]> Co-authored-by: Avril Aysha <[email protected]> Co-authored-by: Shawn Chang <[email protected]> Co-authored-by: Shawn Chang <[email protected]> Co-authored-by: Hao Jiang <[email protected]>
Which Delta project/connector is this regarding?
Description
This PR adds a skeleton for Delta Connect server library, and we add support for
Scan
to Delta's planner plugin.How was this patch tested?
Added some basic tests for SparkConnectPlanner using the
DeltaRelationPlugin
andDeltaCommandPlugin
plugins.Does this PR introduce any user-facing changes?
No.