-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark] Support for reading log compactions #2073
Conversation
case DeltaFile(f, fileVersion) => | ||
(f, FileType.DELTA, fileVersion) | ||
case CompactedDeltaFile(f, startVersion, endVersion) | ||
if includeMinorCompactions && versionToLoad.forall(endVersion <= _) => | ||
(f, FileType.COMPACTED_DELTA, startVersion) | ||
case CheckpointFile(f, fileVersion) if f.getLen > 0 => | ||
(f, FileType.CHECKPOINT, fileVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any use of the middle element of this triple? The version is used for takeWhile
and the file status is what we actually return... but where is file type used?
(version, version) | ||
} | ||
|
||
if (highestVersionSeen < startVersion && endVersion <= latestCommitVersion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ensures we don't pick up overlapping compacted Deltas, right?
- First compaction can't straddle the starting version
- Neighboring compactions can't overlap
- Last compaction can't straddle latest version
We get a lot of work out of that one line, maybe worth a code comment?
if (Utils.isTesting) { | ||
assert(false, s"Validation around Compacted deltas failed while creating Snapshot. " + | ||
s"[${JsonUtils.toJson(eventData)}]") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this code do? It seems to fail unconditionally in testing, but it's not obvious why validation has failed at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding comment before the val eventData
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
This PR adds read support for log compactions described here: delta-io#2072 Closes delta-io#2073 GitOrigin-RevId: 6f4a09c3fa09c303cdeb747c382cedcfda5a2a4c
This PR adds read support for log compactions described here: delta-io#2072 Closes delta-io#2073 GitOrigin-RevId: 6f4a09c3fa09c303cdeb747c382cedcfda5a2a4c
This PR adds read support for log compactions described here: delta-io#2072 Closes delta-io#2073 GitOrigin-RevId: 6f4a09c3fa09c303cdeb747c382cedcfda5a2a4c
Which Delta project/connector is this regarding?
Description
This PR adds read support for log compactions described here: #2072
How was this patch tested?
UTs
Does this PR introduce any user-facing changes?
No