Add support for delta timestamp_ntz datatype #24418

Open: wants to merge 2 commits into master

infvg
Contributor

@infvg infvg commented Jan 23, 2025

Description

Add support for the TIMESTAMP_NTZ data type in v3 Delta tables.
Currently, Presto supports only the TIMESTAMP column type for Delta tables.

This change applies Presto's existing logic for timestamp types. However, the following issue still persists:
trinodb/trino#37

The TIMESTAMP type is still adjusted by the time zone when legacy timestamp is set to true, which differs from Spark's behavior.

Impact

Allows Presto to read Delta tables with TIMESTAMP_NTZ columns.
Changes the Presto type of Delta TIMESTAMP columns to TIMESTAMP_WITH_TIME_ZONE.
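To make the type mapping above concrete, here is a minimal Java sketch; it is not the connector's actual code. It assumes the Delta schema reports the type names `timestamp` and `timestamp_ntz`, and the class `DeltaTypeMappingSketch` and enum `PrestoTimestampType` are hypothetical stand-ins for Presto's real type classes.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the Delta-to-Presto timestamp mapping described in this PR.
 * DeltaTypeMappingSketch and PrestoTimestampType are illustrative names, not Presto APIs.
 */
public final class DeltaTypeMappingSketch {

    /** Stand-ins for the two Presto types involved. */
    enum PrestoTimestampType {
        TIMESTAMP,                // wall-clock value, no time zone
        TIMESTAMP_WITH_TIME_ZONE  // instant plus a time zone key
    }

    private DeltaTypeMappingSketch() {}

    /** Maps a Delta primitive type name from the table schema to a Presto timestamp type. */
    static PrestoTimestampType mapTimestamp(String deltaTypeName) {
        switch (deltaTypeName.toLowerCase(Locale.ROOT)) {
            case "timestamp":
                // Delta TIMESTAMP is time-zone aware, so it maps to TIMESTAMP WITH TIME ZONE.
                return PrestoTimestampType.TIMESTAMP_WITH_TIME_ZONE;
            case "timestamp_ntz":
                // Delta TIMESTAMP_NTZ carries no time zone, so it maps to plain TIMESTAMP.
                return PrestoTimestampType.TIMESTAMP;
            default:
                throw new IllegalArgumentException("Not a Delta timestamp type: " + deltaTypeName);
        }
    }

    public static void main(String[] args) {
        System.out.println(mapTimestamp("timestamp"));     // TIMESTAMP_WITH_TIME_ZONE
        System.out.println(mapTimestamp("timestamp_ntz")); // TIMESTAMP
    }
}
```

Rejecting unknown names with an exception keeps the sketch honest about scope: it only covers the two timestamp types this PR touches.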

Test Plan

Added unit tests.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==
Delta Connector Changes
* Improve mapping of ``TIMESTAMP`` column type by changing it from Presto ``TIMESTAMP`` type to ``TIMESTAMP_WITH_TIME_ZONE``.
* Add support for ``TIMESTAMP_NTZ`` column type as Presto ``TIMESTAMP`` type.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 23, 2025
@infvg infvg marked this pull request as ready for review January 23, 2025 11:18
@infvg infvg requested a review from a team as a code owner January 23, 2025 11:18
@infvg infvg requested a review from presto-oss January 23, 2025 11:18
@steveburnett
Contributor

steveburnett commented Jan 23, 2025

Thanks for the release note! Nits suggested:
(edited, see newer comment below)

Member

@imjalpreet imjalpreet left a comment

Please update the commit message, PR title/description, and release note to use timestampNTZ rather than timezoneNTZ.

@infvg infvg changed the title Add support for delta timezone_ntz datatype Add support for delta timestamp_ntz datatype Jan 27, 2025
@infvg infvg force-pushed the delta-timentz-support branch from c6371a6 to fb5fef3 Compare January 27, 2025 09:03
@infvg infvg force-pushed the delta-timentz-support branch from fb5fef3 to bd1117e Compare January 29, 2025 07:43
@infvg infvg force-pushed the delta-timentz-support branch from bd1117e to 2d81166 Compare February 13, 2025 10:20
@infvg infvg requested a review from shangxinli as a code owner February 13, 2025 10:20
@steveburnett
Contributor

Thanks for the release note! Some changes to follow the Release Notes Guidelines:

== RELEASE NOTES ==

Delta Lake Connector Changes
* Improve mapping of ``TIMESTAMP`` column type by changing it from Presto ``TIMESTAMP`` type to ``TIMESTAMP_WITH_TIME_ZONE``.
* Add support for ``TIMESTAMP_NTZ`` column type as Presto ``TIMESTAMP`` type.

@infvg infvg requested a review from auden-woolfson February 13, 2025 17:38
Contributor

@auden-woolfson auden-woolfson left a comment

Code looks good; please update the test.

@infvg infvg force-pushed the delta-timentz-support branch 2 times, most recently from 5da9113 to fe7d66e Compare February 17, 2025 10:10
@infvg infvg requested a review from auden-woolfson February 17, 2025 10:22
@infvg infvg force-pushed the delta-timentz-support branch 3 times, most recently from 7859b12 to 3ece8ef Compare February 17, 2025 15:10
@infvg infvg force-pushed the delta-timentz-support branch from 3ece8ef to 15f8db5 Compare February 17, 2025 15:15
Member

@imjalpreet imjalpreet left a comment

@infvg thank you for the fix. I am still in the process of understanding and verifying its correctness. I have a couple of comments, and I will add more if necessary.

Have you verified compatibility with Spark? I think it would be beneficial to ensure that Presto's handling of timestamps for delta tables aligns with Spark's behavior.

Additionally, have we included tests for both partitioned and non-partitioned tables?

targetType.writeLong(newBlockBuilder, longArrayBlock.getLong(position, 0));
// If the first 12 bits of a timestamp with timezone are all 0, then only the unix timestamp is encoded
// in the long & we can set the last 12 bits as 0 to default to UTC.
if ((longArrayBlock.getLong(position) & 0xFFFF000000000000L) == 0) {
Member

As Presto only uses the first 12 bits for the time zone key, shouldn't we use a 12-bit mask here? 0xFFFF000000000000L is a 16-bit mask.
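One way to check the mask widths being discussed is to count set bits and find where each mask starts. This sketch assumes, following Presto's packed encoding for timestamps with time zone, that the time zone key occupies the 12 low-order bits of the long; the class name `MaskWidthCheck` is illustrative.

```java
/** Compares the widths and positions of the two masks discussed in this thread. */
public final class MaskWidthCheck {
    // 12-bit mask over the low-order bits (assumed location of the time zone key).
    static final long LOW_12_BITS = 0xFFFL;
    // Mask from the reviewed diff: the 16 high-order bits of the long.
    static final long HIGH_16_BITS = 0xFFFF000000000000L;

    public static void main(String[] args) {
        System.out.println(Long.bitCount(LOW_12_BITS));               // 12
        System.out.println(Long.numberOfTrailingZeros(LOW_12_BITS));  // 0
        System.out.println(Long.bitCount(HIGH_16_BITS));              // 16
        System.out.println(Long.numberOfTrailingZeros(HIGH_16_BITS)); // 48
        System.out.println((LOW_12_BITS & HIGH_16_BITS) == 0);        // true: the masks do not overlap
    }
}
```

The two masks sit at opposite ends of the long, so they test different things: 0xFFF isolates the zone key, while 0xFFFF000000000000L only reports whether the top 16 bits are clear.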

Contributor Author

@infvg infvg Mar 4, 2025

Unix timestamps are at most 32 bits, and they get shifted by 12 bits. If no time zone is encoded, the first 32 bits contain the timestamp and the last 32 bits are empty. So if the first 32 bits are empty, no time zone was encoded, and I shift the value by 12 bits.
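The packing and the heuristic discussed in this thread can be sketched in standalone Java. This is an illustration under assumptions, not Presto's code: it re-implements the layout `(millisUtc << 12) | zoneKey` that Presto's DateTimeEncoding uses, assumes UTC has zone key 0, and `normalizeToPacked` is a hypothetical helper that mirrors the mask check in the reviewed diff.

```java
/**
 * Standalone sketch of Presto-style packing for TIMESTAMP WITH TIME ZONE values.
 * It re-implements the layout (millisUtc << 12 | zoneKey) for illustration instead of
 * calling Presto's DateTimeEncoding; normalizeToPacked is a hypothetical helper.
 */
public final class PackedTimestampSketch {
    static final int MILLIS_SHIFT = 12;        // low 12 bits are reserved for the zone key
    static final long TIME_ZONE_MASK = 0xFFFL; // 12-bit zone key mask
    static final short UTC_KEY = 0;            // assumed zone key value for UTC

    static long pack(long millisUtc, short zoneKey) {
        return (millisUtc << MILLIS_SHIFT) | (zoneKey & TIME_ZONE_MASK);
    }

    static long unpackMillisUtc(long packed) {
        return packed >> MILLIS_SHIFT; // arithmetic shift keeps pre-1970 values negative
    }

    static short unpackZoneKey(long packed) {
        return (short) (packed & TIME_ZONE_MASK);
    }

    /**
     * Mirrors the heuristic in the reviewed diff: if the 16 high-order bits are zero,
     * treat the long as plain epoch millis and pack it with the UTC key.
     */
    static long normalizeToPacked(long value) {
        if ((value & 0xFFFF000000000000L) == 0) {
            return pack(value, UTC_KEY);
        }
        return value;
    }

    public static void main(String[] args) {
        long millis = 1_700_000_000_000L;              // 2023-11-14T22:13:20Z
        long packed = normalizeToPacked(millis);
        System.out.println(unpackMillisUtc(packed));   // 1700000000000
        System.out.println(unpackZoneKey(packed));     // 0
        System.out.println(normalizeToPacked(packed) == packed); // true: already packed
    }
}
```

Because the check only inspects the 16 high-order bits, it relies on plain millisecond values being small enough to leave those bits clear; negative (pre-1970) values have them set by two's complement and would not be packed.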

Member

@imjalpreet imjalpreet left a comment

Could we separate the PR into two commits?

  1. Add support for the timestamp_ntz datatype in the Delta Lake Connector
  2. Fix the handling of timestamps with time zones in the Delta Lake Connector

Please feel free to modify the commit messages as you see fit.

Previously, TimestampNTZ was not a supported column type. This PR maps the Delta TimestampNTZ type
to Presto's timestamp type.
@infvg infvg force-pushed the delta-timentz-support branch from 15f8db5 to 879df1c Compare March 4, 2025 11:42
@infvg
Contributor Author

infvg commented Mar 4, 2025

@imjalpreet
I've tested the functionality with Delta and PySpark, and it matches Presto as long as the legacy_timestamp value is set to false. The Delta TIMESTAMP type is similar to both the legacy timestamp type (since it accounts for time zones) and the timestamp with time zone type (since it has time zones). TimestampNTZ is the same as the non-legacy timestamp type.
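The distinction drawn here between TIMESTAMP and TIMESTAMP_NTZ can be illustrated with java.time alone. This sketch assumes both types arrive as microseconds since the epoch, as the Delta protocol describes; it does not reproduce Spark's or Presto's legacy_timestamp handling, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Contrasts instant semantics (TIMESTAMP) with wall-clock semantics (TIMESTAMP_NTZ). */
public final class NtzSemanticsSketch {

    /** TIMESTAMP: micros since 1970-01-01T00:00Z name an instant, shown in the session zone. */
    static LocalDateTime renderInstant(long epochMicros, ZoneId sessionZone) {
        Instant instant = Instant.ofEpochSecond(
                Math.floorDiv(epochMicros, 1_000_000L),
                Math.floorMod(epochMicros, 1_000_000L) * 1_000L);
        return instant.atZone(sessionZone).toLocalDateTime();
    }

    /** TIMESTAMP_NTZ: the same micros are a wall-clock value; no session zone is involved. */
    static LocalDateTime renderWallClock(long epochMicros) {
        long seconds = Math.floorDiv(epochMicros, 1_000_000L);
        int nanos = (int) (Math.floorMod(epochMicros, 1_000_000L) * 1_000L);
        // ZoneOffset.UTC only anchors the arithmetic here; no time zone conversion happens.
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        long micros = 0L;
        ZoneId newYork = ZoneId.of("America/New_York");
        System.out.println(renderInstant(micros, ZoneOffset.UTC)); // 1970-01-01T00:00
        System.out.println(renderInstant(micros, newYork));        // 1969-12-31T19:00
        System.out.println(renderWallClock(micros));               // 1970-01-01T00:00
    }
}
```

Changing the session zone changes how the TIMESTAMP value is displayed but leaves the TIMESTAMP_NTZ value untouched, which matches the observation that TimestampNTZ corresponds to the non-legacy timestamp type.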

readPartitionedTableAllDataTypes tests the TIMESTAMP type in a partitioned table, and testDeltaTimezoneTypeSupport tests it in a non-partitioned table.

@infvg infvg force-pushed the delta-timentz-support branch 3 times, most recently from b99313b to 97cbacc Compare March 4, 2025 14:27
The Delta table column type TIMESTAMP should be mapped to TIMESTAMP_WITH_TIMEZONE,
as there is a separate no-time-zone column type counterpart (TIMESTAMP_NTZ).
@infvg infvg force-pushed the delta-timentz-support branch from 97cbacc to 207211b Compare March 6, 2025 08:13
Labels
from:IBM PR from IBM

6 participants