Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong timezone used for reading timestamp from ORC in Hive connector #11416

Open
dmichal opened this issue Sep 4, 2018 · 1 comment
Open

Comments

@dmichal
Copy link

dmichal commented Sep 4, 2018

Hello,
I've got the following problem: I have a HIVE ORC table with columns: time (timestamp), id (bigint). After setting the Hive server's timezone to 'America/New_York' I inserted into the table rows:

time id
2018-01-01 08:13:41.412 1
2018-06-01 08:13:41.412 2

Afterwards, I changed the Hive server's timezone to 'UTC' and inserted:

time id
2018-01-01 08:13:41.412 3
2018-06-01 08:13:41.412 4

As a result I obtained a table stored in two ORC files whose stripe footers' timezones were set to 'America/New_York' and 'UTC' respectively. Querying the table from beeline yields correct results:

time id
2018-01-01 08:13:41.412 1
2018-06-01 08:13:41.412 2
2018-01-01 08:13:41.412 3
2018-06-01 08:13:41.412 4

However, when querying the table from presto client I obtain the results:

  1. For 'hive.time-zone=UTC' (and timestamp converted to 'UTC'):
time id
2018-01-01 08:13:41.412 1
2018-06-01 07:13:41.412 2
2018-01-01 08:13:41.412 3
2018-06-01 08:13:41.412 4
  1. For 'hive.time-zone=America/New_York' (and timestamp converted to 'America/New_York'):
time id
2018-01-01 08:13:41.412 1
2018-06-01 08:13:41.412 2
2018-01-01 08:13:41.412 3
2018-06-01 09:13:41.412 4

Since timestamp in ORC is stored as a number of seconds since 01.01.2015, I suppose that the Hive connector uses timezone specified in 'hive.time-zone' in order to convert seconds to timestamp instead of using stripe footer's timezone. The problem appears when one of the timezones has DST and another does not. Then there is a one hour difference in summer timestamps.
If I understand this correctly, Hive connector should use ORC stripe footer's timezone to determine what the timestamp is (i.e. to calculate it from seconds stored in file) and 'hive.time-zone' to determine what timezone does this timestamp correspond to.

@shenavaa
Copy link

This seems to be fixed in PrestoSQL:

trinodb/trino#2099
trinodb/trino#212

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants