You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello,
I've got the following problem: I have a HIVE ORC table with columns: time (timestamp), id (bigint). After setting the Hive server's timezone to 'America/New_York' I inserted into the table rows:
time
id
2018-01-01 08:13:41.412
1
2018-06-01 08:13:41.412
2
Afterwards, I changed the Hive server's timezone to 'UTC' and inserted:
time
id
2018-01-01 08:13:41.412
3
2018-06-01 08:13:41.412
4
As a result I obtained a table stored in two ORC files whose stripe footers' timezones were set to 'America/New_York' and 'UTC' respectively. Querying the table from beeline yields correct results:
time
id
2018-01-01 08:13:41.412
1
2018-06-01 08:13:41.412
2
2018-01-01 08:13:41.412
3
2018-06-01 08:13:41.412
4
However, when querying the table from presto client I obtain the results:
For 'hive.time-zone=UTC' (and timestamp converted to 'UTC'):
time
id
2018-01-01 08:13:41.412
1
2018-06-01 07:13:41.412
2
2018-01-01 08:13:41.412
3
2018-06-01 08:13:41.412
4
For 'hive.time-zone=America/New_York' (and timestamp converted to 'America/New_York'):
time
id
2018-01-01 08:13:41.412
1
2018-06-01 08:13:41.412
2
2018-01-01 08:13:41.412
3
2018-06-01 09:13:41.412
4
Since timestamp in ORC is stored as a number of seconds since 01.01.2015, I suppose that the Hive connector uses timezone specified in 'hive.time-zone' in order to convert seconds to timestamp instead of using stripe footer's timezone. The problem appears when one of the timezones has DST and another does not. Then there is a one hour difference in summer timestamps.
If I understand this correctly, Hive connector should use ORC stripe footer's timezone to determine what the timestamp is (i.e. to calculate it from seconds stored in file) and 'hive.time-zone' to determine what timezone does this timestamp correspond to.
The text was updated successfully, but these errors were encountered:
Hello,
I've got the following problem: I have a HIVE ORC table with columns: time (timestamp), id (bigint). After setting the Hive server's timezone to 'America/New_York' I inserted into the table rows:
Afterwards, I changed the Hive server's timezone to 'UTC' and inserted:
As a result I obtained a table stored in two ORC files whose stripe footers' timezones were set to 'America/New_York' and 'UTC' respectively. Querying the table from beeline yields correct results:
However, when querying the table from presto client I obtain the results:
Since timestamp in ORC is stored as a number of seconds since 01.01.2015, I suppose that the Hive connector uses timezone specified in 'hive.time-zone' in order to convert seconds to timestamp instead of using stripe footer's timezone. The problem appears when one of the timezones has DST and another does not. Then there is a one hour difference in summer timestamps.
If I understand this correctly, Hive connector should use ORC stripe footer's timezone to determine what the timestamp is (i.e. to calculate it from seconds stored in file) and 'hive.time-zone' to determine what timezone does this timestamp correspond to.
The text was updated successfully, but these errors were encountered: