-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-13168: [C++][R] Enable runtime timezone database for Windows #12536
Changes from all commits
7c8cbc2
d92b1ab
bf82303
be6b344
a1be197
89ce1bf
191885c
b58d102
05d1dee
48c5584
647e3ac
22fc930
1c609aa
64af70c
410c2e5
5a33d30
ab5d038
668c8cc
32df619
7ebf6a8
42f0a6a
baa2263
7c09080
d883a48
60ccf65
ece8ce0
7da26f6
2471139
843cdb6
29dc875
c78d3cd
fe6469d
9f97016
ad161ae
62e8178
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
#!/usr/bin/env bash | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
set -ex | ||
|
||
# Download database | ||
curl https://data.iana.org/time-zones/releases/tzdata2021e.tar.gz --output ~/Downloads/tzdata2021e.tar.gz | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FWIW, they just released 2022a a few days ago |
||
|
||
# Extract | ||
mkdir -p ~/Downloads/tzdata | ||
tar --extract --file ~/Downloads/tzdata2021e.tar.gz --directory ~/Downloads/tzdata | ||
|
||
# Download Windows timezone mapping | ||
curl https://raw.githubusercontent.com/unicode-org/cldr/master/common/supplemental/windowsZones.xml --output ~/Downloads/tzdata/windowsZones.xml |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,6 +24,7 @@ | |
#include "arrow/compute/kernels/test_util.h" | ||
#include "arrow/testing/gtest_util.h" | ||
#include "arrow/testing/matchers.h" | ||
#include "arrow/testing/util.h" | ||
#include "arrow/type.h" | ||
#include "arrow/util/checked_cast.h" | ||
#include "arrow/util/formatting.h" | ||
|
@@ -407,6 +408,14 @@ class ScalarTemporalTest : public ::testing::Test { | |
RoundTemporalOptions round_to_15_quarters = | ||
RoundTemporalOptions(15, CalendarUnit::QUARTER); | ||
RoundTemporalOptions round_to_15_years = RoundTemporalOptions(15, CalendarUnit::YEAR); | ||
|
||
protected: | ||
void SetUp() override { | ||
#ifdef _WIN32 | ||
// Initialize timezone database on Windows | ||
ASSERT_OK(InitTestTimezoneDatabase()); | ||
#endif | ||
} | ||
}; | ||
|
||
TEST_F(ScalarTemporalTest, TestTemporalComponentExtractionAllTemporalTypes) { | ||
|
@@ -564,8 +573,6 @@ TEST_F(ScalarTemporalTest, TestOutsideNanosecondRange) { | |
CheckScalarUnary("subsecond", unit, times, float64(), subsecond); | ||
} | ||
|
||
#ifndef _WIN32 | ||
// TODO: We should test on windows once ARROW-13168 is resolved. | ||
TEST_F(ScalarTemporalTest, TestIsLeapYear) { | ||
auto is_leap_year_marquesas = | ||
"[false, true, false, false, false, false, false, false, false, false, false, " | ||
|
@@ -792,7 +799,6 @@ TEST_F(ScalarTemporalTest, TestNonexistentTimezone) { | |
ASSERT_RAISES(Invalid, Subsecond(timestamp_array)); | ||
} | ||
} | ||
#endif | ||
|
||
TEST_F(ScalarTemporalTest, Week) { | ||
auto unit = timestamp(TimeUnit::NANO); | ||
|
@@ -1607,8 +1613,6 @@ TEST_F(ScalarTemporalTest, TestTemporalDifferenceErrors) { | |
CallFunction("weeks_between", {arr1, arr1}, &options)); | ||
} | ||
|
||
// TODO: We should test on windows once ARROW-13168 is resolved. | ||
#ifndef _WIN32 | ||
TEST_F(ScalarTemporalTest, TestAssumeTimezone) { | ||
std::string timezone_utc = "UTC"; | ||
std::string timezone_kolkata = "Asia/Kolkata"; | ||
|
@@ -1875,6 +1879,9 @@ TEST_F(ScalarTemporalTest, StrftimeCLocale) { | |
} | ||
|
||
TEST_F(ScalarTemporalTest, StrftimeOtherLocale) { | ||
#ifdef _WIN32 | ||
GTEST_SKIP() << "There is a known bug in strftime for locales on Windows (ARROW-15922)"; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this on Windows or specifically MinGW? i.e., would the non-MinGW Windows CI pass if you remove this skip? I'm wondering if we can narrow the condition. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If failed on Windows 2019 C++17, which uses MSVC. But seems like it was actually passing on MinGW. So maybe I can try skipping for just MSVC? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a bit weird, MinGW is just a different compiler but targetting the same runtime libraries... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, I think the test is likely skipping on the |
||
#else | ||
if (!LocaleExists("fr_FR.UTF-8")) { | ||
GTEST_SKIP() << "locale 'fr_FR.UTF-8' doesn't exist on this system"; | ||
} | ||
|
@@ -1886,6 +1893,7 @@ TEST_F(ScalarTemporalTest, StrftimeOtherLocale) { | |
["01 janvier 1970 00:00:59,123", "18 août 2021 15:11:50,456", null])"; | ||
CheckScalarUnary("strftime", timestamp(TimeUnit::MILLI, "UTC"), milliseconds, utf8(), | ||
expected, &options); | ||
#endif | ||
} | ||
|
||
TEST_F(ScalarTemporalTest, StrftimeInvalidLocale) { | ||
|
@@ -2579,7 +2587,6 @@ TEST_F(ScalarTemporalTest, TestCeilFloorRoundTemporalKolkata) { | |
CheckScalarUnary("round_temporal", unit, times, unit, round_1_hours, &round_to_1_hours); | ||
CheckScalarUnary("round_temporal", unit, times, unit, round_2_hours, &round_to_2_hours); | ||
} | ||
#endif // !_WIN32 | ||
|
||
} // namespace compute | ||
} // namespace arrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this always be the same DB that R and Python will be using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No that
2021e
is a version, which won't necessarily align with the R and Python ones in the future. I don't think it matters that we update it, unless the format changes somehow. This is just for testing that it works, and we don't ship it.But the R unit tests use the one provided by the tzdb package, so we are testing that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently US and EU have DST, but they plan to abolish it soon. At that point we will have Python and R DBs with fresh DSTless times and arrow c++ using DST for tests. In isolation that's ok, but we do have tests comparing pandas and pyarrow results and similar for lubridate. It's not a big problem for sure, but if there is like a tzdata-latest.tar.gz that would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The times in the past won't change, so unless we test against a random "now", updates to the timezone database (such as for DST policy changes) shouldn't impact those tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this should be fine as is. We will never be testing between timezone databases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently test arrow vs pandas tzdb in CI. We have a test for a timestamp in 2033 and if it is in DST and DST is abolished it will error if we'll be using a pre-abolishment db with arrow and post-abolishment db with pandas. This is super irrelevant and as it's a simple fix and I'm sorry for wasting your time :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it. We'll address in the follow up PR where we'll have PyArrow use tzdb. For now the python timestamp tests are skipped on Windows
arrow/python/pyarrow/tests/test_compute.py
Lines 1915 to 1917 in d327f69