Skip to content

Conversation

@rok
Copy link
Member

@rok rok commented Dec 19, 2025

Rationale for this change

Switch to std::chrono for MSVC to be able to use the system-provided timezone automatically on Windows.

What changes are included in this PR?

This adds chrono_internal.h that uses C++20 std::chrono timezone/calendar APIs on compilers with support (MSVC only for now) and falls back to vendored date.h otherwise.

Are these changes tested?

Partially tested locally and partially to be tested on CI.

Are there any user-facing changes?

Yes, Windows users will no longer need to install the IANA tzdb (see instructions here and here). We possibly have tzdb download set up in CI too and should update it appropriately.

@github-actions
Copy link

⚠️ GitHub issue #48593 has been automatically assigned in GitHub to PR creator.

@rok rok force-pushed the cpp20_use_chrono branch 9 times, most recently from 4283740 to d82f990 Compare December 23, 2025 18:14
@rok
Copy link
Member Author

rok commented Dec 23, 2025

It seems that std::chrono on GCC (14.3.0, 15.2.0) potentially has a bug that triggers some of our tests. Meanwhile std::chrono on MSVC 19.44 ( 14.44) appears to be pass them and is correct or at least consistent with vendored date.h. I would therefore advise we only switch to std::chrono on MSVC for now as that gives us the most benefit anyway (users no longer have to deal with the tz db).

Below is the explanation and reproduction of the bug.

// GCC libstdc++ DST bug reproduction
//
// The AN to AS Transition Bug
// ---------------------------
// Source https://github.com/eggert/tz/blob/c37fbc3249c1a1334948b38f3bca47dee5c11dd1/australasia#L165-L192
// Australia/Broken_Hill used the AN (New South Wales) rules until 2000, then
// switched to AS (South Australia) rules. Under AN rules, DST started the last
// Sunday of October and ended the last Sunday of March. For the 1999-2000
// summer, DST started October 31, 1999 and would end March 26, 2000. February
// 29, 2000 falls squarely within this DST period, so the correct offset should
// be 9:30 base + 1:00 DST = 10:30 (630 minutes).
//
// Why GCC's Data is Wrong
// -----------------------
// When libstdc++ processes the zone transition from AN rules to AS rules (which
// happens in year 2000), it appears to lose or reset the DST state inherited
// from the AN rules. Instead of recognizing that DST is still active from the
// October 1999 transition, it reports offset=570 (just the 9:30 base) with
// save=0. The inconsistency is evident: it returns abbrev="ACDT" (daylight
// time) but the offset and save values indicate standard time. The AN rules
// clearly show DST should be active until the last Sunday of March 2000.
//
// Compile: g++ -std=c++20 -o gcc_dst_bug gcc_libstdcxx_dst_bug.cpp
// Expected: 630 (10:30 = 9:30 base + 1:00 DST)
// Actual:   570 (9:30 = base only, DST missing)

#include <chrono>
#include <iostream>

int main() {
  using namespace std::chrono;
  auto* tz = locate_zone("Australia/Broken_Hill");
  auto info = tz->get_info(sys_days{2000y / February / 29d} + 23h + 23min + 23s);
  std::cout << duration_cast<minutes>(info.offset).count() << "\n";
}

@rok rok marked this pull request as ready for review December 23, 2025 18:46
@rok
Copy link
Member Author

rok commented Dec 23, 2025

@pitrou

@rok rok changed the title GH-48593: [Draft][C++] C++20: use standard calendar / timezone APIs GH-48593: [C++] C++20: use standard calendar / timezone APIs Dec 23, 2025
@pitrou
Copy link
Member

pitrou commented Jan 5, 2026

It seems that std::chrono on GCC (14.3.0, 15.2.0) potentially has a bug that triggers some of our tests.

Is the bug reported somewhere? If not, can you do that?

@pitrou
Copy link
Member

pitrou commented Jan 5, 2026

We possibly have tzdb download set up in CI too and should update it appropriately.

Yes, I think we should do so. There are also a bunch of code snippets in the C++ and Python codebase that could be removed, IIRC.

@rok
Copy link
Member Author

rok commented Jan 5, 2026

Is the bug reported somewhere? If not, can you do that?

It seems to be related to a known issue, I added comment explaining our case.

@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes labels Jan 5, 2026
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jan 5, 2026
@rok rok force-pushed the cpp20_use_chrono branch from 745599b to 060dd32 Compare February 9, 2026 13:33
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, this is looking very good now. @raulcd Do you want to make a final pass over this?

@raulcd
Copy link
Member

raulcd commented Feb 10, 2026

@github-actions crossbow submit -g cpp

@raulcd raulcd added CI: Extra: C++ Run extra C++ CI CI: Extra: Package: Linux Run extra Linux Packages CI labels Feb 10, 2026
@github-actions
Copy link

Revision: 32fbc78

Submitted crossbow builds: ursacomputing/crossbow @ actions-1a8d03ac7f

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-13-cpp-amd64 GitHub Actions
test-debian-13-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good overall to me. I think there's more than the gcc bug on some of these issues found, see my other review comment.
Nonetheless I am unsure how can we fight the timezone database nuances on those where they are not consistent, historical timezone changes, etcetera, where there's no agreement between different timezone databases.



# TODO(GH-48743): Re-enable when Windows timezone issues are resolved
# https://github.com/apache/arrow/issues/48743
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this comment, on that specific job failure we use:

-- The C compiler identification is MSVC 19.44.35222.0
-- The CXX compiler identification is MSVC 19.44.35222.0

not GCC. Isn't this related to differences between historical timezones depending on the timezone database used?

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 10, 2026
@github-actions github-actions bot added awaiting change review Awaiting change review and removed CI: Extra: C++ Run extra C++ CI awaiting changes Awaiting changes labels Feb 10, 2026

# Download database
curl https://data.iana.org/time-zones/releases/tzdata2024b.tar.gz --output ~/Downloads/tzdata.tar.gz
curl https://data.iana.org/time-zones/tzdata-latest.tar.gz --output ~/Downloads/tzdata.tar.gz
Copy link
Member Author

@rok rok Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing here and elsewhere to latest.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 10, 2026
@rok
Copy link
Member Author

rok commented Feb 10, 2026

Thanks for review @raulcd! I pushed a couple of changes since. Nothing major, but still. I'll do another pass tomorrow to make sure all TODOs point to the right issues.

Comment on lines -58 to -67
# GH-45295: For ORC, try to populate TZDIR env var from tzdata package resource
# path.
#
# Note this is a different kind of database than what we allow to be set by
# `PYARROW_TZDATA_PATH` and passed to set_timezone_db_path.
if sys.platform == 'win32':
if os.environ.get('TZDIR', None) is None:
from importlib import resources
try:
os.environ['TZDIR'] = os.path.join(resources.files('tzdata'), 'zoneinfo')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a comment on the docs, you mentioned keeping the note about ORC and having to set TZDIR, but then here it is apparently not needed for our tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we don't really test for this as I believe ORC still needs TZDIR, see #48601 (review). I'll do another check tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your testing uses conda environments, then orc will pick up the tzdata from the environment: apache/orc#1882

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants