Use CLOCK_MONOTONIC for background thread sleep to prevent clock rollback stalls #4

Merged
antonio2368 merged 3 commits into dev from background-thread-monotonic-clock
Apr 2, 2026

@antonio2368
Member

background_thread_sleep() computes the pthread_cond_timedwait deadline from gettimeofday() (CLOCK_REALTIME). If the wall clock jumps backward (NTP correction, VM snapshot, leap second), the deadline ends up far in the future relative to the adjusted clock, and the background thread stops purging dirty pages, potentially for hours or days.

This was observed in production: NTP corrections stalled the background thread, which led to unbounded RSS growth.

Fix

  • Switch deadline computation to clock_gettime(CLOCK_MONOTONIC)
  • Set CLOCK_MONOTONIC on the condvar via pthread_condattr_setclock at both init sites (background_thread_boot1 and background_thread_postfork_child)
  • Platforms without CLOCK_MONOTONIC retain the original gettimeofday fallback

When CLOCK_MONOTONIC is available, use it for measuring sleep duration
and initializing condvars instead of gettimeofday. This matches the
condvar clock attribute set during init and avoids issues with wall
clock adjustments (e.g. NTP corrections) affecting sleep calculations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@azat
Member

azat commented Apr 1, 2026

Good that we have CI for jemalloc!

@azat azat self-assigned this Apr 1, 2026
@azat
Member

azat commented Apr 2, 2026

Let's also send the patch upstream (whatever the upstream repo is)

…failure

Replace duplicated clock_gettime/gettimeofday blocks with nstime_init_update()
which already handles the monotonic vs realtime fallback. Add error handling
for pthread_condattr_setclock: in postfork_child, fall back to default
CLOCK_REALTIME attributes; in boot1, propagate the failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@antonio2368 antonio2368 requested a review from azat April 2, 2026 07:50
In postfork_child, fall back to default condvar (CLOCK_REALTIME) if
condattr_init fails. In boot1, return failure since initialization
cannot proceed without a properly configured condvar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@antonio2368 antonio2368 merged commit b0f2213 into dev Apr 2, 2026
153 checks passed

2 participants