Skip to content

add jitter to retention schedule#5361

Open
trinity-1686a wants to merge 8 commits intomainfrom
trinity/retention-schedule-jitter
Open

add jitter to retention schedule#5361
trinity-1686a wants to merge 8 commits intomainfrom
trinity/retention-schedule-jitter

Conversation

@trinity-1686a
Copy link
Contributor

Description

add optional jitter to retention policy schedule, so that numerous indexes with the same schedule don't hammer down the metastore when they all execute their retention policy
fix #5353

How was this PR tested?

unit test

@github-actions
Copy link

github-actions bot commented Aug 28, 2024

On SSD:

Average search latency is 1.01x that of the reference (lower is better).
Ref run id: 3063, ref commit: 60368df
Link

On GCS:

Average search latency is 1.08x that of the reference (lower is better).
Ref run id: 3064, ref commit: 60368df
Link

@guilload
Copy link
Member

I wonder if we could avoid exposing users to one more knob by jittering by default. We would have to pick a value between [next evaluation..next evaluation+1], maybe biased towards the beginning of the interval. WDYT? Would that be confusing for users?

@trinity-1686a
Copy link
Contributor Author

trinity-1686a commented Sep 12, 2024

i think it would be fine for policies which happen often enough (once every day or more often), but could get confusing if policies put in more delays, especially when entered in cron format. For instance if i put 0 0 * * 0 (every sunday at midnight), I wouldn't be too surprised if the task starts a few minutes or maybe an hour late, but i would be surprised if it started doing something on tuesday (or any other day). We could also run into issues where if we do something daily (0utc assumed), and for some reason it's company policy to restart servers at 1am, only 1/24 indexes would actually be cleaned each day (the other 23/24 are scheduled after 1am, then we restart, look at what the next time to run would be, it's midnight + some delay, sleep for the rest of the day)

We could do something where the jitter_secs is min(1h, time_between_2_schedules), which sounds generally less surprising, and maybe (or maybe not) allow manual configuration to something higher than 1h?

@trinity-1686a trinity-1686a requested review from guilload and removed request for fulmicoton September 13, 2024 16:23
@guilload
Copy link
Member

Yes, min(1h, time_between_2_schedules) is perfect. I meant something along those lines with "biased".

Copy link
Member

@guilload guilload left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a nice one :) Let's merge it.

/// applied. Said otherwise, an operation may start any time between the next time it's
/// scheduled, and the time after that, but no later than 1h after the scheduled time.
#[serde(skip_serializing_if = "Option::is_none")]
pub jitter_secs: Option<u64>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: human time is nicer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make sure the janitor does not spam the metastore too much

2 participants