
Guard against invalid jackknife block sizing when n_jack exceeds site count #911

Merged
jonbrenas merged 3 commits into malariagen:master from adilraza99:GH907-n-jack-validation on Feb 28, 2026

@adilraza99
Contributor

Summary

This PR adds validation to ensure that n_jack does not exceed the number
of available variant sites when computing jackknife block lengths.

When n_jack > n_sites, integer division produces a block length of 0,
which leads to downstream failures inside blockwise computations
(e.g. scikit-allel) and results in unclear error messages.

This situation can arise when analysing small genomic regions or heavily
filtered datasets while using the default n_jack=200.


Problem

Both implementations compute block length as:

block_length = n_sites // n_jack
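
To make the failure mode concrete, here is a minimal sketch of the arithmetic and of one way a guard could look. The function name `jackknife_block_length` and the error wording are assumptions for illustration only, not the code merged in this PR.

```python
def jackknife_block_length(n_sites: int, n_jack: int) -> int:
    """Return the jackknife block length, failing early if it would be zero.

    Hypothetical helper: the name and messages are illustrative assumptions.
    """
    if n_jack < 1:
        raise ValueError(f"n_jack must be at least 1, got {n_jack}")
    if n_jack > n_sites:
        raise ValueError(
            f"n_jack ({n_jack}) exceeds the number of available sites "
            f"({n_sites}); reduce n_jack or analyse a larger region"
        )
    return n_sites // n_jack


# Unguarded integer division gives a zero-length block when n_jack > n_sites.
assert 150 // 200 == 0

# With the guard, valid inputs give the same result as the plain division.
assert jackknife_block_length(1000, 200) == 1000 // 200
```

Calling `jackknife_block_length(150, 200)` raises a `ValueError` that names both values, rather than passing a block length of 0 to downstream blockwise code.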

Closes #907

adilraza99 force-pushed the GH907-n-jack-validation branch from 5ebc25b to 1bb37c7 on February 23, 2026 00:44

@adilraza99
Contributor Author

Hi @jonbrenas,

Following the discussion in issue #907 (comment), I looked into the edge cases where the number of sites is small relative to the requested number of jackknife blocks.

The update makes sure block sizing stays valid in these situations, avoiding zero-length blocks and the instability that can follow, while keeping the existing statistical behaviour unchanged.

I also checked nearby code paths to keep handling consistent for small regions and added safeguards where needed.

Please let me know if this is in line with what you had in mind, or if you’d like me to adjust anything.

@adilraza99
Contributor Author

Hi @jonbrenas, whenever you have time, could you please take a look at this?

@jonbrenas
Collaborator

Thanks @adilraza99. Unless I missed something, the repetitions might still end up with a failed test if a region that is too small is repeatedly selected.

@adilraza99
Contributor Author

@jonbrenas thanks for flagging this.

You're right - if a very small region is selected, it could still end up failing. I can tighten the guard to make sure that case is handled more cleanly.

Do you think it would be better to raise a clear error for very small regions, or adjust the jackknife parameters automatically?
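
For illustration, the automatic-adjustment option could be sketched as below. `clamp_n_jack` is a hypothetical helper, not code from this PR: it caps the number of blocks at the number of sites so that the block length is always at least 1.

```python
def clamp_n_jack(n_sites: int, n_jack: int) -> int:
    """Cap the requested number of jackknife blocks at the number of sites.

    Hypothetical helper for illustration; not part of the library.
    """
    if n_sites < 1:
        # Nothing to jackknife over, so adjusting n_jack cannot help.
        raise ValueError("at least one site is required")
    if n_jack < 1:
        raise ValueError(f"n_jack must be at least 1, got {n_jack}")
    return min(n_jack, n_sites)


n_sites, requested = 150, 200
effective = clamp_n_jack(n_sites, requested)  # capped at n_sites
block_length = n_sites // effective           # never zero after clamping
assert block_length >= 1
```

The trade-off is that clamping silently changes the number of blocks the caller asked for, whereas raising an error keeps the requested parameters explicit.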

@adilraza99
Contributor Author

@jonbrenas could you share your suggestion on this?

@jonbrenas
Collaborator

random_region_str has a 'region_size' parameter which should work to force a big enough region.

@adilraza99
Contributor Author

@jonbrenas thanks - that makes sense.

I’ll use the region_size parameter in random_region_str to ensure
sufficiently large regions and keep the behaviour deterministic.

I’ll also clean up the commits before finalizing.

adilraza99 force-pushed the GH907-n-jack-validation branch from b951b2c to c5b53f9 on February 27, 2026 14:21
adilraza99 force-pushed the GH907-n-jack-validation branch from c5b53f9 to 7285f4c on February 27, 2026 14:30

@adilraza99
Contributor Author

Hi @jonbrenas,

I’ve simplified the guard to ensure we fail early when the number of sites is insufficient for the requested jackknife blocks, preventing the zero block-length case while keeping the behaviour minimal and consistent with the existing implementation.

I also added a small test covering this scenario.

Please let me know if you’d prefer any adjustment.

@adilraza99
Contributor Author

Thanks for adding the new checks. Everything is passing on my side.

@jonbrenas jonbrenas merged commit 7b9fcaf into malariagen:master Feb 28, 2026
5 checks passed
Development

Successfully merging this pull request may close these issues.

n_jack validation missing when number of sites is smaller than jackknife blocks

2 participants