Guard against invalid jackknife block sizing when n_jack exceeds site count #911
Force-pushed from 5ebc25b to 1bb37c7.
|
Hi @jonbrenas, following the discussion in issue #907 (comment), I looked into the edge cases where the number of sites is small relative to the requested number of jackknife blocks. The update keeps block sizing valid in these situations, avoiding zero-length blocks and the instability that can follow, while leaving the existing statistical behaviour unchanged. I also checked nearby code paths to keep handling consistent for small regions and added safeguards where needed. Please let me know if this is in line with what you had in mind, or if you’d like me to adjust anything. |
|
Hi @jonbrenas, whenever you have time, could you please take a look at this? |
|
Thanks @adilraza99. Unless I missed something, the repetitions might still end up with a failed test if a region that is too small is repeatedly selected. |
|
@jonbrenas thanks for flagging this. You're right - if a very small region is selected, it could still end up failing. I can tighten the guard to make sure that case is handled more cleanly. Do you think it would be better to raise a clear error for very small regions, or adjust the jackknife parameters automatically? |
|
@jonbrenas I'd appreciate your suggestion on this. |
|
|
|
@jonbrenas thanks - that makes sense. I’ll use the I’ll also clean up the commits before finalizing. |
Force-pushed from b951b2c to c5b53f9.
Force-pushed from c5b53f9 to 7285f4c.
|
Hi @jonbrenas, I’ve simplified the guard to ensure we fail early when the number of sites is insufficient for the requested jackknife blocks, preventing the zero block-length case while keeping the behaviour minimal and consistent with the existing implementation. I also added a small test covering this scenario. Please let me know if you’d prefer any adjustment. |
|
Thanks for adding the new checks. Everything is passing on my side. |
Summary
This PR adds validation to ensure that `n_jack` does not exceed the number of available variant sites when computing jackknife block lengths.
When `n_jack > n_sites`, integer division produces a block length of 0, which leads to downstream failures inside blockwise computations (e.g. scikit-allel) and results in unclear error messages.
This situation can arise when analysing small genomic regions or heavily filtered datasets while using the default `n_jack=200`.
Problem
Both implementations compute block length as:
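A minimal sketch of the computation and the early-failure guard, assuming the block length is the integer quotient of the site count and `n_jack`, as the summary implies. The function name `jackknife_block_length` and the error wording are hypothetical and not taken from the codebase:

```python
def jackknife_block_length(n_sites: int, n_jack: int) -> int:
    """Return the number of sites per jackknife block.

    Hypothetical helper: fails early when the block length would be zero.
    """
    if n_jack < 1:
        raise ValueError(f"n_jack must be a positive integer, got {n_jack}")
    if n_jack > n_sites:
        # Without this guard, n_sites // n_jack evaluates to 0 and the
        # zero-length block only surfaces later as an unclear error.
        raise ValueError(
            f"Not enough sites ({n_sites}) for the requested number of "
            f"jackknife blocks ({n_jack}); reduce n_jack or use a larger region."
        )
    return n_sites // n_jack


# Unguarded arithmetic for a small region with the default n_jack=200:
print(150 // 200)  # 0

# A region with enough sites yields a valid block length:
print(jackknife_block_length(1000, 200))  # 5
```

Raising a clear error, rather than silently lowering `n_jack`, follows the fail-early approach settled on in the conversation and leaves the statistical behaviour for valid inputs unchanged.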
Closes #907