Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3953 +/- ##
==========================================
+ Coverage 77.96% 78.31% +0.35%
==========================================
Files 118 119 +1
Lines 12517 12749 +232
==========================================
+ Hits 9759 9985 +226
- Misses 2758 2764 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
|
flying-sheep
left a comment
There was a problem hiding this comment.
Looks good! I fixed the docs and made the code a bit nicer, but the reference tests have been failing from the start. Any idea why? You said they work locally for you?
|
so 0.05 for L2 was way to low harmony-pytorch is at 0.0511. we were at 0.0501. I also ran harmoypy once and it was at 0.063. so I set L2 to 0.1 |
flying-sheep
left a comment
There was a problem hiding this comment.
looks good! Some API considerations.
Please
- add parameter descriptions and
- rename the parameters (probably a good idea, this is how we do things in scanpy, but we might have to add a wrapper in .external for this)
- add a towncrier fragment
| basis: str = "X_pca", | ||
| adjusted_basis: str = "X_pca_harmony", |
There was a problem hiding this comment.
these will become refs in 2.0, but we might release harmony before that.
| theta: float | Sequence[float] | None = None, | ||
| sigma: float = 0.1, | ||
| n_clusters: int | None = None, | ||
| max_iter_harmony: int = 10, | ||
| max_iter_clustering: int = 200, | ||
| tol_harmony: float = 1e-4, | ||
| tol_clustering: float = 1e-5, | ||
| ridge_lambda: float = 1.0, | ||
| correction_method: Literal["fast", "original"] = "original", | ||
| block_proportion: float = 0.05, | ||
| tau: int = 0, |
There was a problem hiding this comment.
theta, lambda, sigma, tau … we should probably
- rename these to make sense
- fix the docs:
tau“Discounting factor” – this one has a great description about what it does, let’s do that for the rest! e.g.- “Diversity penalty weight … default 2 for each batch variable” – “diversity” might be a good component of the param name, but what is it and what does “2” mean compared to smaller or bigger values?
- “ridge regression regularization … default 1” – what does increasing and decreasing do, respectively?
| correction_method: Literal["fast", "original"] = "original", | ||
| block_proportion: float = 0.05, | ||
| tau: int = 0, | ||
| random_state: int | None = 0, |
There was a problem hiding this comment.
will change in #3983 (very soon, so we should probably wait for that PR)
| ) | ||
| for f, (name, hash_) in DATA.items() | ||
| } | ||
| dfs = {f: pd.read_csv(path, delimiter="\t") for f, path in paths.items()} |
There was a problem hiding this comment.
CSV ugh. maybe do the adata…_session + adata… pattern so we copy it in memory instead of reading it from disk multiple times?
There was a problem hiding this comment.
shouldn’t this file go into ./_harmony/?
After recent changes from Harmonypy its necessary to move HarmonyBatchcorrection to scanpy. This function is based on harmony-pytorch and rapids-singlecell