fix: prevent division by zero in TrajectoryImitationLoss at final denoising step #1293
base: main
Conversation
Summary of Changes
Hello @Mr-Neutr0n, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical numerical stability issue within the TrajectoryImitationLoss.
Code Review
This pull request aims to fix a division-by-zero error in TrajectoryImitationLoss by clamping the denominator. While the intention is correct, the current implementation using clamp(min=1e-6) will likely cause issues. For standard diffusion schedulers with decreasing sigmas, the denominator sigma_ - sigma is negative. Clamping a negative value with a positive minimum flips its sign, which will likely lead to training divergence. I've suggested an alternative that preserves the sign of the denominator.
diffsynth/diffusion/loss.py
Outdated
  latents_ = trajectory_teacher[progress_id_teacher]
- target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma)
+ target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(min=1e-6)
The use of .clamp(min=1e-6) on the denominator (sigma_ - sigma) is likely incorrect and could cause training to fail.
In typical diffusion schedulers, the sigmas array is monotonically decreasing. This means sigma_ (the next sigma) is less than or equal to sigma (the current sigma), making the denominator (sigma_ - sigma) a negative value or zero.
When you apply .clamp(min=1e-6) to a negative number, it will be replaced by 1e-6. This flips the sign of the denominator from negative to positive, which will in turn flip the sign of the target. This incorrect target will likely lead to model training divergence.
Assuming the sigmas are indeed decreasing, you can safely avoid division by zero while preserving the correct sign by clamping to a small negative value instead.
- target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(min=1e-6)
+ target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(max=-1e-6)
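To see the concern concretely, here is a small standalone check (not project code; the tensor values are made up) showing how clamp(min=1e-6) flips the sign of a negative denominator while clamp(max=-1e-6) preserves it:

import torch

# typical case described above: sigma_ < sigma, so the difference is negative
denom = torch.tensor([-0.3, -1e-9])
print(denom.clamp(min=1e-6))   # tensor([1.0000e-06, 1.0000e-06]) -- both values flipped to positive
print(denom.clamp(max=-1e-6))  # tensor([-3.0000e-01, -1.0000e-06]) -- sign preserved, magnitude floored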
Good catch — you're right that .clamp(min=1e-6) flips the sign when (sigma_ - sigma) is negative, which is the normal case since sigmas decrease monotonically. This would invert the target and break training.
I've updated the fix to preserve the sign of the denominator:
denom = sigma_ - sigma
denom = torch.sign(denom) * torch.clamp(denom.abs(), min=1e-6)
target = (latents_ - inputs_shared["latents"]) / denom

This prevents division by zero while keeping the correct sign regardless of sigma ordering.
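As a quick sanity check of that behavior (a standalone snippet with made-up values, not the project code):

import torch

denom = torch.tensor([-0.3, -1e-9, 1e-9, 0.3])
safe = torch.sign(denom) * torch.clamp(denom.abs(), min=1e-6)
print(safe)  # tensor([-3.0000e-01, -1.0000e-06, 1.0000e-06, 3.0000e-01]) -- sign kept, magnitude floored at 1e-6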
…t direction

The previous .clamp(min=1e-6) on (sigma_ - sigma) flips the sign when the denominator is negative (which is the typical case, since sigmas decrease monotonically). This would invert the target and cause training divergence. Use torch.sign(denom) * torch.clamp(denom.abs(), min=1e-6) instead, which prevents division by zero while preserving the correct sign.
Summary
In TrajectoryImitationLoss.align_trajectory(), the target computation divides by (sigma_ - sigma). At the final denoising step, sigma_ is set to 0 and sigma approaches 0, making the denominator near-zero or exactly zero. This produces inf/nan gradients that corrupt distillation training. Apply .clamp(min=1e-6) to the denominator to prevent the division by zero while preserving the mathematical intent for all other steps.
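To illustrate the failure mode (a standalone snippet, not project code): dividing by a zero tensor in PyTorch yields inf or nan, which then propagates through the loss and backprop:

import torch

numerator = torch.tensor([1.0, 0.0])
denominator = torch.tensor(0.0)
print(numerator / denominator)  # tensor([inf, nan]): 1/0 -> inf, 0/0 -> nan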
Details

The problematic line is in diffsynth/diffusion/loss.py:

target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma)

At the last step of the scheduler loop, sigma_ is explicitly set to 0 when progress_id + 1 >= len(pipe.scheduler.timesteps), and sigma (the current step's sigma) is already near zero at the end of the schedule. The difference (sigma_ - sigma) therefore approaches zero, causing the division to produce inf or nan values that propagate through the loss and corrupt gradient updates.

The fix clamps the denominator to a small positive value:

target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(min=1e-6)
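For illustration only, a minimal standalone sketch of how the denominator collapses at the last step (the sigma values here are invented, and the real loop in loss.py may be structured differently):

import torch

sigmas = torch.tensor([1.0, 0.5, 0.1, 1e-7])  # hypothetical decreasing sigma schedule
num_steps = len(sigmas)
for progress_id in range(num_steps):
    sigma = sigmas[progress_id]
    # mirror the described behavior: sigma_ is forced to 0 at the final step
    sigma_ = sigmas[progress_id + 1] if progress_id + 1 < num_steps else torch.tensor(0.0)
    denom = sigma_ - sigma
    print(progress_id, float(denom))  # the last value is about -1e-7, i.e. effectively zero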
Test plan
No nan/inf losses