Skip to content

Conversation

@Mr-Neutr0n
Copy link

Summary

  • In TrajectoryImitationLoss.align_trajectory(), the target computation divides by (sigma_ - sigma). At the final denoising step, sigma_ is set to 0 and sigma approaches 0, making the denominator near-zero or exactly zero. This produces inf/nan gradients that corrupt distillation training.
  • Added .clamp(min=1e-6) to the denominator to prevent the division by zero while preserving the mathematical intent for all other steps.

Details

The problematic line in diffsynth/diffusion/loss.py:

target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma)

At the last step of the scheduler loop, sigma_ is explicitly set to 0 when progress_id + 1 >= len(pipe.scheduler.timesteps), and sigma (the current step's sigma) is already near zero at the end of the schedule. The difference (sigma_ - sigma) therefore approaches zero, causing the division to produce inf or nan values that propagate through the loss and corrupt gradient updates.

The fix clamps the denominator to a small positive value:

target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(min=1e-6)

Test plan

  • Verify distillation training runs to completion without nan/inf losses
  • Confirm training metrics remain consistent with expected convergence behavior

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Mr-Neutr0n, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical numerical stability issue within the TrajectoryImitationLoss calculation during distillation training. By introducing a clamp to the denominator in a key calculation, it prevents division by zero errors that previously led to inf or nan gradients, ensuring robust and stable training convergence.

Highlights

  • Numerical Stability Fix: Prevented division by zero in TrajectoryImitationLoss.align_trajectory() by clamping the denominator (sigma_ - sigma) to a minimum value of 1e-6. This resolves inf/nan gradient issues during distillation training.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • diffsynth/diffusion/loss.py
    • Added a clamp to the denominator in the target calculation within align_trajectory to prevent division by zero.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to fix a division-by-zero error in TrajectoryImitationLoss by clamping the denominator. While the intention is correct, the current implementation using clamp(min=1e-6) will likely cause issues. For standard diffusion schedulers with decreasing sigmas, the denominator sigma_ - sigma is negative. Clamping a negative value with a positive minimum flips its sign, which will likely lead to training divergence. I've suggested an alternative that preserves the sign of the denominator.

latents_ = trajectory_teacher[progress_id_teacher]

target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma)
target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(min=1e-6)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The use of .clamp(min=1e-6) on the denominator (sigma_ - sigma) is likely incorrect and could cause training to fail.

In typical diffusion schedulers, the sigmas array is monotonically decreasing. This means sigma_ (the next sigma) is less than or equal to sigma (the current sigma), making the denominator (sigma_ - sigma) a negative value or zero.

When you apply .clamp(min=1e-6) to a negative number, it will be replaced by 1e-6. This flips the sign of the denominator from negative to positive, which will in turn flip the sign of the target. This incorrect target will likely lead to model training divergence.

Assuming the sigmas are indeed decreasing, you can safely avoid division by zero while preserving the correct sign by clamping to a small negative value instead.

Suggested change
target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(min=1e-6)
target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(max=-1e-6)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you're right that .clamp(min=1e-6) flips the sign when (sigma_ - sigma) is negative, which is the normal case since sigmas decrease monotonically. This would invert the target and break training.

I've updated the fix to preserve the sign of the denominator:

denom = sigma_ - sigma
denom = torch.sign(denom) * torch.clamp(denom.abs(), min=1e-6)
target = (latents_ - inputs_shared["latents"]) / denom

This prevents division by zero while keeping the correct sign regardless of sigma ordering.

…t direction

The previous .clamp(min=1e-6) on (sigma_ - sigma) flips the sign when
the denominator is negative (which is the typical case since sigmas
decrease monotonically). This would invert the target and cause
training divergence.

Use torch.sign(denom) * torch.clamp(denom.abs(), min=1e-6) instead,
which prevents division by zero while preserving the correct sign.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant