fix: prevent division by zero in TrajectoryImitationLoss at final denoising step #1293
base: main
Conversation
Summary of Changes
Hello @Mr-Neutr0n, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical numerical stability issue within the TrajectoryImitationLoss.
Code Review
This pull request aims to fix a division-by-zero error in TrajectoryImitationLoss by clamping the denominator. While the intention is correct, the current implementation using clamp(min=1e-6) will likely cause issues. For standard diffusion schedulers with decreasing sigmas, the denominator sigma_ - sigma is negative. Clamping a negative value with a positive minimum flips its sign, which will likely lead to training divergence. I've suggested an alternative that preserves the sign of the denominator.
diffsynth/diffusion/loss.py
Outdated
  latents_ = trajectory_teacher[progress_id_teacher]
- target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma)
+ target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(min=1e-6)
The use of .clamp(min=1e-6) on the denominator (sigma_ - sigma) is likely incorrect and could cause training to fail.
In typical diffusion schedulers, the sigmas array is monotonically decreasing. This means sigma_ (the next sigma) is less than or equal to sigma (the current sigma), making the denominator (sigma_ - sigma) a negative value or zero.
When you apply .clamp(min=1e-6) to a negative number, it will be replaced by 1e-6. This flips the sign of the denominator from negative to positive, which will in turn flip the sign of the target. This incorrect target will likely lead to model training divergence.
Assuming the sigmas are indeed decreasing, you can safely avoid division by zero while preserving the correct sign by clamping to a small negative value instead.
- target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(min=1e-6)
+ target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(max=-1e-6)
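To see the concern concretely, here is a small standalone check (not project code; the tensor values are made up) showing how clamp(min=1e-6) flips the sign of a negative denominator while clamp(max=-1e-6) preserves it:

import torch

# typical case described above: sigma_ < sigma, so the difference is negative
denom = torch.tensor([-0.3, -1e-9])
print(denom.clamp(min=1e-6))   # tensor([1.0000e-06, 1.0000e-06]) -- both values flipped to positive
print(denom.clamp(max=-1e-6))  # tensor([-3.0000e-01, -1.0000e-06]) -- sign preserved, magnitude floored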
Good catch — you're right that .clamp(min=1e-6) flips the sign when (sigma_ - sigma) is negative, which is the normal case since sigmas decrease monotonically. This would invert the target and break training.
I've updated the fix to preserve the sign of the denominator:
denom = sigma_ - sigma
denom = torch.sign(denom) * torch.clamp(denom.abs(), min=1e-6)
target = (latents_ - inputs_shared["latents"]) / denom

This prevents division by zero while keeping the correct sign regardless of sigma ordering.
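As a quick sanity check of that behavior (a standalone snippet with made-up values, not the project code):

import torch

denom = torch.tensor([-0.3, -1e-9, 1e-9, 0.3])
safe = torch.sign(denom) * torch.clamp(denom.abs(), min=1e-6)
print(safe)  # tensor([-3.0000e-01, -1.0000e-06, 1.0000e-06, 3.0000e-01]) -- sign kept, magnitude floored at 1e-6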
…t direction

The previous .clamp(min=1e-6) on (sigma_ - sigma) flips the sign when the denominator is negative (which is the typical case, since sigmas decrease monotonically). This would invert the target and cause training divergence. Use torch.sign(denom) * torch.clamp(denom.abs(), min=1e-6) instead, which prevents division by zero while preserving the correct sign.
Summary
In TrajectoryImitationLoss.align_trajectory(), the target computation divides by (sigma_ - sigma). At the final denoising step, sigma_ is set to 0 and sigma approaches 0, making the denominator near-zero or exactly zero. This produces inf/nan gradients that corrupt distillation training. Apply .clamp(min=1e-6) to the denominator to prevent the division by zero while preserving the mathematical intent for all other steps.
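To illustrate the failure mode (a standalone snippet, not project code): dividing by a zero tensor in PyTorch yields inf or nan, which then propagates through the loss and backprop:

import torch

numerator = torch.tensor([1.0, 0.0])
denominator = torch.tensor(0.0)
print(numerator / denominator)  # tensor([inf, nan]): 1/0 -> inf, 0/0 -> nan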
Details

The problematic line is in diffsynth/diffusion/loss.py:

target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma)

At the last step of the scheduler loop, sigma_ is explicitly set to 0 when progress_id + 1 >= len(pipe.scheduler.timesteps), and sigma (the current step's sigma) is already near zero at the end of the schedule. The difference (sigma_ - sigma) therefore approaches zero, causing the division to produce inf or nan values that propagate through the loss and corrupt gradient updates.

The fix clamps the denominator to a small positive value:

target = (latents_ - inputs_shared["latents"]) / (sigma_ - sigma).clamp(min=1e-6)
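For illustration only, a minimal standalone sketch of how the denominator collapses at the last step (the sigma values here are invented, and the real loop in loss.py may be structured differently):

import torch

sigmas = torch.tensor([1.0, 0.5, 0.1, 1e-7])  # hypothetical decreasing sigma schedule
num_steps = len(sigmas)
for progress_id in range(num_steps):
    sigma = sigmas[progress_id]
    # mirror the described behavior: sigma_ is forced to 0 at the final step
    sigma_ = sigmas[progress_id + 1] if progress_id + 1 < num_steps else torch.tensor(0.0)
    denom = sigma_ - sigma
    print(progress_id, float(denom))  # the last value is about -1e-7, i.e. effectively zero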
Test plan
No nan/inf losses