A complete implementation of Denoising Diffusion Probabilistic Models (DDPM) for handwritten digit generation
Demo • Features • Installation • Usage • Results
This project implements a Denoising Diffusion Probabilistic Model (DDPM) from scratch to generate realistic handwritten digits from the MNIST dataset. The implementation demonstrates the complete diffusion pipeline, including forward noising, reverse denoising, U-Net architecture, and classifier-free guidance.
This project was completed as part of NVIDIA's Deep Learning Institute certification program.
- ✅ 95%+ classifier accuracy on generated samples
- ✅ Successfully trained DDPM model with final loss: 0.033
- ✅ Implemented classifier-free guidance for improved sample quality
Course Title: Generative AI with Diffusion Models
Provider: NVIDIA Deep Learning Institute
Official Courses: NVIDIA DLI Training
This repository contains my personal implementation based on concepts learned from the NVIDIA DLI course. All original course materials, including instructional notebooks, assessment scripts, and proprietary utilities, remain the intellectual property of NVIDIA Corporation.
For access to official course materials, please register at: https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-FX-14+V1
- Forward Diffusion Process - Progressive noise addition with beta scheduling
- Reverse Diffusion Process - Learned denoising through neural network
- U-Net Architecture - Custom implementation with:
- Residual blocks
- Down/up sampling layers
- Sinusoidal position embeddings
- Conditional class embeddings
- Classifier-Free Guidance - Improved sample quality through guidance weighting
- Complete Training Pipeline - End-to-end training and inference
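The sinusoidal position embeddings mentioned above can be sketched in a few lines. This is a standalone illustration, not the repo's code; the embedding dimension and the 10000 frequency base are assumptions borrowed from the common Transformer-style formulation:

```python
import math
import torch

def sinusoidal_embedding(t, dim=32):
    """Hypothetical sketch of a sinusoidal timestep embedding used to
    condition the U-Net. `dim` and the 10000 frequency base are assumptions."""
    half = dim // 2
    # Geometrically spaced frequencies, one per embedding channel pair
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half).float() / half)
    args = t.float().unsqueeze(1) * freqs.unsqueeze(0)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=1)

# One embedding vector per timestep 0..149
emb = sinusoidal_embedding(torch.arange(150))
print(emb.shape)  # torch.Size([150, 32])
```

Each timestep gets a unique, smoothly varying vector, which is why the same embedding can be injected at multiple U-Net layers.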
- Timesteps (T): 150
- Image Size: 28×28 pixels (grayscale)
- Model Parameters: ~290K
- Training Dataset: MNIST (70,000 images)
- Guidance Weight: 5.0
- Optimizer: Adam (lr=0.001)
- Loss Function: MSE Loss
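The schedule constants implied by the configuration above (T = 150) can be precomputed once up front. A minimal standalone sketch, assuming a linear beta schedule; the exact endpoints are my assumption, not taken from the course code:

```python
import torch

# Precompute DDPM schedule constants for T = 150.
# The linear endpoints (1e-4 to 0.02) are an assumption for illustration.
T = 150
beta = torch.linspace(1e-4, 0.02, T)      # per-step noise variance beta_t
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)   # cumulative product alpha-bar_t

# alpha_bar shrinks monotonically as t grows, so later timesteps carry less signal.
print(alpha_bar[0].item(), alpha_bar[-1].item())
```

Every diffusion quantity used later (forward noising, the reverse step) indexes into these tensors rather than recomputing them per batch.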
Example of digits (0-9) generated by the trained model:
Visualization of the denoising process at different timesteps:
Python 3.8+
CUDA 11.0+ (for GPU acceleration)

1. Clone the repository

```bash
git clone https://github.com/MostafaAI10/NVIDIA-Diffusion-Models-Course.git
cd diffusion-mnist
```

2. Create a virtual environment (recommended)

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies

```bash
pip install -r requirements.txt
```

Contents of `requirements.txt`:

```text
torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
matplotlib>=3.7.0
Pillow>=9.5.0
jupyter>=1.0.0
```

Training:

```python
from src.diffusion_model import DiffusionModel
from src.unet import UNet
from src.train import train_model

# Initialize model
model = UNet(
    timesteps=150,
    img_channels=1,
    img_size=28,
    down_channels=(64, 64, 128)
)

# Train
trained_model = train_model(
    model=model,
    epochs=5,
    batch_size=128,
    learning_rate=0.001
)
```

Sampling:

```python
from src.inference import generate_samples

# Generate digits 0-9
samples = generate_samples(
    model=trained_model,
    num_classes=10,
    guidance_weight=5.0
)

# Visualize
from src.utils import display_grid
display_grid(samples)
```

Loading a trained checkpoint:

```python
import torch
from src.unet import UNet
from src.inference import generate_samples

# Load model
model = UNet(timesteps=150, img_channels=1, img_size=28)
model.load_state_dict(torch.load('checkpoints/diffusion_model.pth'))
model.eval()

# Generate
samples = generate_samples(model, num_classes=10, guidance_weight=5.0)
```

| Metric | Value |
|---|---|
| Final Training Loss | 0.033 |
| Training Epochs | 5 |
| Total Training Time | ~15 minutes (GPU) |
| Classifier Accuracy on Generated Samples | 95%+ |
- ✅ Successfully generates all 10 digit classes (0-9)
- ✅ High visual quality and recognizability
- ✅ Consistent generation across different seeds
- ✅ Effective classifier-free guidance implementation
1. Diffusion Process Mathematics
   - Forward process: q(x_t | x_{t-1})
   - Reverse process: p_θ(x_{t-1} | x_t)
   - Beta scheduling strategies
   - Reparameterization trick

2. U-Net Architecture
   - Encoder-decoder structure
   - Skip connections for preserving spatial information
   - Time embedding through sinusoidal position encoding
   - Conditional generation with class embeddings

3. Training Techniques
   - Noise prediction objective
   - Mean Squared Error (MSE) loss
   - Classifier-free guidance implementation
   - Context dropout for unconditional training

4. PyTorch Best Practices
   - Efficient data loading with DataLoader
   - GPU acceleration with CUDA
   - Model compilation for optimization
   - Gradient management and backpropagation
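The context dropout mentioned above can be sketched as follows: during training, each class label is randomly replaced by a "null" token so the same network also learns unconditional noise prediction. This is a standalone illustration; the function name, the null-token value, and the 10% drop rate are my assumptions, not the course's code:

```python
import torch

def drop_context(labels, null_token=10, p_drop=0.1):
    """Hypothetical sketch of context dropout for classifier-free guidance:
    with probability p_drop, replace the class label with a null token so the
    model also learns unconditional generation. Names/values are assumptions."""
    mask = torch.rand(labels.shape[0]) < p_drop
    return torch.where(mask, torch.full_like(labels, null_token), labels)

torch.manual_seed(0)
labels = torch.randint(0, 10, (1000,))
dropped = drop_context(labels)
# Roughly 10% of the labels become the null token
print((dropped == 10).float().mean().item())
```

At inference time the model is then run twice per step, once with the real label and once with the null token, which is exactly what classifier-free guidance needs.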
1. Start with the Math
   - Understanding the beta schedule is crucial
   - Visualize the forward diffusion process first
   - The reparameterization trick makes training possible

2. U-Net Implementation
   - Pay close attention to tensor dimensions
   - Skip connections are essential for reconstruction
   - Time embeddings should be injected at multiple layers

3. Training Strategy
   - Monitor the loss curve; it should decrease steadily
   - Visualize samples during training to verify progress
   - The classifier-free guidance weight (w) significantly impacts quality
   - Start with w=5.0 and adjust based on results

4. Common Pitfalls to Avoid
   - ❌ Forgetting to normalize images to [0, 1]
   - ❌ Incorrect tensor broadcasting in the diffusion formulas
   - ❌ Not calling `.to(device)` on all tensors
   - ❌ Mixing up timestep indexing (0-based vs 1-based)

5. Optimization Tips
   - Use `torch.compile()` for a 20-30% speedup
   - A batch size of 128 works well for MNIST
   - The Adam optimizer with lr=0.001 is a good starting point
   - Save checkpoints regularly during training
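The noise-prediction objective referred to above reduces to a single MSE step per batch. A minimal standalone sketch; the linear layer standing in for the U-Net and the schedule endpoints are assumptions for illustration, not the repo's code:

```python
import torch
import torch.nn.functional as F

T = 150
beta = torch.linspace(1e-4, 0.02, T)            # assumed linear schedule
alpha_bar = torch.cumprod(1.0 - beta, dim=0)

# Stand-in "model": predicts noise from x_t; a real U-Net goes here.
model = torch.nn.Linear(28 * 28, 28 * 28)

x_0 = torch.rand(8, 28 * 28)                    # batch of flattened images in [0, 1]
t = torch.randint(0, T, (8,))                   # random timestep per sample
noise = torch.randn_like(x_0)

a = alpha_bar[t].unsqueeze(1)                   # broadcast alpha-bar_t over pixels
x_t = torch.sqrt(a) * x_0 + torch.sqrt(1 - a) * noise

loss = F.mse_loss(model(x_t), noise)            # predict the noise that was added
loss.backward()
```

Note the `unsqueeze(1)`: getting this broadcast right is exactly the tensor-broadcasting pitfall listed above.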
Guidance weight (w) effects on sample quality:

- w = 0.0 → Unconditional generation (blurry)
- w = 3.0 → Decent quality
- w = 5.0 → High quality (recommended)
- w = 7.0 → Very sharp but may overfit
- w = 10.0 → Too constrained
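The effect of w can be seen directly from the guidance formula: the combined prediction starts at the conditional one (w = 0) and is pushed further along the direction from the unconditional to the conditional prediction as w grows. A small numeric sketch with made-up noise predictions:

```python
import torch

# Made-up conditional and unconditional noise predictions (2-D for clarity)
eps_cond = torch.tensor([1.0, 0.0])
eps_uncond = torch.tensor([0.5, 0.0])

for w in (0.0, 3.0, 5.0):
    eps = (1 + w) * eps_cond - w * eps_uncond
    print(w, eps.tolist())
# 0.0 [1.0, 0.0]   -> w = 0 reproduces the conditional prediction
# 3.0 [2.5, 0.0]
# 5.0 [3.5, 0.0]   -> larger w amplifies (eps_cond - eps_uncond)
```

This is why very large w over-constrains sampling: the prediction is extrapolated far beyond anything the model saw during training.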
The forward process gradually adds Gaussian noise to images:

```python
def forward_diffusion(x_0, t, noise):
    """
    q(x_t | x_0) = N(x_t; √(ᾱ_t)·x_0, (1-ᾱ_t)·I)
    """
    sqrt_alpha_bar_t = torch.sqrt(alpha_bar[t])
    sqrt_one_minus_alpha_bar_t = torch.sqrt(1 - alpha_bar[t])
    x_t = sqrt_alpha_bar_t * x_0 + sqrt_one_minus_alpha_bar_t * noise
    return x_t
```

The reverse process learns to denoise:
```python
def reverse_diffusion(x_t, t, predicted_noise):
    """
    p_θ(x_{t-1} | x_t) - learned denoising step
    """
    alpha_bar_t = alpha_bar[t]

    # Predict x_0 from x_t and the predicted noise
    predicted_x0 = (x_t - torch.sqrt(1 - alpha_bar_t) * predicted_noise) / torch.sqrt(alpha_bar_t)

    # Sample x_{t-1}: add fresh noise except at the final step
    if t > 0:
        noise = torch.randn_like(x_t)
        x_t_minus_1 = predicted_x0 + torch.sqrt(beta[t]) * noise
    else:
        x_t_minus_1 = predicted_x0
    return x_t_minus_1
```

Classifier-free guidance improves sample quality by amplifying the conditional signal:
```python
def classifier_free_guidance(noise_pred_cond, noise_pred_uncond, w):
    """
    ε̃ = (1 + w)·ε_θ(x_t, c) - w·ε_θ(x_t, ∅)
    """
    return (1 + w) * noise_pred_cond - w * noise_pred_uncond
```

References:

- Denoising Diffusion Probabilistic Models (DDPM) - Ho et al., 2020
- Improved Denoising Diffusion Probabilistic Models - Nichol & Dhariwal, 2021
- Classifier-Free Diffusion Guidance - Ho & Salimans, 2022
Contributions are welcome! Here's how you can help:
- Report Bugs - Open an issue describing the problem
- Suggest Enhancements - Share ideas for improvements
- Submit Pull Requests - Fix bugs or add features
```bash
# Fork and clone the repository
git clone https://github.com/MostafaAI10/NVIDIA-Diffusion-Models-Course.git

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Format code
black src/
```

Potential improvements and extensions:
1. Different Datasets
   - CIFAR-10 (32×32 color images)
   - Fashion-MNIST
   - Custom datasets

2. Model Improvements
   - Attention mechanisms in the U-Net
   - Latent diffusion for efficiency
   - Different noise schedules (cosine, linear, etc.)

3. Features
   - Gradio/Streamlit web interface
   - Real-time generation demo
   - Model interpretability visualizations
   - FID/IS score evaluation

4. Optimization
   - Mixed precision training (FP16)
   - Distributed training support
   - ONNX export for deployment
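Of these, mixed precision is a particularly small change to the training loop. A hedged sketch using PyTorch's autocast (shown on CPU with bfloat16 so it runs anywhere; on a CUDA device you would use `device_type="cuda"` with float16 and a `GradScaler`; the tiny linear model is a stand-in):

```python
import torch

# Stand-in model and optimizer; a real U-Net and DataLoader go here.
model = torch.nn.Linear(28 * 28, 28 * 28)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 28 * 28)

# Autocast runs eligible ops in lower precision inside the context.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(x).square().mean()

loss.backward()
opt.step()
```

On GPUs with tensor cores this typically cuts memory use and speeds up the matmul-heavy U-Net forward/backward passes.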
This project is licensed under the MIT License - see the LICENSE file for details.
- My Implementation: MIT License (you're free to use, modify, distribute)
- NVIDIA Course Materials: Remain property of NVIDIA Corporation
- PyTorch & Dependencies: Respective open-source licenses
- NVIDIA Deep Learning Institute - For providing exceptional educational content and hands-on learning experience in generative AI
- NVIDIA Corporation - For making advanced AI education accessible through their DLI program
- PyTorch Team - For the excellent deep learning framework
- Research Community - For foundational papers on diffusion models (Ho et al., Nichol & Dhariwal, etc.)
This implementation was built following concepts from:
- NVIDIA DLI Course: "Generative AI with Diffusion Models"
- Original DDPM paper by Ho et al. (2020)
- U-Net architecture by Ronneberger et al. (2015)
- Classifier-free guidance by Ho & Salimans (2022)
Author: Mostafa Abdelhamed
Email: abdelhamedmostafa190@gmail.com
LinkedIn: www.linkedin.com/in/mostafa-abdelhamed-88a447286
If you found this project helpful or interesting:
- ⭐ Star this repository
- 🐛 Report issues
- 💬 Start a discussion
- 🔗 Share with others
By completing this project, I gained practical experience in:
✅ Mathematics of Diffusion Models
- Forward and reverse processes
- Markov chain formulation
- Variational inference
✅ Deep Learning Architecture Design
- U-Net encoder-decoder structure
- Residual connections
- Multi-scale processing
✅ Modern PyTorch Development
- Model compilation and optimization
- Efficient data pipelines
- GPU acceleration
✅ Generative AI Techniques
- Classifier-free guidance
- Conditional generation
- Sample quality evaluation
✅ Best Practices
- Version control with Git
- Documentation and README writing
- Code organization and modularity
- Testing and validation
I would be happy to hear your suggestions for future improvements.

Completed: February 2026


