
MNIST NVIDIA-Diffusion Models


A complete implementation of Denoising Diffusion Probabilistic Models (DDPM) for handwritten digit generation

Demo • Features • Installation • Usage • Results


Overview

This project implements a Denoising Diffusion Probabilistic Model (DDPM) from scratch to generate realistic handwritten digits from the MNIST dataset. The implementation demonstrates the complete diffusion pipeline, including forward noising, reverse denoising, U-Net architecture, and classifier-free guidance.

This project was completed as part of NVIDIA's Deep Learning Institute certification program.

Key Achievements

  • ✅ 95%+ classifier accuracy on generated samples
  • ✅ Trained the DDPM to a final training loss of 0.033
  • ✅ Implemented classifier-free guidance for improved sample quality

Course Information

Course Title: Generative AI with Diffusion Models
Provider: NVIDIA Deep Learning Institute
Official Courses: NVIDIA DLI Training

Important Note on Content

This repository contains my personal implementation based on concepts learned from the NVIDIA DLI course. All original course materials, including instructional notebooks, assessment scripts, and proprietary utilities, remain the intellectual property of NVIDIA Corporation.

For access to official course materials, please register at: https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-FX-14+V1


Features

Core Implementation

  • Forward Diffusion Process - Progressive noise addition with beta scheduling
  • Reverse Diffusion Process - Learned denoising through a neural network
  • U-Net Architecture - Custom implementation with:
    • Residual blocks
    • Down/up sampling layers
    • Sinusoidal position embeddings
    • Conditional class embeddings
  • Classifier-Free Guidance - Improved sample quality through guidance weighting
  • Complete Training Pipeline - End-to-end training and inference

Technical Details

  • Timesteps (T): 150
  • Image Size: 28×28 pixels (grayscale)
  • Model Parameters: ~290K (see the check after this list)
  • Training Dataset: MNIST (70,000 images)
  • Guidance Weight: 5.0
  • Optimizer: Adam (lr=0.001)
  • Loss Function: MSE Loss
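
The parameter count above can be checked directly after instantiating the model; a minimal sketch, assuming the UNet constructor shown in the Usage section:

from src.unet import UNet

# Instantiate the U-Net with the hyperparameters listed above
model = UNet(
    timesteps=150,
    img_channels=1,
    img_size=28,
    down_channels=(64, 64, 128)
)

# Count trainable parameters; this should come out to roughly 290K
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")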

Demo

Generated Samples

Example of digits (0-9) generated by the trained model:


Training Progression

Visualization of the denoising process at different timesteps:



Installation

Prerequisites

Python 3.8+
CUDA 11.0+ (for GPU acceleration)

Setup

  1. Clone the repository
git clone https://github.com/MostafaAI10/NVIDIA-Diffusion-Models-Course.git
cd NVIDIA-Diffusion-Models-Course
  2. Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt

requirements.txt

torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
matplotlib>=3.7.0
Pillow>=9.5.0
jupyter>=1.0.0

Usage

Training the Model

from src.diffusion_model import DiffusionModel
from src.unet import UNet
from src.train import train_model

# Initialize model
model = UNet(
    timesteps=150,
    img_channels=1,
    img_size=28,
    down_channels=(64, 64, 128)
)

# Train
trained_model = train_model(
    model=model,
    epochs=5,
    batch_size=128,
    learning_rate=0.001
)
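
Under the hood, train_model optimizes the standard noise-prediction objective. A minimal sketch of one training step is shown below; the forward_diffusion helper, the context-dropout probability, the null-class index, and the model call signature are illustrative assumptions rather than the exact implementation:

import torch
import torch.nn.functional as F

def train_step(model, optimizer, x_0, labels, timesteps=150, p_uncond=0.1, device="cuda"):
    """One DDPM training step (sketch): predict the added noise and minimize MSE."""
    x_0, labels = x_0.to(device), labels.to(device)

    # Sample a random timestep per image and fresh Gaussian noise
    t = torch.randint(0, timesteps, (x_0.size(0),), device=device)
    noise = torch.randn_like(x_0)

    # Noise the clean images with the closed-form forward process (see Technical Deep Dive)
    x_t = forward_diffusion(x_0, t, noise)

    # Context dropout: sometimes replace the class label with a "null" class
    # so the model also learns unconditional prediction (assumed null index: 10)
    drop = torch.rand(labels.shape, device=device) < p_uncond
    labels = torch.where(drop, torch.full_like(labels, 10), labels)

    # Predict the noise and take a gradient step on the MSE loss
    noise_pred = model(x_t, t, labels)
    loss = F.mse_loss(noise_pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()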

Generating Images

from src.inference import generate_samples

# Generate digits 0-9
samples = generate_samples(
    model=trained_model,
    num_classes=10,
    guidance_weight=5.0
)

# Visualize
from src.utils import display_grid
display_grid(samples)

Using Pre-trained Model

import torch
from src.unet import UNet
from src.inference import generate_samples

# Load model
model = UNet(timesteps=150, img_channels=1, img_size=28)
model.load_state_dict(torch.load('checkpoints/diffusion_model.pth'))
model.eval()

# Generate
samples = generate_samples(model, num_classes=10, guidance_weight=5.0)

Results

Training Metrics

Metric | Value
Final Training Loss | 0.033
Training Epochs | 5
Total Training Time | ~15 minutes (GPU)
Classifier Accuracy on Generated Samples | 95%+

Model Performance

  • ✅ Successfully generates all 10 digit classes (0-9)
  • ✅ High visual quality and recognizability
  • ✅ Consistent generation across different seeds
  • ✅ Effective classifier-free guidance implementation



What I Learned

Core Concepts Mastered

  1. Diffusion Process Mathematics

    • Forward process: q(x_t | x_{t-1})
    • Reverse process: p_θ(x_{t-1} | x_t)
    • Beta scheduling strategies
    • Reparameterization trick
  2. U-Net Architecture

    • Encoder-decoder structure
    • Skip connections for preserving spatial information
    • Time embedding through sinusoidal position encoding (sketched after this list)
    • Conditional generation with class embeddings
  3. Training Techniques

    • Noise prediction objective
    • Mean Squared Error (MSE) loss
    • Classifier-free guidance implementation
    • Context dropout for unconditional training
  4. PyTorch Best Practices

    • Efficient data loading with DataLoader
    • GPU acceleration with CUDA
    • Model compilation for optimization
    • Gradient management and backpropagation
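
A minimal sketch of the sinusoidal time embedding referenced above (the embedding dimension and naming are illustrative):

import math
import torch

def sinusoidal_embedding(t, dim=32):
    """Map integer timesteps t of shape [B] to embeddings of shape [B, dim]."""
    half = dim // 2
    # Geometrically spaced frequencies, as in the Transformer position encoding
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float().unsqueeze(1) * freqs.unsqueeze(0)  # [B, half]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=1)  # [B, dim]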

Key Tips & Insights

For Future Learners

  1. Start with the Math

    • Understanding the beta schedule is crucial
    • Visualize the forward diffusion process first
    • The reparameterization trick makes training possible
  2. U-Net Implementation

    • Pay close attention to tensor dimensions
    • Skip connections are essential for reconstruction
    • Time embeddings should be injected at multiple layers
  3. Training Strategy

    • Monitor loss curve - should decrease steadily
    • Visualize samples during training to verify progress
    • Classifier-free guidance weight (w) significantly impacts quality
    • Start with w=5.0, adjust based on results
  4. Common Pitfalls to Avoid

    • ❌ Forgetting to normalize images to [0,1]
    • ❌ Incorrect tensor broadcasting in diffusion formulas
    • ❌ Not using .to(device) for all tensors
    • ❌ Mixing up timestep indexing (0-based vs 1-based)
  5. Optimization Tips

    • Use torch.compile() for 20-30% speedup (see the sketch after this list)
    • Batch size of 128 works well for MNIST
    • Adam optimizer with lr=0.001 is a good starting point
    • Save checkpoints regularly during training
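
A minimal sketch of the compilation and checkpointing tips above (paths are illustrative):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Optional (PyTorch 2.x): compile the model for faster training steps
compiled_model = torch.compile(model)

# ... train with compiled_model ...

# Save checkpoints from the underlying module so the state_dict keys stay clean
torch.save(model.state_dict(), "checkpoints/diffusion_model.pth")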

Guidance Weight Impact

w = 0.0  → Unconditional generation (blurry)
w = 3.0  → Decent quality
w = 5.0  → High quality (recommended)
w = 7.0  → Very sharp but less diverse
w = 10.0 → Too constrained

Technical Deep Dive

Forward Diffusion Process

The forward process gradually adds Gaussian noise to images:

def forward_diffusion(x_0, t, noise):
    """
    q(x_t | x_0) = N(x_t; √(ᾱ_t)x_0, (1-ᾱ_t)I)
    Jumps directly from a clean image x_0 to the noised image x_t.
    """
    # Reshape per-sample schedule values so they broadcast over [B, C, H, W]
    sqrt_alpha_bar_t = sqrt_alpha_bar[t].view(-1, 1, 1, 1)
    sqrt_one_minus_alpha_bar_t = sqrt_one_minus_alpha_bar[t].view(-1, 1, 1, 1)

    x_t = sqrt_alpha_bar_t * x_0 + sqrt_one_minus_alpha_bar_t * noise
    return x_t
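
The snippet above relies on precomputed schedule tensors. A minimal sketch with a linear beta schedule (a common choice; the course may use different endpoints), kept on the training device so indexing with a batch of timesteps works:

import torch

T = 150  # number of diffusion timesteps
device = "cuda" if torch.cuda.is_available() else "cpu"

# Linear beta schedule and the derived quantities used by the forward/reverse steps
beta = torch.linspace(1e-4, 0.02, T, device=device)
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)

sqrt_alpha_bar = torch.sqrt(alpha_bar)
sqrt_one_minus_alpha_bar = torch.sqrt(1.0 - alpha_bar)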

Reverse Diffusion Process

The reverse process learns to denoise:

def reverse_diffusion(x_t, t, predicted_noise):
    """
    p_θ(x_{t-1} | x_t) - one learned denoising step (DDPM sampling)
    x_{t-1} = 1/√(α_t) · (x_t - (1-α_t)/√(1-ᾱ_t) · ε_θ) + √(β_t) · z
    """
    alpha_t = alpha[t]
    alpha_bar_t = alpha_bar[t]
    beta_t = beta[t]

    # Posterior mean: remove the model's predicted noise from x_t
    mean = (x_t - (1 - alpha_t) / torch.sqrt(1 - alpha_bar_t) * predicted_noise) / torch.sqrt(alpha_t)

    # Add fresh noise at every step except the last (t = 0)
    if t > 0:
        noise = torch.randn_like(x_t)
        return mean + torch.sqrt(beta_t) * noise
    return mean

Classifier-Free Guidance

Improves sample quality by amplifying the conditional signal relative to the unconditional prediction:

def classifier_free_guidance(noise_pred_cond, noise_pred_uncond, w):
    """
    ε̃ = (1 + w)ε_θ(x_t, c) - w·ε_θ(x_t, ∅)
    """
    return (1 + w) * noise_pred_cond - w * noise_pred_uncond
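
Putting the pieces together, a minimal sketch of the full sampling loop; the model call signature and the null-class index are illustrative assumptions:

import torch

@torch.no_grad()
def sample_digit(model, label, T=150, w=5.0, device="cuda"):
    """Generate one image of class `label` by iterating the reverse process from pure noise."""
    x_t = torch.randn(1, 1, 28, 28, device=device)
    c = torch.tensor([label], device=device)
    null = torch.tensor([10], device=device)  # assumed index of the "no class" token

    for t in reversed(range(T)):
        t_batch = torch.full((1,), t, device=device)
        # Two forward passes: conditional and unconditional noise predictions
        eps_cond = model(x_t, t_batch, c)
        eps_uncond = model(x_t, t_batch, null)
        eps = classifier_free_guidance(eps_cond, eps_uncond, w)
        # One reverse diffusion step
        x_t = reverse_diffusion(x_t, t, eps)
    return x_t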

Additional Resources

Foundational Papers

Tutorials & Guides

Related Implementations

NVIDIA Resources


Contributing

Contributions are welcome! Here's how you can help:

  1. Report Bugs - Open an issue describing the problem
  2. Suggest Enhancements - Share ideas for improvements
  3. Submit Pull Requests - Fix bugs or add features

Development Setup

# Fork and clone the repository
git clone https://github.com/MostafaAI10/NVIDIA-Diffusion-Models-Course.git

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Format code
black src/

Future Enhancements

Potential improvements and extensions:

  • Different Datasets

    • CIFAR-10 (32×32 color images)
    • Fashion-MNIST
    • Custom datasets
  • Model Improvements

    • Attention mechanisms in U-Net
    • Latent diffusion for efficiency
    • Different noise schedules (cosine, linear, etc.)
  • Features

    • Gradio/Streamlit web interface
    • Real-time generation demo
    • Model interpretability visualizations
    • FID/IS score evaluation
  • Optimization

    • Mixed precision training (FP16)
    • Distributed training support
    • ONNX export for deployment

License

This project is licensed under the MIT License - see the LICENSE file for details.

Important Licensing Notes

  • My Implementation: MIT License (you're free to use, modify, distribute)
  • NVIDIA Course Materials: Remain property of NVIDIA Corporation
  • PyTorch & Dependencies: Respective open-source licenses

Acknowledgments

Special Thanks

  • NVIDIA Deep Learning Institute - For providing exceptional educational content and hands-on learning experience in generative AI
  • NVIDIA Corporation - For making advanced AI education accessible through their DLI program
  • PyTorch Team - For the excellent deep learning framework
  • Research Community - For foundational papers on diffusion models (Ho et al., Nichol & Dhariwal, etc.)

Inspiration & References

This implementation was built following concepts from:

  • NVIDIA DLI Course: "Generative AI with Diffusion Models"
  • Original DDPM paper by Ho et al. (2020)
  • U-Net architecture by Ronneberger et al. (2015)
  • Classifier-free guidance by Ho & Salimans (2022)

Contact & Connect

Author: Mostafa Abdelhamed
Email: abdelhamedmostafa190@gmail.com
LinkedIn: www.linkedin.com/in/mostafa-abdelhamed-88a447286

Let's Connect!

If you found this project helpful or interesting:

  • Star this repository
  • 🐛 Report issues
  • 💬 Start a discussion
  • 🔗 Share with others


Learning Outcomes

By completing this project, I gained practical experience in:

Mathematics of Diffusion Models

  • Forward and reverse processes
  • Markov chain formulation
  • Variational inference

Deep Learning Architecture Design

  • U-Net encoder-decoder structure
  • Residual connections
  • Multi-scale processing

Modern PyTorch Development

  • Model compilation and optimization
  • Efficient data pipelines
  • GPU acceleration

Generative AI Techniques

  • Classifier-free guidance
  • Conditional generation
  • Sample quality evaluation

Best Practices

  • Version control with Git
  • Documentation and README writing
  • Code organization and modularity
  • Testing and validation

I would be happy to hear your suggestions for future improvements.

Completed: February 2026
