
MNIST NVIDIA-Diffusion Models


A complete implementation of Denoising Diffusion Probabilistic Models (DDPM) for handwritten digit generation

Demo • Features • Installation • Usage • Results


Overview

This project implements a Denoising Diffusion Probabilistic Model (DDPM) from scratch to generate realistic handwritten digits from the MNIST dataset. The implementation demonstrates the complete diffusion pipeline, including forward noising, reverse denoising, U-Net architecture, and classifier-free guidance.

This project was completed as part of NVIDIA's Deep Learning Institute certification program.

Key Achievements

  • ✅ 95%+ classifier accuracy on generated samples
  • ✅ Trained the DDPM to a final training loss of 0.033
  • ✅ Implemented classifier-free guidance for improved sample quality

Course Information

Course Title: Generative AI with Diffusion Models
Provider: NVIDIA Deep Learning Institute
Official Courses: NVIDIA DLI Training

Important Note on Content

This repository contains my personal implementation based on concepts learned from the NVIDIA DLI course. All original course materials, including instructional notebooks, assessment scripts, and proprietary utilities, remain the intellectual property of NVIDIA Corporation.

For access to official course materials, please register at: https://learn.nvidia.com/courses/course-detail?course_id=course-v1:DLI+S-FX-14+V1


Features

Core Implementation

  • Forward Diffusion Process - Progressive noise addition with beta scheduling
  • Reverse Diffusion Process - Learned denoising through a neural network
  • U-Net Architecture - Custom implementation with:
    • Residual blocks
    • Down/up sampling layers
    • Sinusoidal position embeddings
    • Conditional class embeddings
  • Classifier-Free Guidance - Improved sample quality through guidance weighting
  • Complete Training Pipeline - End-to-end training and inference

Technical Details

  • Timesteps (T): 150
  • Image Size: 28×28 pixels (grayscale)
  • Model Parameters: ~290K (see the check after this list)
  • Training Dataset: MNIST (70,000 images)
  • Guidance Weight: 5.0
  • Optimizer: Adam (lr=0.001)
  • Loss Function: MSE Loss
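
The parameter count above can be checked directly after instantiating the model; a minimal sketch, assuming the UNet constructor shown in the Usage section:

from src.unet import UNet

# Instantiate the U-Net with the hyperparameters listed above
model = UNet(
    timesteps=150,
    img_channels=1,
    img_size=28,
    down_channels=(64, 64, 128)
)

# Count trainable parameters; this should come out to roughly 290K
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")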

Demo

Generated Samples

Example of digits (0-9) generated by the trained model:


Training Progression

Visualization of the denoising process at different timesteps:



Installation

Prerequisites

Python 3.8+
CUDA 11.0+ (for GPU acceleration)

Setup

  1. Clone the repository
git clone https://github.com/MostafaAI10/NVIDIA-Diffusion-Models-Course.git
cd NVIDIA-Diffusion-Models-Course
  2. Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt

requirements.txt

torch>=2.0.0
torchvision>=0.15.0
numpy>=1.24.0
matplotlib>=3.7.0
Pillow>=9.5.0
jupyter>=1.0.0

Usage

Training the Model

from src.diffusion_model import DiffusionModel
from src.unet import UNet
from src.train import train_model

# Initialize model
model = UNet(
    timesteps=150,
    img_channels=1,
    img_size=28,
    down_channels=(64, 64, 128)
)

# Train
trained_model = train_model(
    model=model,
    epochs=5,
    batch_size=128,
    learning_rate=0.001
)
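
Under the hood, train_model optimizes the standard noise-prediction objective. A minimal sketch of one training step is shown below; the forward_diffusion helper, the context-dropout probability, the null-class index, and the model call signature are illustrative assumptions rather than the exact implementation:

import torch
import torch.nn.functional as F

def train_step(model, optimizer, x_0, labels, timesteps=150, p_uncond=0.1, device="cuda"):
    """One DDPM training step (sketch): predict the added noise and minimize MSE."""
    x_0, labels = x_0.to(device), labels.to(device)

    # Sample a random timestep per image and fresh Gaussian noise
    t = torch.randint(0, timesteps, (x_0.size(0),), device=device)
    noise = torch.randn_like(x_0)

    # Noise the clean images with the closed-form forward process (see Technical Deep Dive)
    x_t = forward_diffusion(x_0, t, noise)

    # Context dropout: sometimes replace the class label with a "null" class
    # so the model also learns unconditional prediction (assumed null index: 10)
    drop = torch.rand(labels.shape, device=device) < p_uncond
    labels = torch.where(drop, torch.full_like(labels, 10), labels)

    # Predict the noise and take a gradient step on the MSE loss
    noise_pred = model(x_t, t, labels)
    loss = F.mse_loss(noise_pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()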

Generating Images

from src.inference import generate_samples

# Generate digits 0-9
samples = generate_samples(
    model=trained_model,
    num_classes=10,
    guidance_weight=5.0
)

# Visualize
from src.utils import display_grid
display_grid(samples)

Using Pre-trained Model

import torch
from src.unet import UNet
from src.inference import generate_samples

# Load model
model = UNet(timesteps=150, img_channels=1, img_size=28)
model.load_state_dict(torch.load('checkpoints/diffusion_model.pth'))
model.eval()

# Generate
samples = generate_samples(model, num_classes=10, guidance_weight=5.0)

Results

Training Metrics

Metric | Value
Final Training Loss | 0.033
Training Epochs | 5
Total Training Time | ~15 minutes (GPU)
Classifier Accuracy on Generated Samples | 95%+

Model Performance

  • ✅ Successfully generates all 10 digit classes (0-9)
  • ✅ High visual quality and recognizability
  • ✅ Consistent generation across different seeds
  • ✅ Effective classifier-free guidance implementation



What I Learned

Core Concepts Mastered

  1. Diffusion Process Mathematics

    • Forward process: q(x_t | x_{t-1})
    • Reverse process: p_θ(x_{t-1} | x_t)
    • Beta scheduling strategies
    • Reparameterization trick
  2. U-Net Architecture

    • Encoder-decoder structure
    • Skip connections for preserving spatial information
    • Time embedding through sinusoidal position encoding (sketched after this list)
    • Conditional generation with class embeddings
  3. Training Techniques

    • Noise prediction objective
    • Mean Squared Error (MSE) loss
    • Classifier-free guidance implementation
    • Context dropout for unconditional training
  4. PyTorch Best Practices

    • Efficient data loading with DataLoader
    • GPU acceleration with CUDA
    • Model compilation for optimization
    • Gradient management and backpropagation
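
A minimal sketch of the sinusoidal time embedding referenced above (the embedding dimension and naming are illustrative):

import math
import torch

def sinusoidal_embedding(t, dim=32):
    """Map integer timesteps t of shape [B] to embeddings of shape [B, dim]."""
    half = dim // 2
    # Geometrically spaced frequencies, as in the Transformer position encoding
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float().unsqueeze(1) * freqs.unsqueeze(0)  # [B, half]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=1)  # [B, dim]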

Key Tips & Insights

For Future Learners

  1. Start with the Math

    • Understanding the beta schedule is crucial
    • Visualize the forward diffusion process first
    • The reparameterization trick makes training possible
  2. U-Net Implementation

    • Pay close attention to tensor dimensions
    • Skip connections are essential for reconstruction
    • Time embeddings should be injected at multiple layers
  3. Training Strategy

    • Monitor loss curve - should decrease steadily
    • Visualize samples during training to verify progress
    • Classifier-free guidance weight (w) significantly impacts quality
    • Start with w=5.0, adjust based on results
  4. Common Pitfalls to Avoid

    • ❌ Forgetting to normalize images to [0,1]
    • ❌ Incorrect tensor broadcasting in diffusion formulas
    • ❌ Not using .to(device) for all tensors
    • ❌ Mixing up timestep indexing (0-based vs 1-based)
  5. Optimization Tips

    • Use torch.compile() for 20-30% speedup (see the sketch after this list)
    • Batch size of 128 works well for MNIST
    • Adam optimizer with lr=0.001 is a good starting point
    • Save checkpoints regularly during training
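
A minimal sketch of the compilation and checkpointing tips above (paths are illustrative):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Optional (PyTorch 2.x): compile the model for faster training steps
compiled_model = torch.compile(model)

# ... train with compiled_model ...

# Save checkpoints from the underlying module so the state_dict keys stay clean
torch.save(model.state_dict(), "checkpoints/diffusion_model.pth")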

Guidance Weight Impact

w = 0.0  → Unconditional generation (blurry)
w = 3.0  → Decent quality
w = 5.0  → High quality (recommended)
w = 7.0  → Very sharp but less diverse
w = 10.0 → Too constrained

Technical Deep Dive

Forward Diffusion Process

The forward process gradually adds Gaussian noise to images:

def forward_diffusion(x_0, t, noise):
    """
    q(x_t | x_0) = N(x_t; √(ᾱ_t)x_0, (1-ᾱ_t)I)
    Jumps directly from a clean image x_0 to the noised image x_t.
    """
    # Reshape per-sample schedule values so they broadcast over [B, C, H, W]
    sqrt_alpha_bar_t = sqrt_alpha_bar[t].view(-1, 1, 1, 1)
    sqrt_one_minus_alpha_bar_t = sqrt_one_minus_alpha_bar[t].view(-1, 1, 1, 1)

    x_t = sqrt_alpha_bar_t * x_0 + sqrt_one_minus_alpha_bar_t * noise
    return x_t
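
The snippet above relies on precomputed schedule tensors. A minimal sketch with a linear beta schedule (a common choice; the course may use different endpoints), kept on the training device so indexing with a batch of timesteps works:

import torch

T = 150  # number of diffusion timesteps
device = "cuda" if torch.cuda.is_available() else "cpu"

# Linear beta schedule and the derived quantities used by the forward/reverse steps
beta = torch.linspace(1e-4, 0.02, T, device=device)
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)

sqrt_alpha_bar = torch.sqrt(alpha_bar)
sqrt_one_minus_alpha_bar = torch.sqrt(1.0 - alpha_bar)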

Reverse Diffusion Process

The reverse process learns to denoise:

def reverse_diffusion(x_t, t, predicted_noise):
    """
    p_θ(x_{t-1} | x_t) - one learned denoising step (DDPM sampling)
    x_{t-1} = 1/√(α_t) · (x_t - (1-α_t)/√(1-ᾱ_t) · ε_θ) + √(β_t) · z
    """
    alpha_t = alpha[t]
    alpha_bar_t = alpha_bar[t]
    beta_t = beta[t]

    # Posterior mean: remove the model's predicted noise from x_t
    mean = (x_t - (1 - alpha_t) / torch.sqrt(1 - alpha_bar_t) * predicted_noise) / torch.sqrt(alpha_t)

    # Add fresh noise at every step except the last (t = 0)
    if t > 0:
        noise = torch.randn_like(x_t)
        return mean + torch.sqrt(beta_t) * noise
    return mean

Classifier-Free Guidance

Improves sample quality by amplifying the conditional signal relative to the unconditional prediction:

def classifier_free_guidance(noise_pred_cond, noise_pred_uncond, w):
    """
    ε̃ = (1 + w)ε_θ(x_t, c) - w·ε_θ(x_t, ∅)
    """
    return (1 + w) * noise_pred_cond - w * noise_pred_uncond
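
Putting the pieces together, a minimal sketch of the full sampling loop; the model call signature and the null-class index are illustrative assumptions:

import torch

@torch.no_grad()
def sample_digit(model, label, T=150, w=5.0, device="cuda"):
    """Generate one image of class `label` by iterating the reverse process from pure noise."""
    x_t = torch.randn(1, 1, 28, 28, device=device)
    c = torch.tensor([label], device=device)
    null = torch.tensor([10], device=device)  # assumed index of the "no class" token

    for t in reversed(range(T)):
        t_batch = torch.full((1,), t, device=device)
        # Two forward passes: conditional and unconditional noise predictions
        eps_cond = model(x_t, t_batch, c)
        eps_uncond = model(x_t, t_batch, null)
        eps = classifier_free_guidance(eps_cond, eps_uncond, w)
        # One reverse diffusion step
        x_t = reverse_diffusion(x_t, t, eps)
    return x_t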

Additional Resources

Foundational Papers

Tutorials & Guides

Related Implementations

NVIDIA Resources


Contributing

Contributions are welcome! Here's how you can help:

  1. Report Bugs - Open an issue describing the problem
  2. Suggest Enhancements - Share ideas for improvements
  3. Submit Pull Requests - Fix bugs or add features

Development Setup

# Fork and clone the repository
git clone https://github.com/MostafaAI10/NVIDIA-Diffusion-Models-Course.git

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Format code
black src/

Future Enhancements

Potential improvements and extensions:

  • Different Datasets

    • CIFAR-10 (32×32 color images)
    • Fashion-MNIST
    • Custom datasets
  • Model Improvements

    • Attention mechanisms in U-Net
    • Latent diffusion for efficiency
    • Different noise schedules (cosine, linear, etc.)
  • Features

    • Gradio/Streamlit web interface
    • Real-time generation demo
    • Model interpretability visualizations
    • FID/IS score evaluation
  • Optimization

    • Mixed precision training (FP16)
    • Distributed training support
    • ONNX export for deployment

License

This project is licensed under the MIT License - see the LICENSE file for details.

Important Licensing Notes

  • My Implementation: MIT License (you're free to use, modify, distribute)
  • NVIDIA Course Materials: Remain property of NVIDIA Corporation
  • PyTorch & Dependencies: Respective open-source licenses

Acknowledgments

Special Thanks

  • NVIDIA Deep Learning Institute - For providing exceptional educational content and hands-on learning experience in generative AI
  • NVIDIA Corporation - For making advanced AI education accessible through their DLI program
  • PyTorch Team - For the excellent deep learning framework
  • Research Community - For foundational papers on diffusion models (Ho et al., Nichol & Dhariwal, etc.)

Inspiration & References

This implementation was built following concepts from:

  • NVIDIA DLI Course: "Generative AI with Diffusion Models"
  • Original DDPM paper by Ho et al. (2020)
  • U-Net architecture by Ronneberger et al. (2015)
  • Classifier-free guidance by Ho & Salimans (2022)

Contact & Connect

Author: Mostafa Abdelhamed
Email: abdelhamedmostafa190@gmail.com
LinkedIn: www.linkedin.com/in/mostafa-abdelhamed-88a447286

Let's Connect!

If you found this project helpful or interesting:

  • Star this repository
  • 🐛 Report issues
  • 💬 Start a discussion
  • 🔗 Share with others


Learning Outcomes

By completing this project, I gained practical experience in:

Mathematics of Diffusion Models

  • Forward and reverse processes
  • Markov chain formulation
  • Variational inference

Deep Learning Architecture Design

  • U-Net encoder-decoder structure
  • Residual connections
  • Multi-scale processing

Modern PyTorch Development

  • Model compilation and optimization
  • Efficient data pipelines
  • GPU acceleration

Generative AI Techniques

  • Classifier-free guidance
  • Conditional generation
  • Sample quality evaluation

Best Practices

  • Version control with Git
  • Documentation and README writing
  • Code organization and modularity
  • Testing and validation

I would be happy to hear your suggestions for future improvements.

Completed: February 2026
