Add modular pipeline for HunyuanVideo 1.5 by akshan-main · Pull Request #13389 · huggingface/diffusers

akshan-main · 2026-04-02T16:09:27Z

What does this PR do?

Adds modular pipeline blocks for HunyuanVideo 1.5, covering both text-to-video (HunyuanVideo15Blocks) and image-to-video (HunyuanVideo15Image2VideoBlocks).

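Parity with the standard pipelines verified on a Colab G4 GPU (MAD = mean absolute difference between output frames, same seed and settings):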

T2V: MAD 0.000000 vs HunyuanVideo15Pipeline

hv15_t2v_standard.mp4

hv15_t2v_modular.mp4

T2V reproduction code

import gc
import numpy as np
import torch
from diffusers import (
    HunyuanVideo15Pipeline,
    HunyuanVideo15ImageToVideoPipeline,
    HunyuanVideo15Blocks,
    HunyuanVideo15ModularPipeline,
)
from diffusers.utils import load_image, export_to_video

device = "cuda"
dtype = torch.bfloat16

T2V_ID = "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v"
I2V_ID = "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v"

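def to_np(x):
    """Convert a pipeline output, a list of frames, or a tensor to a NumPy array."""
    if hasattr(x, "frames"):
        x = x.frames
    if isinstance(x, list):
        x = np.array(x)
    if isinstance(x, torch.Tensor):
        # Upcast first: NumPy has no bfloat16 dtype.
        x = x.float().cpu().numpy()
    return x


prompt = "A cinematic drone shot over snowy mountains at sunrise."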

print("=== Standard T2V ===")

ref_pipe = HunyuanVideo15Pipeline.from_pretrained(T2V_ID, torch_dtype=dtype).to(device)
g = torch.Generator(device=device).manual_seed(1234)
ref_out = ref_pipe(prompt=prompt, num_frames=55, num_inference_steps=6, generator=g, output_type="np").frames
print(f"Shape: {np.array(ref_out).shape}")
export_to_video(ref_out[0], "/content/hv15_t2v_standard.mp4", fps=24)
del ref_pipe; gc.collect(); torch.cuda.empty_cache()



print("\n=== Modular T2V ===")
blocks = HunyuanVideo15Blocks()
pipe = blocks.init_pipeline(T2V_ID)
pipe.load_components(torch_dtype=dtype)
pipe.to(device)

print("Guider type:", type(pipe.guider).__name__)
print("Guider scale:", pipe.guider.guidance_scale)
print("Guider enabled:", pipe.guider._enabled)
print("Guider num_conditions:", pipe.guider.num_conditions)
g = torch.Generator(device=device).manual_seed(1234)
mod_out = pipe(prompt=prompt, num_frames=55, num_inference_steps=6, generator=g, output="videos", output_type="np")
print(f"Shape: {np.array(mod_out).shape}")
export_to_video(mod_out[0], "/content/hv15_t2v_modular.mp4", fps=24)

diff = np.abs(to_np(ref_out).astype(float) - to_np(mod_out).astype(float)).mean()
print(f"\nT2V MAD: {diff:.6f}")
del pipe, blocks; gc.collect(); torch.cuda.empty_cache()

I2V: MAD 0.000000 vs HunyuanVideo15ImageToVideoPipeline

hv15_i2v_standard.mp4

hv15_i2v_modular.mp4

I2V reproduction code

# Continues from the T2V script above: reuses its imports, device, dtype, I2V_ID, and to_np.
from diffusers.modular_pipelines import HunyuanVideo15Image2VideoBlocks

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").convert("RGB")

print("=== Standard I2V ===")
ref_pipe = HunyuanVideo15ImageToVideoPipeline.from_pretrained(I2V_ID, torch_dtype=dtype).to(device)
g = torch.Generator(device=device).manual_seed(1234)
ref_out = ref_pipe(image=image, prompt="A cat turns its head", num_frames=55, num_inference_steps=6, generator=g, output_type="np").frames
print(f"Shape: {np.array(ref_out).shape}")
export_to_video(ref_out[0], "/content/hv15_i2v_standard.mp4", fps=24)
del ref_pipe; gc.collect(); torch.cuda.empty_cache()

print("\n=== Modular I2V ===")
blocks = HunyuanVideo15Image2VideoBlocks()
pipe = blocks.init_pipeline(I2V_ID)
pipe.load_components(torch_dtype=dtype)
pipe.to(device)
g = torch.Generator(device=device).manual_seed(1234)
mod_out = pipe(image=image, prompt="A cat turns its head", num_frames=55, num_inference_steps=6, generator=g, output="videos", output_type="np")
print(f"Shape: {np.array(mod_out).shape}")
export_to_video(mod_out[0], "/content/hv15_i2v_modular.mp4", fps=24)

diff = np.abs(to_np(ref_out).astype(float) - to_np(mod_out).astype(float)).mean()
print(f"\nI2V MAD: {diff:.6f}")
print("\n=== Done ===")
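
Both scripts reduce the parity check to a mean absolute difference (MAD) between two frame arrays. The metric can be sketched on its own without a GPU. This is a minimal illustration, not code from the PR: `mean_abs_diff` and the toy arrays are hypothetical names, and the random frames only stand in for real pipeline outputs.

```python
import numpy as np


def mean_abs_diff(reference, candidate):
    """Mean absolute difference between two equally shaped frame arrays."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        # Broadcasting could silently hide a layout mismatch, so fail loudly.
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    return float(np.abs(ref - cand).mean())


# Toy stand-ins for pipeline outputs (hypothetical data), laid out as
# (batch, frames, height, width, channels).
rng = np.random.default_rng(0)
ref_frames = rng.random((1, 4, 8, 8, 3))
same_frames = ref_frames.copy()
shifted_frames = ref_frames + 0.25

print(f"identical: {mean_abs_diff(ref_frames, same_frames):.6f}")
print(f"shifted:   {mean_abs_diff(ref_frames, shifted_frames):.6f}")
```

A reported MAD of 0.000000 means the mean difference rounds to zero at six decimal places. The explicit shape check is a design choice for this sketch: it prevents NumPy broadcasting from producing a plausible-looking number when the two outputs have different layouts.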

Addresses #13295 (HunyuanVideo 1.5 contribution)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. — Modular Diffusers 🧨 #13295
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@sayakpaul @yiyixuxu @asomoza

akshan-main added 12 commits April 2, 2026 08:48

Add modular pipeline support for HunyuanVideo 1.5

f07f1e8

Fix I2V latent/cond spatial dimension mismatch

6090638

Fix guidance_scale default to 7.5 matching ClassifierFreeGuidance

85802a7

Fix tokenizer type: use Qwen2TokenizerFast to match model

a3d814b

Fix system message string formatting to match standard pipeline

22e7939

Rewrite HunyuanVideo 1.5 modular: use standard pipeline methods directly

00564fe

Remove I2V exports (T2V only for now)

7a46b21

Fix encoder: use static methods directly instead of encode_prompt

3953a25

Inline all standard pipeline methods, remove runtime dependency

e8176d2

Add HunyuanVideo 1.5 image-to-video modular blocks

e8f99f9

Fix missing FrozenDict import in before_denoise.py

562fa49

auto-generated docstrings via #auto_docstring

bd45ef6

akshan-main mentioned this pull request Apr 2, 2026

[modular] Add LTX Video modular pipeline #13378

Open

6 tasks

akshan-main added 2 commits April 2, 2026 17:47

Fix ruff lint and format issues

e439012

use InputParam/OutputParam templates and fix ruff

330c5f6

akshan-main wants to merge 14 commits into huggingface:main from akshan-main:modular-hunyuan1.5
