feat(qwenimage): add modular area composition denoise route #13097

Open
yaoqih wants to merge 1 commit into huggingface:main from yaoqih:feat/qwen-area-composition

Conversation

yaoqih (Contributor) commented Feb 8, 2026

What does this PR do?

This PR adds modular area composition routing for QwenImage in a way that follows the modular diffusers philosophy: compose new blocks instead of mutating existing default workflows.

Summary of changes

  • Adds two new area-specific core denoise workflows:
    • QwenImageAreaCompositionCoreDenoiseStep (text2image + area composition)
    • QwenImageAreaCompositionImg2ImgCoreDenoiseStep (img2img + area composition)
  • Extends QwenImageAutoCoreDenoiseStep routing with an explicit area_composition trigger (see the routing sketch after this list):
    • uses the area route only when area_composition is non-empty
    • preserves the existing routing for text2image / img2img / inpaint / controlnet when area composition is not enabled
  • Keeps the default non-area denoise path intact (backward-compatible behavior).
  • Adds/updates area composition plumbing in QwenImage modular components (inputs, encoders, and denoise path) to support regional conditioning and weighted regional merging (a merging sketch follows the test code below).
  • Exports newly added area core blocks in qwenimage/__init__.py.
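
For reference, the trigger semantics reduce to the following plain-Python sketch (illustrative only, not the PR's actual block wiring; the `image`, `mask_image`, and `control_image` input names are assumptions for the non-area routes):

```python
def select_core_denoise_route(inputs: dict) -> str:
    # The area route wins only when a non-empty `area_composition` is given.
    if inputs.get("area_composition"):
        if inputs.get("image") is not None:
            return "QwenImageAreaCompositionImg2ImgCoreDenoiseStep"
        return "QwenImageAreaCompositionCoreDenoiseStep"
    # Otherwise the pre-existing routing is preserved unchanged.
    if inputs.get("control_image") is not None:
        return "controlnet"
    if inputs.get("mask_image") is not None:
        return "inpaint"
    if inputs.get("image") is not None:
        return "img2img"
    return "text2image"

# An empty list does not trigger the area route (backward-compatible).
assert select_core_denoise_route({"area_composition": []}) == "text2image"
```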

Motivation

The goal is to support area composition as an opt-in capability without replacing default denoise behavior globally. This keeps the architecture composable and explicit, and better aligns with modular pipeline design principles.

Backward compatibility

  • Existing workflows remain unchanged unless area_composition is provided.
  • No behavior change for standard text2image/img2img/inpaint/controlnet usage.

Fixes # (issue)

Who can review?

This PR touches modular pipelines / denoising route selection for QwenImage.

Suggested reviewers:

Test code:

```python
import torch
from diffusers import QwenImageModularPipeline
from diffusers.modular_pipelines.qwenimage import QwenImageAutoBlocks


cfg = {
    "prompt": "(masterpiece) (best quality) beautiful landscape breathtaking amazing view nature photograph forest mountains ocean (sky) national park scenery, full body (flat chest:1.0) (girl:1.0) with (fennec fox:0.9) (ears:1.0) (short blonde:1.0) hair (blue eyes:1.0) school uniform sweater standing long skirt",
    "negative_prompt": "(hands), text, error, cropped, (worst quality:1.2), (low quality:1.2), normal quality, (jpeg artifacts:1.3), signature, watermark, username, blurry, artist name, monochrome, sketch, censorship, censor, (copyright:1.2), extra legs, (forehead mark) (depth of field) (emotionless) (penis) (pumpkin)",
    "area_composition": [
        {
            "prompt": "(best quality) (night:1.3) (darkness) sky (black) (stars:1.2) (galaxy:1.2) (space) (universe)",
            "x": 0.0,
            "y": 0.0,
            "width": 1.0,
            "height": 0.3,
            "strength": 3,
        },
        {
            "prompt": "(best quality) (evening:1.2) (sky:1.2) (clouds) (colorful) (HDR:1.2) (sunset:1.3)",
            "x": 0.0,
            "y": 0.25,
            "width": 1.0,
            "height": 0.3,
            "strength": 3,
        },
        {
            "prompt": "(best quality) (daytime:1.2) sky (blue)",
            "x": 0.0,
            "y": 0.4,
            "width": 1.0,
            "height": 0.3,
            "strength": 3,
        },
        {
            "prompt": "(masterpiece) (best quality) morning sky",
            "x": 0.0,
            "y": 0.55,
            "width": 1.0,
            "height": 0.3,
            "strength": 3,
        },
    ],
    "seed": 853374162509361,
    "num_inference_steps": 13,
    "width": 704,
    "height": 1280,
}

MODEL_PATH = "/gemini/platform/public/aigc/cv_banc/zsw/zhuangcailin/pretrain/Qwen/Qwen-Image-2512/"  # local Qwen-Image checkpoint
DEVICE = "cuda:1"

# The auto blocks include the new area-composition denoise route.
blocks = QwenImageAutoBlocks()

pipe = QwenImageModularPipeline(blocks, pretrained_model_name_or_path=MODEL_PATH)
# Load every pretrained component except the (unused) controlnet.
pipe.load_components(
    names=[name for name in pipe.pretrained_component_names if name != "controlnet"],
    torch_dtype={"default": torch.bfloat16, "vae": torch.float16},
)
pipe.to(DEVICE)

# Non-empty area_composition triggers the area denoise route.
with_area = pipe(
    prompt=cfg["prompt"],
    negative_prompt=cfg["negative_prompt"],
    width=cfg["width"],
    height=cfg["height"],
    num_inference_steps=cfg["num_inference_steps"],
    area_composition=cfg["area_composition"],
    generator=torch.Generator(device=DEVICE).manual_seed(cfg["seed"]),
    output=["images"],
)["images"][0]

# Empty area_composition falls through to the default denoise route.
without_area = pipe(
    prompt=cfg["prompt"],
    negative_prompt=cfg["negative_prompt"],
    width=cfg["width"],
    height=cfg["height"],
    num_inference_steps=cfg["num_inference_steps"],
    area_composition=[],
    generator=torch.Generator(device=DEVICE).manual_seed(cfg["seed"]),
    output=["images"],
)["images"][0]

with_area.save("with_area_composition.png")
without_area.save("without_area_composition.png")
```
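
Each region dict uses normalized coordinates (x, y, width, height in [0, 1]) plus a strength weight. As a rough illustration of what strength-weighted regional merging can look like under those conventions (a hypothetical sketch; the PR's actual merging lives inside the area denoise blocks):

```python
import torch

def merge_regional_predictions(base_pred, region_preds, regions):
    # base_pred / region_preds[i]: (B, C, H, W) predictions for the global
    # prompt and for each regional prompt, respectively.
    _, _, H, W = base_pred.shape
    weights = torch.ones(1, 1, H, W, device=base_pred.device)  # global weight = 1
    merged = base_pred.clone()
    for pred, region in zip(region_preds, regions):
        # Convert the normalized box into pixel (or latent) indices.
        y0, y1 = int(region["y"] * H), int((region["y"] + region["height"]) * H)
        x0, x1 = int(region["x"] * W), int((region["x"] + region["width"]) * W)
        mask = torch.zeros(1, 1, H, W, device=base_pred.device)
        mask[:, :, y0:y1, x0:x1] = float(region["strength"])
        merged = merged + pred * mask
        weights = weights + mask
    # Normalize so overlapping regions blend proportionally to strength.
    return merged / weights
```

Under this weighting scheme, strength=3 would let a region's prompt dominate the global prompt roughly 3:1 inside its box, which is consistent with the softer result at strength=1.5 shown below.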

Results:

[Image: with_area_composition.png (strength=3)]
[Image: with_area_composition.png (strength=1.5)]
[Image: without_area_composition.png]
