MMaDA is a new class of multimodal diffusion foundation models, enabling state-of-the-art performance in reasoning, multimodal understanding, and text-to-image generation.

📄 Paper    |    💻 Code

Select Text Generation Model

Part 1. Text Generation

Inputs: prompt, Total Sampling Steps, Generation Length, Block Length, Temperature, CFG Scale, Remasking Strategy.
Slider ranges as displayed: 8 to 1024, 1 to 512, 8 to 1024, then 0 to 2 and 0 to 2 after the Remasking Strategy selector.

Part 2. Multimodal Understanding

Inputs: image upload, prompt, Total Sampling Steps, Generation Length, Block Length, Temperature, CFG Scale, Remasking Strategy.
Slider ranges as displayed: 64 to 1024, 1 to 512, 32 to 1024, then 0 to 2 and 0 to 2 after the Remasking Strategy selector.
Outputs: Live Generation Process, Final Generated Text.

Part 3. Text-to-Image Generation

Inputs: prompt, Total Sampling Steps, Guidance Scale, Scheduler.
Slider ranges as displayed: 5 to 100, then 0 to 7.