
MMaDA is a new class of multimodal diffusion foundation models, enabling state-of-the-art performance in reasoning, multimodal understanding, and text-to-image generation.

📄 Paper | 💻 Code

Model: MMaDA-8B-MixCoT (active), MMaDA-8B-Max (coming soon)

Part 1. Text Generation

Enter your prompt:

Generation Length

Number of tokens to generate.

Range: 8 to 1024

Total Sampling Steps

Must be divisible by the number of blocks (gen_length / block_length).

Range: 1 to 512

Block Length

Generation Length (gen_length) must be divisible by this value (block_length).

Range: 8 to 1024
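The two divisibility rules above (Generation Length by Block Length, and Total Sampling Steps by the number of blocks) can be checked before submitting a prompt. The following is a minimal sketch, not the demo's own code: the function name check_generation_settings and the sample values are hypothetical, and only the two rules stated in the help text are enforced.

```python
def check_generation_settings(gen_length: int, block_length: int, steps: int) -> int:
    """Validate the slider values and return the sampling steps per block.

    Hypothetical helper: it only encodes the two rules from the help text.
    """
    # Rule 1: gen_length must be divisible by block_length.
    if gen_length % block_length != 0:
        raise ValueError(
            f"gen_length ({gen_length}) must be divisible by "
            f"block_length ({block_length})"
        )
    num_blocks = gen_length // block_length

    # Rule 2: total steps must be divisible by the number of blocks.
    if steps % num_blocks != 0:
        raise ValueError(
            f"steps ({steps}) must be divisible by "
            f"gen_length / block_length ({num_blocks})"
        )
    return steps // num_blocks


# Illustrative values: 256 tokens in blocks of 32 gives 8 blocks,
# so 128 total steps are split evenly across the blocks.
steps_per_block = check_generation_settings(256, 32, 128)
```

A combination such as gen_length=256 with block_length=48 fails the first rule, and steps=100 with 8 blocks fails the second.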

Remasking Strategy

CFG Scale

Classifier-Free Guidance. 0 disables it.

Range: 0 to 2
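As a rough illustration of how a CFG scale can act on token logits, the sketch below uses one common formulation in which a scale of 0 returns the conditional logits unchanged, matching the "0 disables it" note. This is an assumption: the page does not state the exact formula the demo uses, and apply_cfg is a hypothetical name.

```python
def apply_cfg(cond_logits, uncond_logits, cfg_scale):
    """Blend conditional and unconditional logits.

    Assumed formulation: uncond + (scale + 1) * (cond - uncond).
    With cfg_scale == 0 this reduces to the conditional logits.
    """
    return [
        u + (cfg_scale + 1.0) * (c - u)
        for c, u in zip(cond_logits, uncond_logits)
    ]


# Illustrative logits for a two-token vocabulary.
cond = [1.0, 2.0]
uncond = [0.5, 0.0]
unguided = apply_cfg(cond, uncond, 0.0)  # same as cond
guided = apply_cfg(cond, uncond, 1.0)    # pushed further away from uncond
```

Larger scales move the logits further from the unconditional prediction, which typically strengthens adherence to the prompt at some cost in diversity.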

Temperature

Controls randomness via Gumbel noise. 0 is deterministic.

Range: 0 to 2
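To show why a temperature of 0 is deterministic, here is a minimal sketch of the standard Gumbel-max trick: Gumbel noise scaled by the temperature is added to each logit and the largest score wins. It is an illustration under assumptions, not necessarily how the demo applies its noise; sample_with_gumbel and the rng parameter are hypothetical.

```python
import math
import random


def sample_with_gumbel(logits, temperature, rng=None):
    """Pick a token index by perturbing logits with Gumbel noise.

    Assumed scheme: argmax(logit + temperature * gumbel_noise).
    """
    rng = rng or random.Random()
    if temperature == 0:
        # No noise: always return the index of the largest logit.
        return max(range(len(logits)), key=lambda i: logits[i])

    scores = []
    for logit in logits:
        # Clamp the uniform sample so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        scores.append(logit + temperature * gumbel)
    return max(range(len(scores)), key=lambda i: scores[i])


logits = [0.1, 2.0, -1.0]
greedy = sample_with_gumbel(logits, 0)                      # always index 1
sampled = sample_with_gumbel(logits, 1.0, random.Random(0))  # may vary with the seed
```

Higher temperatures give the noise more weight relative to the logits, so lower-scoring tokens are chosen more often.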

Live Generation Process

Final Output

Examples

Enter your prompt:	Total Sampling Steps	Generation Length	Block Length	Temperature	CFG Scale	Remasking Strategy

Part 2. Multimodal Understanding

Enter your prompt:

Generation Length

Number of tokens to generate.

Range: 64 to 1024

Total Sampling Steps

Must be divisible by the number of blocks (gen_length / block_length).

Range: 1 to 512

Block Length

Generation Length (gen_length) must be divisible by this value (block_length).

Range: 32 to 1024

Remasking Strategy

CFG Scale

Classifier-Free Guidance. 0 disables it.

Range: 0 to 2

Temperature

Controls randomness via Gumbel noise. 0 is deterministic.

Range: 0 to 2

Upload Image

Live Generation Process

Token Sequence (Live Update)

Final Generated Text

Final Output

Examples

Upload Image	Enter your prompt:	Total Sampling Steps	Generation Length	Block Length	Temperature	CFG Scale	Remasking Strategy

Part 3. Text-to-Image Generation

Enter your prompt:

Total Sampling Steps

Must be divisible by (gen_length / block_length).

Range: 5 to 100

Guidance Scale

Classifier-Free Guidance. 0 disables it.

Range: 0 to 7

Scheduler

Options: cosine, sigmoid, linear

Generated Image

Generation Status

Examples

Enter your prompt:	Total Sampling Steps	Guidance Scale	Scheduler