MMaDA is a new class of multimodal diffusion foundation models, enabling state-of-the-art performance in reasoning, multimodal understanding, and text-to-image generation.
📄 Paper | 💻 Code
Number of tokens to generate.
Must be divisible by (gen_length / block_length).
gen_length must be divisible by this.
Classifier-Free Guidance. 0 disables it.
Controls randomness via Gumbel noise. 0 is deterministic.