| Max native clip length | 30 seconds, single pass |
| Resolution tiers | 480p, 720p, 1080p, native 4K |
| Render modes | Fast (quicker, lower cost) or Standard (higher fidelity) |
| Multimodal reference limit | Up to 50 inputs per job |
| Reference file types | Images (jpeg, png, webp, bmp, tiff, gif), video (mp4, mov), audio (mp3, wav) |
| 3D input | White-model blockout mesh for spatial pre-staging |
| Aspect ratios | 16:9, 9:16, 1:1, 4:3 |
| Audio generation | Native, co-processed with video—no separate audio pass |
| Prompt adherence vs 2.0 | ~20% higher on complex, multi-instruction briefs |
| Output format | MP4, browser preview before download |