Generate and edit videos from text, images, and video references in one unified model
Kling 3.0 Omni is a unified multimodal video model that accepts text, images, and video as input. It combines text-to-video, image-to-video, reference-based generation, and video editing into a single pipeline with native audio output.
Feed text prompts, reference images (up to 7), and existing video clips into one model.
Edit existing videos by providing a reference clip and describing changes in natural language.
Maintain the same character appearance across multiple shots and scenes using reference images.
Apply visual styles from reference images or videos to your generated content.
Start with a text prompt, upload reference images for style consistency, or provide a video for editing.
Upload up to 7 reference images for character or style consistency (up to 4 when a reference video is included). Add a reference video (3 to 10 seconds, max 200 MB) for video editing.
Choose Standard or Pro mode, aspect ratio (16:9, 9:16, 1:1), duration (3-15s), and whether to generate audio.
The model processes all inputs together and outputs video with optional synchronized audio.
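The limits in the steps above can be checked before anything is submitted. The following is a minimal sketch in Python, not an official client: every class, field, and function name is hypothetical, and the limits are copied from this page (including the lower cap of 4 images when a reference video is attached). It also assumes "200MB" means decimal megabytes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits copied from this page; every name below is hypothetical.
ALLOWED_MODES = {"standard", "pro"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MIN_DURATION_S, MAX_DURATION_S = 3, 15
MAX_IMAGES = 7
MAX_IMAGES_WITH_VIDEO = 4
MIN_REF_VIDEO_S, MAX_REF_VIDEO_S = 3, 10
MAX_REF_VIDEO_BYTES = 200 * 1000 * 1000  # assumption: decimal megabytes


@dataclass
class ReferenceVideo:
    duration_s: float
    size_bytes: int


@dataclass
class GenerationSettings:
    prompt: str
    mode: str = "standard"
    aspect_ratio: str = "16:9"
    duration_s: float = 5
    generate_audio: bool = True
    image_count: int = 0
    reference_video: Optional[ReferenceVideo] = None


def validate_settings(s: GenerationSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings fit the limits."""
    problems: List[str] = []
    if s.mode not in ALLOWED_MODES:
        problems.append(f"unknown mode: {s.mode!r}")
    if s.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {s.aspect_ratio!r}")
    if not MIN_DURATION_S <= s.duration_s <= MAX_DURATION_S:
        problems.append("output duration must be between 3 and 15 seconds")

    # The image cap drops from 7 to 4 when a reference video is attached.
    video = s.reference_video
    image_cap = MAX_IMAGES_WITH_VIDEO if video is not None else MAX_IMAGES
    if s.image_count > image_cap:
        problems.append(f"too many reference images (max {image_cap})")

    if video is not None:
        if not MIN_REF_VIDEO_S <= video.duration_s <= MAX_REF_VIDEO_S:
            problems.append("reference video must be 3 to 10 seconds long")
        if video.size_bytes > MAX_REF_VIDEO_BYTES:
            problems.append("reference video exceeds 200 MB")
    return problems
```

Returning a list of problems rather than raising on the first one lets a form or script show every issue at once.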
Upload up to 7 reference images (4 when combined with video) to guide character appearance and visual style.
Provide a reference video and describe edits. The model preserves motion while applying your changes.
Use 'feature' mode for style transfer or 'base' mode for direct video editing with motion preservation.
Create up to 6 connected shots with consistent characters across the entire sequence.
Generate synchronized audio including dialogue, sound effects, and ambient sounds.
When editing video, optionally keep the original audio track from your reference clip.
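As a rough illustration of how the feature options above relate to each other, here is a small Python sketch. All names are hypothetical and do not come from an official SDK. It assumes the 'feature' and 'base' values are passed as plain strings, that a sequence is described as one prompt per shot, and that the keep-original-audio choice only exists when a reference video is supplied.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values taken from the feature list; names are illustrative only.
REFERENCE_MODES = {"feature", "base"}  # style transfer vs. direct editing
MAX_SHOTS = 6


@dataclass
class VideoReferenceOptions:
    reference_mode: str                # "feature" or "base"
    keep_original_audio: bool = False  # only meaningful with a reference clip


def check_feature_options(
    shot_prompts: List[str],
    video_ref: Optional[VideoReferenceOptions] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems: List[str] = []
    if not 1 <= len(shot_prompts) <= MAX_SHOTS:
        problems.append(f"a sequence needs between 1 and {MAX_SHOTS} shots")
    if video_ref is not None and video_ref.reference_mode not in REFERENCE_MODES:
        problems.append(f"unknown reference mode: {video_ref.reference_mode!r}")
    return problems
```

Placing `keep_original_audio` on the video options object means it cannot be set for a request that has no reference clip, which matches the wording above.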
Per-second pricing based on mode selection.
Standard: $0.112 per second. A 10-second video costs $1.12.
Pro: $0.168 per second with higher quality output. A 10-second video costs $1.68.
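The per-second arithmetic can be reproduced with a short Python helper. This is a sketch under stated assumptions: the function name and mode keys are hypothetical, the lower rate is taken to be Standard and the higher rate Pro, and the price is assumed to depend only on mode and duration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-second rates from this page; mapping them to these keys is an assumption.
RATE_PER_SECOND = {
    "standard": Decimal("0.112"),
    "pro": Decimal("0.168"),
}


def estimate_cost(mode: str, duration_s: int) -> Decimal:
    """Estimate the price in USD for one clip, rounded to the nearest cent."""
    if mode not in RATE_PER_SECOND:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 3 <= duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    total = RATE_PER_SECOND[mode] * Decimal(duration_s)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost("standard", 10))  # 1.12
print(estimate_cost("pro", 10))       # 1.68
```

`Decimal` is used instead of floats so the results match the quoted prices exactly instead of picking up binary rounding noise.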
Omni is the right choice when you need reference-based control or video editing capabilities.
Upload brand reference images to generate videos that match your visual identity across campaigns.
Keep the same character across multiple scenes by providing reference images of your subject.
Transform existing footage into different visual styles while preserving the original motion and timing.
Generate multiple versions of product videos with different styles or settings from the same reference.
Generate and edit videos with multimodal references and style control.