Generate and edit AI videos from text, images, and video references with Kling 3.0 Omni. Reference-based character consistency, video-to-video editing, and native audio in one unified model.

Kling 3.0 Omni is a unified multimodal AI video model that accepts text, images, and existing video as input. It combines text-to-video generation, image-to-video animation, reference-based style consistency, and video-to-video editing into a single pipeline. Kling 3.0 Omni is designed for workflows that require reference-based control over character appearance, visual style, or existing footage transformation.
Kling 3.0 Omni accepts text prompts, up to 7 reference images for character or style consistency, and existing video clips for editing or style transfer. You can combine all three input types in a single Kling 3.0 Omni generation.
Upload an existing video as a reference and describe the changes you want in natural language. Kling 3.0 Omni preserves the original camera movement and timing while applying your edits — transforming scenes, changing visual style, or adjusting content without rebuilding from scratch.
Upload reference images of your character and Kling 3.0 Omni maintains consistent appearance, clothing, and features across all generated shots. Useful for brand mascots, recurring characters, and multi-scene content where identity consistency matters.
Kling 3.0 Omni generates synchronized audio alongside video when using text or image input. Sound effects, ambient audio, and dialogue are matched to the visual content automatically in a single generation pass.
Start with a text prompt for full creative generation, upload reference images for character or style consistency, or provide a video clip for editing or style transfer.
Upload up to 7 reference images for character or visual style guidance (up to 4 when a reference video is also attached). If editing video, add a reference clip (3-10 seconds, MP4 or MOV, max 200MB) and choose Feature or Base mode.
Select Standard or Pro mode, choose aspect ratio (16:9, 9:16, or 1:1), set duration (3-15 seconds), and enable audio if not using a reference video.
Kling 3.0 Omni processes all inputs together and outputs video with optional synchronized audio. Download the watermark-free MP4 result.
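The limits listed in these steps can be checked before submitting a generation. The sketch below is illustrative only: `OmniRequest`, `ReferenceVideo`, and `validate` are hypothetical names, not part of any published Kling API, and the numeric limits are copied from this page (including the 4-image cap that applies when a reference video is attached). It also assumes 1 MB means 1024 × 1024 bytes, which the page does not specify.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional

# Limits as stated on this page; every name here is hypothetical.
QUALITY_MODES = {"standard", "pro"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
VIDEO_MODES = {"feature", "base"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}
MAX_VIDEO_BYTES = 200 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


@dataclass
class ReferenceVideo:
    filename: str
    duration_seconds: float
    size_bytes: int
    mode: str  # "feature" or "base"


@dataclass
class OmniRequest:
    prompt: str
    quality: str = "standard"
    aspect_ratio: str = "16:9"
    duration_seconds: int = 5
    audio: bool = False
    reference_images: List[str] = field(default_factory=list)
    reference_video: Optional[ReferenceVideo] = None


def validate(req: OmniRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors: List[str] = []
    if req.quality not in QUALITY_MODES:
        errors.append("quality must be 'standard' or 'pro'")
    if req.aspect_ratio not in ASPECT_RATIOS:
        errors.append("aspect ratio must be 16:9, 9:16, or 1:1")
    if not 3 <= req.duration_seconds <= 15:
        errors.append("output duration must be 3-15 seconds")

    video = req.reference_video
    # Up to 7 images alone, up to 4 when a reference video is attached.
    max_images = 4 if video is not None else 7
    if len(req.reference_images) > max_images:
        errors.append(f"at most {max_images} reference images allowed")

    if video is not None:
        ext = os.path.splitext(video.filename)[1].lower()
        if ext not in VIDEO_EXTENSIONS:
            errors.append("reference video must be MP4 or MOV")
        if not 3 <= video.duration_seconds <= 10:
            errors.append("reference video must be 3-10 seconds long")
        if video.size_bytes > MAX_VIDEO_BYTES:
            errors.append("reference video must be 200MB or smaller")
        if video.mode not in VIDEO_MODES:
            errors.append("video mode must be 'feature' or 'base'")
        if req.audio:
            errors.append("audio generation is not offered with a reference video")
    return errors
```

Calling `validate(OmniRequest(prompt="a fox running through snow"))` returns an empty list, while a request that pairs five reference images with a reference clip reports the image limit.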
Feed up to 7 reference images into Kling 3.0 Omni to guide character appearance and visual style. When combining with a reference video, up to 4 reference images can be used simultaneously.
Feature mode uses your reference video as a style guide while generating new motion. Base mode treats the video as a direct editing foundation, preserving original movement and timing while applying prompt-guided changes.
Kling 3.0 Omni supports up to 6 connected shots with consistent characters and visual continuity across the entire sequence — the same multi-shot capability as Kling 3.0.
Generate synchronized sound effects, dialogue, and ambient audio alongside the video. Audio generation is available when using text-to-video or image-to-video input modes.
When editing an existing video with Kling 3.0 Omni, you can preserve the original audio track from your reference clip in the final output.
Standard mode outputs at 720p for faster generation. Pro mode outputs at 1080p with higher visual detail and motion fidelity — suitable for commercial and professional use.
Kling 3.0 Omni uses the platform credit system. Credits are charged per second based on quality mode and audio setting. The estimated cost is shown before each generation.
Standard mode: 45 credits per second without audio, or 60 credits per second with audio. A 10-second Kling 3.0 Omni Standard video costs 450 credits without audio or 600 credits with audio.
Pro mode: 60 credits per second without audio, or 70 credits per second with audio. A 10-second Kling 3.0 Omni Pro video costs 600 credits without audio or 700 credits with audio.
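The per-second rates above reduce to a small lookup. The following is a minimal sketch of that arithmetic, not the platform's billing code: `estimate_credits` is a hypothetical helper, and it assumes durations are billed in whole seconds within the 3-15 second range, which this page does not confirm.

```python
# Per-second credit rates as listed on this page: (mode, audio enabled) -> credits.
CREDITS_PER_SECOND = {
    ("standard", False): 45,
    ("standard", True): 60,
    ("pro", False): 60,
    ("pro", True): 70,
}


def estimate_credits(mode: str, seconds: int, audio: bool = False) -> int:
    """Estimate the credit cost of one generation (hypothetical helper)."""
    if not 3 <= seconds <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    key = (mode.lower(), audio)
    if key not in CREDITS_PER_SECOND:
        raise ValueError("mode must be 'standard' or 'pro'")
    # Assumption: cost scales linearly with whole seconds.
    return CREDITS_PER_SECOND[key] * seconds


print(estimate_credits("standard", 10))           # 450
print(estimate_credits("pro", 10, audio=True))    # 700
```

The estimate shown in the product before each generation remains the authoritative figure.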
Kling 3.0 Omni is the right model when your workflow requires reference-based control, video editing, or consistent character generation across multiple scenes.
Upload brand reference images to generate marketing videos that match your visual identity. Kling 3.0 Omni extracts character features and visual style from reference images and applies them consistently across generated scenes.
Keep the same character across multiple shots and scenes by providing reference images. Kling 3.0 Omni maintains consistent facial features, clothing, and proportions throughout the generation — useful for recurring characters in ads, stories, or social content.
Transform existing footage into a different visual style while preserving the original motion and timing. Kling 3.0 Omni can convert realistic footage to animated style, change scene lighting, or apply visual effects based on your prompt and reference inputs.
Generate multiple versions of a product video or campaign clip with different styles, backgrounds, or visual treatments — all using the same reference material as the starting point with Kling 3.0 Omni.
Generate and edit AI videos with reference-based control using Kling 3.0 Omni. Multimodal input, character consistency, style transfer, and native audio — all in one model.
Kling 4.0 is coming soon for 4K+ cinematic AI video from text and images. Native audio, multi-shot sequencing, persistent character identity, and enhanced photorealism are expected in a single generation workflow.
Generate native 4K AI videos with Kling 3.0. Multi-shot sequencing, integrated audio generation, text-to-video and image-to-video — all in a single generation workflow.
Transfer motion from any reference video to a static image with preserved identity and smooth animation.
Generate fast, affordable AI videos with Kling O3. Text-to-video, image-to-video, multi-shot sequencing, native audio, and 4K output — at a lower credit cost than Kling 3.0.
Turn any portrait photo into a talking video with Kling Avatar V2. Upload a face image and an audio file — the model generates precise lip sync, natural head motion, and facial expressions at 1080p 48fps.
Generate cinematic AI videos with Kling 2.6. Native audio, accurate lip sync, 1080p output, 5s or 10s duration. The most affordable Kling model for single-shot video with sound.
Control how elements move in your video: paint paths, transfer motion from reference clips, and animate up to 6 elements.
Generate and edit high-quality AI images with Kling O3. Text-to-image generation and image editing with reference inputs — 1K to 4K resolution, multiple aspect ratios, 5 credits per image.
Generate ultra-fast photorealistic AI images with Nano Banana 2. Text-to-image and image-to-image generation in 1K, 2K, or 4K resolution across a wide range of aspect ratios.