Edit videos with natural language — describe what to change and the model handles the rest
Kling O1 is a unified multimodal video model built for editing. Provide a reference video and describe your changes in plain text. The model analyzes the motion, spatial structure, and timing of the original clip, then applies your edits while keeping everything else intact.
Describe edits in plain text — no masking, keyframing, or manual adjustments needed.
Camera paths, body movements, and timing from the original video stay intact during editing.
Use reference images to guide specific visual changes like character replacement or style transfer.
Reference images and videos directly in your prompt using <<<image_1>>> and <<<video_1>>> syntax.
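As a rough illustration of the reference syntax, the sketch below scans a prompt for <<<image_N>>> and <<<video_N>>> placeholders and reports any that have no matching upload. The helper names (find_placeholders, missing_media) and the idea of tracking uploads as a set of names are assumptions made for this example; they are not part of any documented Kling O1 API.

```python
import re

# Matches placeholders such as <<<image_1>>> or <<<video_1>>>.
PLACEHOLDER = re.compile(r"<<<(image|video)_(\d+)>>>")


def find_placeholders(prompt: str) -> list[str]:
    """Return the media names referenced in a prompt, in order of appearance."""
    return [f"{kind}_{index}" for kind, index in PLACEHOLDER.findall(prompt)]


def missing_media(prompt: str, uploaded: set[str]) -> list[str]:
    """List placeholders in the prompt that have no matching uploaded file.

    `uploaded` is a hypothetical set of names like {"video_1", "image_1"}.
    """
    return [name for name in find_placeholders(prompt) if name not in uploaded]


prompt = "Replace the dancer in <<<video_1>>> with the character from <<<image_1>>>."
print(find_placeholders(prompt))           # ['video_1', 'image_1']
print(missing_media(prompt, {"video_1"}))  # ['image_1']
```

A check like this, run before submitting, can catch a prompt that mentions a reference image that was never uploaded.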
Provide the video you want to edit. Supported formats are MP4, MOV, WebM, M4V, and GIF. Clips must be 3-10 seconds long and no larger than 200MB.
Write what you want to change. Use <<<video_1>>> to reference your video and <<<image_1>>> for reference images in the prompt.
Upload up to 7 reference images to guide visual changes like character appearance or environment style.
Choose Standard or Pro mode. The model applies your edits while preserving the original motion and timing.
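The input limits in the steps above (format, 3-10 second duration, 200MB size cap, up to 7 reference images) can be checked locally before uploading. The following is a minimal sketch: EditRequest and validate are hypothetical names invented for this example, and treating "200MB" as 200 MiB is an assumption, since the page does not say which definition applies.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"mp4", "mov", "webm", "m4v", "gif"}
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0
MAX_BYTES = 200 * 1024 * 1024  # assumption: "200MB" interpreted as 200 MiB
MAX_REFERENCE_IMAGES = 7


@dataclass
class EditRequest:
    """Hypothetical container for the properties of one edit job."""
    video_format: str
    duration_seconds: float
    size_bytes: int
    reference_image_count: int = 0


def validate(request: EditRequest) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if request.video_format.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {request.video_format}")
    if not MIN_SECONDS <= request.duration_seconds <= MAX_SECONDS:
        problems.append("duration must be between 3 and 10 seconds")
    if request.size_bytes > MAX_BYTES:
        problems.append("file exceeds 200MB")
    if request.reference_image_count > MAX_REFERENCE_IMAGES:
        problems.append("at most 7 reference images are allowed")
    return problems


print(validate(EditRequest("mp4", 5.0, 50_000_000, 2)))  # []
print(validate(EditRequest("avi", 12.0, 50_000_000, 9)))
```

Returning every problem at once, rather than raising on the first one, lets a caller show all issues with an upload in a single pass.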
Edit videos by describing changes in natural language. No timeline, no masks, no manual frame editing.
The model understands the 3D structure and timing of your video before applying changes.
Use <<<image_1>>> and <<<video_1>>> in prompts to precisely reference uploaded media in your edit instructions.
Provide multiple reference images to guide character appearance, environment, or style changes.
Use 'Feature' mode for style guidance, or 'Base' mode for direct editing with full motion preservation.
Input supports MP4, MOV, WebM, M4V, and GIF. Resolution range: 720px to 2160px.
Per-second pricing. A 5 to 10 second edit costs $0.56-$1.68, depending on the rate.
$0.168 per second. A 5-second edit costs $0.84. A 10-second edit costs $1.68.
$0.112 per second. A 5-second edit costs $0.56. A 10-second edit costs $1.12.
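The per-second arithmetic can be reproduced with a few lines of code. This sketch assumes the cost is simply the rate multiplied by the clip duration and rounded to the nearest cent; how fractional seconds are actually billed is not stated here. The function name edit_cost is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# The two per-second rates quoted above, in US dollars.
RATES = (Decimal("0.168"), Decimal("0.112"))


def edit_cost(seconds: int, rate_per_second: Decimal) -> Decimal:
    """Cost of one edit, assuming cost = rate * duration, rounded to cents."""
    total = rate_per_second * seconds
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for rate in RATES:
    for seconds in (5, 10):
        print(f"${rate}/s x {seconds}s = ${edit_cost(seconds, rate)}")
# $0.168/s x 5s = $0.84
# $0.168/s x 10s = $1.68
# $0.112/s x 5s = $0.56
# $0.112/s x 10s = $1.12
```

Decimal is used instead of float so that the results match the quoted dollar amounts exactly rather than up to binary rounding error.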
O1 is built for editing existing video footage without manual post-production tools.
Change outfits, swap backgrounds, adjust lighting, or modify weather conditions in existing footage.
Apply a specific visual style to your video while keeping the original camera work and motion.
Replace characters in a video using reference images while preserving their movements and interactions.
Transform the setting of a video — change seasons, time of day, or location while keeping the action intact.