Edit videos with natural language — describe what to change and the model handles the rest

Kling O1 is a unified multimodal video model built for editing. Provide a reference video and describe your changes in plain text. The model analyzes the motion, spatial structure, and timing of the original clip, then applies your edits while keeping everything else intact.
Describe edits in plain text — no masking, keyframing, or manual adjustments needed.
Camera paths, body movements, and timing from the original video stay intact during editing.
Use reference images to guide specific visual changes like character replacement or style transfer.
Reference images and videos directly in your prompt using <<<image_1>>> and <<<video_1>>> syntax.
Provide the video you want to edit. Supported formats are MP4, MOV, WebM, M4V, and GIF. Clips must be 3-10 seconds long and no larger than 200MB.
Write what you want to change. Use <<<video_1>>> to reference your video and <<<image_1>>> for reference images in the prompt.
Upload up to 7 reference images to guide visual changes like character appearance or environment style.
Choose Standard or Pro mode. The model applies your edits while preserving the original motion and timing.
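As an illustration of the placeholder syntax in the steps above, here is a minimal Python sketch that checks a prompt before submission. The function name `check_prompt` and the assumption that placeholders are numbered from 1 in upload order are hypothetical and not part of any documented Kling API. Only the `<<<image_N>>>` / `<<<video_N>>>` syntax and the limit of 7 reference images come from the text.

```python
import re

# Placeholder syntax taken from the text: <<<image_N>>> and <<<video_N>>>.
PLACEHOLDER = re.compile(r"<<<(image|video)_(\d+)>>>")
MAX_REFERENCE_IMAGES = 7  # "up to 7 reference images"


def check_prompt(prompt: str, num_images: int, num_videos: int = 1) -> list[str]:
    """Return a list of problems found in the prompt; an empty list means OK."""
    problems = []
    if num_images > MAX_REFERENCE_IMAGES:
        problems.append(
            f"too many reference images: {num_images} > {MAX_REFERENCE_IMAGES}"
        )
    available = {"image": num_images, "video": num_videos}
    for kind, index_text in PLACEHOLDER.findall(prompt):
        index = int(index_text)
        # Assumption: placeholders are numbered from 1 in upload order.
        if index < 1 or index > available[kind]:
            problems.append(f"<<<{kind}_{index}>>> does not match an uploaded {kind}")
    return problems


prompt = "Replace the jacket in <<<video_1>>> with the coat from <<<image_1>>>."
print(check_prompt(prompt, num_images=1))  # no problems expected
print(check_prompt(prompt, num_images=0))  # image_1 has nothing to refer to
```

A check like this catches a prompt that mentions `<<<image_2>>>` when only one reference image was uploaded, before any credits are spent.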
Edit videos by describing changes in natural language. No timeline, no masks, no manual frame editing.
The model understands the 3D structure and timing of your video before applying changes.
Use <<<image_1>>> and <<<video_1>>> in prompts to precisely reference uploaded media in your edit instructions.
Provide multiple reference images to guide character appearance, environment, or style changes.
Use 'Feature' mode for style guidance and 'Base' mode for direct editing with full motion preservation.
Input supports MP4, MOV, WebM, M4V, and GIF. Resolution range: 720px to 2160px.
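The input limits listed above can be sketched as a local pre-upload check. This is only an illustration. The `VideoInfo` type and `validate_video` function are hypothetical names, the metadata is assumed to be known already (for example from a probing tool), "200MB" is read as 200 MiB, and the 720px to 2160px range is assumed to apply to the shorter side of the frame. The page does not specify either of those last two points.

```python
from dataclasses import dataclass
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm", ".m4v", ".gif"}
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0
MAX_BYTES = 200 * 1024 * 1024          # assumption: "200MB" read as 200 MiB
MIN_SIDE_PX, MAX_SIDE_PX = 720, 2160   # assumption: applies to the shorter side


@dataclass
class VideoInfo:
    """Hypothetical container for metadata gathered before upload."""
    filename: str
    size_bytes: int
    duration_seconds: float
    width: int
    height: int


def validate_video(info: VideoInfo) -> list[str]:
    """Return human-readable violations of the stated limits (empty if none)."""
    errors = []
    if Path(info.filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append("unsupported format")
    if not MIN_SECONDS <= info.duration_seconds <= MAX_SECONDS:
        errors.append("duration must be 3-10 seconds")
    if info.size_bytes > MAX_BYTES:
        errors.append("file larger than 200MB")
    if not MIN_SIDE_PX <= min(info.width, info.height) <= MAX_SIDE_PX:
        errors.append("resolution outside 720px-2160px")
    return errors


clip = VideoInfo("dance.mp4", 50_000_000, 8.0, 1920, 1080)
print(validate_video(clip))  # a clip inside every stated limit yields no errors
```

Returning a list of messages rather than raising on the first failure lets a caller show every problem with a clip at once.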

Credits are based on video length, quality mode, and whether you use video input. The generator shows the exact estimate before you create.
10-second edit: 200 credits for basic edits, or 300 credits with video input.
10-second edit: 250 credits for basic edits, or 400 credits with video input.

Kling O1 is built for editing existing video footage without manual post-production tools.
Change outfits, swap backgrounds, adjust lighting, or modify weather conditions in existing footage.
Apply a specific visual style to your video while keeping the original camera work and motion.
Replace characters in a video using reference images while preserving their movements and interactions.
Transform the setting of a video — change seasons, time of day, or location while keeping the action intact.