Introduction
Veo 3.1 is Google's flagship AI video model, available on invideo. It generates high-fidelity cinematic video at resolutions up to 4K from text prompts or images, with character consistency, native synchronized audio, and precise control over camera, lens, lighting, and style. It's ideal for promos, social media content, product demos, and professional-looking AI video without traditional filming.
This article covers how to access Veo 3.1, creation modes, prompt best practices, and available settings.

How to get started
From the homepage, click Generative models → See all, then select Veo 3.1
Upload a reference image or start from a text prompt alone
Optionally add a first and last frame to control how your video opens and closes
Select your resolution and duration, confirm the credit cost, and generate
Download your video, which is watermark-free on all paid plans
Spec information
Minimum duration | 4 seconds |
Maximum duration | 8 seconds |
Resolution | 720p – 4K |
Aspect ratios | 16:9, 9:16 |
Input types | Text, image (single or up to 3) |
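The limits in the table above can be checked before you spend credits. The following is a minimal sketch in Python, not part of invideo or Veo 3.1: the function name `check_request` and its parameters are hypothetical, and it only encodes the values listed in the table. Resolution is left out because the table gives a range (720p – 4K) without listing the steps in between.

```python
# Limits copied from the spec table; everything else here is illustrative.
MIN_DURATION_S = 4
MAX_DURATION_S = 8
ASPECT_RATIOS = {"16:9", "9:16"}
MAX_REFERENCE_IMAGES = 3


def check_request(duration_s, aspect_ratio, reference_images=0):
    """Return a list of problems; an empty list means the request fits the table."""
    problems = []
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration must be {MIN_DURATION_S} to {MAX_DURATION_S} seconds"
        )
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append("aspect ratio must be 16:9 or 9:16")
    if not 0 <= reference_images <= MAX_REFERENCE_IMAGES:
        problems.append(f"use at most {MAX_REFERENCE_IMAGES} reference images")
    return problems


if __name__ == "__main__":
    # A text-only, 8-second landscape request passes every check.
    print(check_request(8, "16:9"))
    # A 10-second square request with 4 images fails all three checks.
    print(check_request(10, "1:1", reference_images=4))
```

The check only tests that the duration falls within the 4 to 8 second range; whether every value in between is selectable is not stated in the table.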
Prompt best practices
Veo 3.1 responds well to the 7-layer prompt formula:
Camera/Lens → Subject → Action → Environment → Lighting → Style → Audio
Example: Wide tracking shot, a woman in a red coat walks through a foggy cobblestone street at dawn, warm lamplight, cinematic film texture, ambient city sounds.
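If you write many prompts, it can help to fill in the seven layers separately and join them in the formula's order. This is a minimal sketch under that assumption: `PromptLayers` and `build_prompt` are hypothetical helper names, not an invideo or Veo 3.1 feature. It treats subject, action, and environment as one clause and separates the remaining layers with commas, which reproduces the example above.

```python
from dataclasses import dataclass


@dataclass
class PromptLayers:
    # One field per layer, in the order given by the formula.
    camera: str
    subject: str
    action: str
    environment: str
    lighting: str
    style: str
    audio: str


def build_prompt(layers: PromptLayers) -> str:
    """Join the seven layers into a single prompt sentence."""
    # Refuse to build a prompt if any layer was left blank.
    for name, value in vars(layers).items():
        if not value.strip():
            raise ValueError(f"missing layer: {name}")
    # Subject, action, and environment read naturally as one clause.
    scene = " ".join(
        part.strip()
        for part in (layers.subject, layers.action, layers.environment)
    )
    parts = [
        layers.camera.strip(),
        scene,
        layers.lighting.strip(),
        layers.style.strip(),
        layers.audio.strip(),
    ]
    return ", ".join(parts) + "."


if __name__ == "__main__":
    example = PromptLayers(
        camera="Wide tracking shot",
        subject="a woman in a red coat",
        action="walks through",
        environment="a foggy cobblestone street at dawn",
        lighting="warm lamplight",
        style="cinematic film texture",
        audio="ambient city sounds",
    )
    print(build_prompt(example))
```

Keeping the layers as separate fields makes it easy to change one dimension, such as the lighting or the audio, while leaving the rest of the scene unchanged between generations.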

💡 Tips for better results | Why it helps |
Follow the 7-layer formula | Gives Veo 3.1 every dimension it needs to generate a precise, polished scene |
Specify camera and lens details | Veo 3.1 reads cinematic language directly: dolly, tracking, wide angle, close-up |
Include audio direction in your prompt | Veo 3.1 generates synchronized sound natively, so describe ambient sounds, music tone, or whether dialogue is present |
Upload a reference image for consistency | Anchors character appearance and scene style across generations |
Use first and last frame for transitions | Particularly useful for multi-scene projects where visual continuity matters |
