Creating a 100% AI-Generated Video with Google Veo 3
Mistakes to avoid, current limitations, and practical tips
After spending weeks producing a presentation video for our virtual agent Emmie, generated entirely with AI (specifically Google Veo 3), one thing became clear: AI video generation is promising, but it is still a challenging process.
Here’s a breakdown of what we learned: what not to do, what still doesn’t work, and how to get the best out of Veo 3.
5 Mistakes to Avoid
1. Underestimating the 8-second limit
Every Veo 3 scene is capped at 8 seconds. Period.
This technical limit has a direct impact on scripting, pacing, and scene transitions. We had to rewrite much of our script to fit this constraint.
Don’t: plan for long shots, continuous dialogue, or complex transitions.
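One way to respect the cap at the scripting stage is to split narration into clip-sized pieces before generating anything. The sketch below is a planning aid, not part of Veo 3: it assumes a hypothetical speaking rate of about 2.5 words per second and keeps roughly 10% of each 8-second clip free of speech, then packs whole sentences into segments that fit.

```python
import math
import re

# Hypothetical planning assumptions (not Veo 3 settings):
WORDS_PER_SECOND = 2.5  # rough conversational speaking rate
SAFETY_MARGIN = 0.9     # leave ~10% of each clip free of speech
CLIP_SECONDS = 8.0      # Veo 3 per-scene cap


def estimate_seconds(text, words_per_second=WORDS_PER_SECOND):
    """Estimate how long a line takes to say at a fixed speaking rate."""
    return len(text.split()) / words_per_second


def split_script(script, clip_seconds=CLIP_SECONDS,
                 words_per_second=WORDS_PER_SECOND, margin=SAFETY_MARGIN):
    """Pack whole sentences into segments that each fit one clip."""
    max_words = max(1, math.floor(clip_seconds * margin * words_per_second))
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    segments, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # A sentence too long for one clip is cut into word-count chunks.
        while len(words) > max_words:
            if current:
                segments.append(" ".join(current))
                current = []
            segments.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new segment if this sentence would overflow the current one.
        if current and len(current) + len(words) > max_words:
            segments.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        segments.append(" ".join(current))
    return segments


if __name__ == "__main__":
    # Made-up demo narration, for illustration only.
    demo = "Hi, I'm Emmie. I can help you find answers quickly. Ask me anything!"
    for i, seg in enumerate(split_script(demo), 1):
        print(f"Scene {i} (~{estimate_seconds(seg):.1f}s): {seg}")
```

Word counts are only a proxy for speech length, so any segment close to the limit is worth timing with the actual generated voice.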
2. Assuming visual consistency
Even with ultra-detailed prompts, faces, gestures, and clothing varied from one scene to the next. Emmie never looked exactly like “Emmie” twice.
Don’t: rely on character continuity without planning for heavy post-production.
3. Recording voiceover too early
Since Veo scenes are silent, the voice needs to be created separately (we used AI for that too). But syncing a pre-recorded voice to unstable visuals is… painful.
Don’t: lock in narration before your visuals are edited and final.
4. Ignoring generation costs
One generation = 100 credits (~$1).
Multiply that by 6–8 scenes × 15–20 variations per scene and you reach 90–160 generations, roughly $90–$160 for one video.
Don’t: prompt blindly. Always budget and plan for iteration.
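A small estimator makes the iteration budget explicit before you start prompting. The sketch uses the figures from our project (100 credits per generation, about 100 credits per US dollar); pricing differs between plans and may change, so treat both constants as placeholders to update.

```python
# Figures from our project: 100 credits per generation, ~100 credits per US dollar.
CREDITS_PER_GENERATION = 100
CREDITS_PER_USD = 100


def generation_budget(scenes, variations_per_scene,
                      credits_per_generation=CREDITS_PER_GENERATION,
                      credits_per_usd=CREDITS_PER_USD):
    """Return (generations, credits, approximate USD) for one video."""
    generations = scenes * variations_per_scene
    credits = generations * credits_per_generation
    return generations, credits, credits / credits_per_usd


low = generation_budget(scenes=6, variations_per_scene=15)
high = generation_budget(scenes=8, variations_per_scene=20)
print(f"{low[0]}–{high[0]} generations, ~${low[2]:.0f}–${high[2]:.0f}")
# prints: 90–160 generations, ~$90–$160
```

Running it with your own scene count and a realistic variation count per scene gives a ceiling to agree on before the first prompt.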
5. Relying on automatic subtitles
Veo adds subtitles directly into the video with no option to turn them off. If you don’t want them, you’ll need to manually remove or mask them.
Don’t: count on subtitle-free footage; unless you’re 100% sure you’ll use the subtitles, budget time to remove them.
The Current Limits of AI Video Generation
- Visual inconsistencies: lighting, proportions, and character traits vary between scenes
- Unnatural gestures: body movement often feels rigid or robotic
- Limited control: camera angles, framing, and action direction are unpredictable
- No built-in audio: visuals are silent; sound must be added separately
- Low personalization: keeping a character visually stable across scenes is very difficult
Our Top Tips for Working with Veo 3
🎬 1. Stick to 2–3 second shots
You don’t need to use all 8 seconds. The most usable clips are often short and focused.
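Cutting the best 2–3 seconds out of an 8-second take is easy to script. This sketch assumes the clips are downloaded as MP4 files and that the ffmpeg command-line tool (with libx264) is installed; the file names are hypothetical. It only builds the command, and it re-encodes instead of stream-copying so the cut does not snap to the nearest keyframe.

```python
import shlex

CLIP_SECONDS = 8.0  # Veo 3 per-scene cap


def trim_command(src, dst, start, duration=2.5, clip_seconds=CLIP_SECONDS):
    """Build an ffmpeg command that keeps `duration` seconds starting at `start`."""
    if start < 0 or duration <= 0 or start + duration > clip_seconds:
        raise ValueError("trim window must lie inside the clip")
    return [
        "ffmpeg", "-y",           # overwrite dst if it exists
        "-ss", f"{start:.2f}",    # seek to the start of the window
        "-i", src,
        "-t", f"{duration:.2f}",  # keep this many seconds
        "-c:v", "libx264",        # re-encode so the cut is frame-accurate
        "-c:a", "aac",            # only applies if the clip has an audio track
        dst,
    ]


cmd = trim_command("scene03_take07.mp4", "scene03_best.mp4", start=1.5)
print(shlex.join(cmd))
# prints: ffmpeg -y -ss 1.50 -i scene03_take07.mp4 -t 2.50 -c:v libx264 -c:a aac scene03_best.mp4
```

To actually run the trim, pass the list to `subprocess.run(cmd, check=True)`.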
✍️ 2. Find the right prompt balance
Too vague = chaos. Too specific = rigidity. Experiment until you find the sweet spot.
♻️ 3. Reuse and iterate on good prompts
Once you find a tone or aesthetic that works, duplicate it and adjust one variable at a time.
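Keeping prompts as named fields makes "one variable at a time" mechanical. Veo 3 takes free-text prompts, so the field names and values below are a hypothetical convention of our own, not a Veo feature; each variant is labeled so a good result can be traced back to the single change that produced it.

```python
# Hypothetical prompt fields; Veo 3 takes free text, this structure is just a convention.
BASE_PROMPT = {
    "subject": "a friendly virtual assistant character",
    "setting": "in a bright modern office",
    "camera": "medium shot, static camera",
    "lighting": "soft daylight",
    "style": "photorealistic",
}


def render(fields):
    """Join the fields into a single prompt string (dict order is preserved)."""
    return ", ".join(fields.values())


def vary_one(base, field, options):
    """Return (label, prompt) pairs that differ from `base` only in `field`."""
    if field not in base:
        raise KeyError(field)
    variants = []
    for option in options:
        fields = dict(base)  # copy so the base prompt stays untouched
        fields[field] = option
        variants.append((f"{field}={option}", render(fields)))
    return variants


for label, prompt in vary_one(BASE_PROMPT, "camera",
                              ["close-up, static camera", "slow dolly-in"]):
    print(label, "->", prompt)
```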
🧩 4. Plan for serious editing
Think in fragments. You’ll likely need to cut, crop, re-sequence, and mask your way to coherence.
🎛 5. Build your audio layer separately
Since visuals come without sound, plan a parallel audio workflow: narration, music, sound design, etc.
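Once the edit is locked, narration timing follows from the clip durations. This sketch computes when each narration line should start, assuming one line per clip in final edit order; the clip names, durations, and lines are hypothetical, and it does not measure how long the generated voice actually runs.

```python
def timestamp(seconds):
    """Format seconds as MM:SS.ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def cue_sheet(clips):
    """clips: (clip_name, duration_s, narration_line) in final edit order.

    Returns (start_time, clip_name, narration_line) cues and the total runtime.
    """
    cues, t = [], 0.0
    for name, duration, line in clips:
        cues.append((round(t, 3), name, line))
        t += duration
    return cues, round(t, 3)


edit = [  # hypothetical final edit
    ("scene01", 2.5, "Hi, I'm Emmie."),
    ("scene02", 3.0, "I answer questions around the clock."),
    ("scene03", 2.0, "Let's get started."),
]
cues, total = cue_sheet(edit)
for start, name, line in cues:
    print(timestamp(start), name, line)
print("total", timestamp(total))
```

Comparing each line’s recorded length with its clip duration is a quick way to spot narration that will spill into the next shot.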
In Summary
Creating a video with Google Veo 3 is both exciting and frustrating.
You need to balance technical constraints, creative instability, and unexpected costs. But with patience and structure, it’s possible to produce a surprisingly coherent result—without a single actor, camera, or studio.