Vietnam.vn - Vietnam promotion platform

Veo 3's Big Problem

Google's AI video model is still inserting gibberish subtitles into videos more than a month after its launch, suggesting that the company is willing to release unfinished products to demonstrate its AI capabilities.

ZNews, 19/07/2025

Veo 3 is Google's latest AI video model, launched in late May, which generates videos from text prompts. It has attracted the attention of the content creation community because it can produce videos with sound and dialogue, a feature that Google's previous model version lacked, making the results more realistic.

Many users turn Veo 3 clips, each up to 8 seconds long, into commercials, ASMR videos, fantasy movie trailers, and humorous street interviews.

Oscar-nominated director Darren Aronofsky used the tool to create a short film called Ancestra. During a press conference, Google DeepMind CEO Demis Hassabis compared Veo 3 to cinema leaving the silent era behind.

"Persistent" subtitles from Veo 3

However, many users have found that the tool doesn’t work as expected. When creating clips with dialogue, Veo 3 often inserts meaningless, garbled subtitles on its own, even when the prompt explicitly says not to add subtitles.

Removing these subtitles is not easy. Users must either regenerate the clip, spending more “tokens” and therefore paying Google more money, use an external tool to remove the subtitles, or crop them out of the frame.

Veo 3 produces lifelike images and dialogue that matches mouth movements, but its subtitles are meaningless. Photo: LessWrong.

Josh Woodward, vice president of Google Labs and Gemini, posted on X on June 9 that Google had developed fixes to reduce the gibberish text. But more than a month later, users continue to report the issue on the Google Labs Discord channel, suggesting that fixing bugs in large AI models is not easy.

Like Google’s previous video-generating AI models, Veo 3 is a paid model, starting at $249.99 per month. To create an 8-second video, users enter a description into Flow, Gemini, or another platform. Each clip created with Veo 3 costs a minimum of 20 AI credits, and users can top up at $25 per 2,500 credits.

Mona Weiss, a commercial director, said regenerating footage to get rid of subtitles was becoming a significant expense. “If you create a spoken scene with Veo 3, about 40% of the output will have nonsensical subtitles that make the video unusable,” she said. “It’s a lot of money to get a scene that you like, and it’s not usable.”
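To put those figures together, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted in this article (20 credits per clip at minimum, $25 per 2,500 credits, and Weiss's estimate that about 40% of spoken scenes are unusable). It also assumes, purely for illustration, that every attempt fails independently at that rate and that users pay the top-up price for every credit. The function names are hypothetical and are not part of any Google API.

```python
# Figures quoted in the article.
CREDITS_PER_CLIP = 20      # minimum credits per Veo 3 clip
TOPUP_PRICE_USD = 25.0     # price of one top-up
TOPUP_CREDITS = 2500       # credits received per top-up
UNUSABLE_RATE = 0.40       # Weiss's estimate for spoken scenes


def cost_per_clip() -> float:
    """Top-up cost in USD of generating one clip at the minimum credit price."""
    return CREDITS_PER_CLIP * TOPUP_PRICE_USD / TOPUP_CREDITS


def expected_cost_per_usable_clip(unusable_rate: float = UNUSABLE_RATE) -> float:
    """Expected USD spent until one usable clip is produced.

    Assumption (not from the article): each attempt is independent, so the
    expected number of attempts is 1 / (1 - unusable_rate).
    """
    if not 0.0 <= unusable_rate < 1.0:
        raise ValueError("unusable_rate must be in [0, 1)")
    return cost_per_clip() / (1.0 - unusable_rate)


if __name__ == "__main__":
    print(f"Cost per clip: ${cost_per_clip():.2f}")
    print(f"Expected cost per usable clip: ${expected_cost_per_usable_clip():.2f}")
```

Under these assumptions, the subtitle problem raises the effective price of a usable spoken clip by about two thirds, before counting retries made for any other reason.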

Nonsensical subtitles are hard to remove from Veo 3 videos. Photo: Technology Review.

When Weiss reported the issue to Google Labs via Discord in hopes of getting a refund for the wasted credits, the support team referred her to the company's official support department. They offered to refund the cost of the Veo 3 subscription, but not the credits. Weiss declined because accepting the refund would have meant losing access to the model.

Google Labs' Discord support team said that captions can be automatically enabled if speech is detected, and they're working on a fix.

The problem with Google's approach

Veo 3's habit of inserting captions most likely stems from the data the model was trained on.

Google has not disclosed what kinds of data were used to train the model, but it likely included videos from YouTube and TikTok, according to Shuo Niu, a researcher on video sharing platforms and AI at Clark University in Massachusetts. Many of those videos have captions embedded directly in the frame, which makes the captions difficult to remove before the videos are used as training data.

“Text-to-video models are trained using reinforcement learning to generate content that mimics human-generated videos, and if those videos have subtitles, the model can ‘learn’ that adding subtitles makes the product more like human-generated videos,” he explains.

Veo 3 is affected by training data from YouTube and TikTok videos. Photo: Mashable.

“We are constantly improving our video creation capabilities, especially around text, natural speech, and perfectly synchronized audio,” a Google spokesperson said. “We encourage users to retry their commands if they see inconsistent results and to give us feedback by liking or disliking the results.”

Additionally, the model ignores instructions like “No subtitles” because negative prompts (asking the AI not to do something) are often less effective than positive ones, according to Tuhin Chakrabarty, a researcher in AI systems at Stony Brook University.

To fully fix the problem, Google would have to examine every frame of all the videos it used to train Veo 3, then remove or relabel the videos with captions before retraining the model, which would take weeks, Chakrabarty added.

Katerina Cizek, a documentary filmmaker and art director at the MIT Open Documentary Lab, says the issue shows Google is still willing to release products that aren't quite finished yet.

“Google needs a win,” Cizek said. “They need to be the first to release a tool that generates lip-synced audio. And that’s more important than fixing the captioning problem.”

Source: https://znews.vn/van-de-lon-cua-veo-3-post1569402.html

