An investigation by Proof News revealed that these companies used a dataset created by the non-profit organization EleutherAI, containing transcripts of videos from more than 48,000 YouTube channels, without obtaining permission from the channel owners or content creators.
Although the dataset contains no images or video footage, its content comes from leading creators on the platform, such as Marques Brownlee and MrBeast, as well as major news publishers like The New York Times, BBC, and ABC News. It also includes subtitles from videos belonging to Engadget.

"Apple gets data for its AI from a number of companies," Brownlee, a popular YouTuber, posted on X. "One of them is tons of data/transcripts from YouTube videos, including mine."
YouTube CEO Neal Mohan has previously stated that companies using YouTube's data to train AI models violate the platform's terms of service.
AI companies remain largely opaque about the data used to train their algorithms. Earlier this month, artists and photographers criticized Apple for not disclosing the data sources used to train Apple Intelligence, a new AI feature that will be available on millions of Apple devices this year.
YouTube, the world's largest video hosting platform, is also a data "gold mine" for training AI, as it holds transcripts, audio, video, and images.
Earlier this year, OpenAI's chief technology officer, Mira Murati, dodged questions from The Wall Street Journal about whether the company used YouTube videos to train Sora, OpenAI's upcoming AI video creation tool.
“I won’t go into detail about the data that was used, but it was licensed or publicly available data,” Murati said at the time. Meanwhile, Alphabet CEO Sundar Pichai also stressed that companies using data from YouTube to train AI models were violating the platform’s terms of service.
(According to Proof News, WSJ)
Source: https://vietnamnet.vn/apple-nvidia-va-anthropic-su-dung-trai-phep-du-lieu-youtube-de-dao-tao-ai-2303028.html
