Reddit is one of the largest forums on the Internet. About 57 million people visit the site every day to discuss a wide range of topics. In recent years, Reddit has also become a free source of AI training data for Google, OpenAI, and Microsoft, which use its forum discussions to develop their AI systems.
On April 18, Reddit announced plans to start charging companies for access to its API (application programming interface). Steve Huffman, co-founder and CEO of Reddit, said that “Reddit’s data is really valuable” and should not be given away for free to the richest companies in the world.
Founded in 2005, Reddit makes money primarily through advertising and e-commerce on its platform. The company is still finalizing the details of its fees and will announce pricing in the coming weeks.
Conversations on Reddit and similar sites have become valuable commodities as large language models (LLMs) play a vital role in creating new AI technologies. LLMs are sophisticated algorithms trained on large volumes of text, including data from Reddit. Services such as Google Bard and ChatGPT both use Reddit data.
ChatGPT creates a great deal of value for the company behind it but returns none to Reddit. In fact, it could be used to create competitors to Reddit. Other companies have also started selling data to AI developers. For example, Shutterstock sold its image data to OpenAI to help develop DALL-E, OpenAI's text-to-image program.
Last week, Elon Musk said he would restrict the use of Twitter’s API, which thousands of companies and independent developers use to monitor millions of conversations on the platform. Access fees can range from a few thousand to hundreds of thousands of dollars.
For LLMs to keep improving, companies need two things: massive computing power and massive amounts of data. Some companies already have the computing power, but they still look to external data to improve their algorithms. Sources include Wikipedia, e-books, academic papers, and Reddit.
Huffman believes Reddit's data is valuable in part because it is constantly updated. Freshness and relevance are what large language models need to produce the best results. Reddit’s API will remain free for developers who want to write apps for the Reddit community and for academics who want to study the data for research or other non-commercial purposes, he said.
According to Huffman, it is a problem when companies collect Reddit's data and create value from it without giving anything back to Reddit users, so now is a good time to tighten things up.
(According to NYT)