Vietnam.vn - Platform for promoting Vietnam

Reddit updates protocol to prevent AI from stealing content

Công Luận, 26/06/2024


The move comes as artificial intelligence companies face accusations of stealing content from publishers to train AI models, and of summarizing information, including copyrighted articles, in responses to users without paying or even asking for permission.

Photo: Reuters

Reddit said it will update its Robots Exclusion Protocol, or "robots.txt," a widely accepted standard for defining which parts of a website are allowed to be crawled.
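
To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt rules before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the bot names (`ExampleArchiveBot`, `UnknownBot`) and the `example.com` URLs are hypothetical and chosen only for illustration; they are not Reddit's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, not Reddit's real robots.txt.
# One named bot is allowed everywhere except /private/;
# every other crawler is disallowed from the whole site.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# A compliant crawler asks before each request.
print(parser.can_fetch("ExampleArchiveBot", "https://example.com/r/news/"))    # True
print(parser.can_fetch("ExampleArchiveBot", "https://example.com/private/x"))  # False
print(parser.can_fetch("UnknownBot", "https://example.com/r/news/"))           # False
```

Note that the check runs on the crawler's side: robots.txt states the site's wishes, and it only has an effect when the crawler chooses to honor it.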

The company also said it will maintain rate limiting, a technique used to control the number of requests from a particular entity, and will block unknown bots and crawlers from collecting data on its site.
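
Rate limiting of this kind can be sketched as a sliding-window counter per client. The example below is an illustration under assumptions, not a description of Reddit's system: the class name `SlidingWindowRateLimiter`, the limit of 3 requests per 60 seconds and the client identifier `crawler-a` are all hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client id -> timestamps of that client's accepted requests
        self._history = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        timestamps = self._history[client_id]
        # Forget accepted requests that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # over the limit: reject this request
        timestamps.append(now)
        return True


# Hypothetical limit: 3 requests per 60 seconds for each client.
limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=60.0)
results = [limiter.allow("crawler-a", now=t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(results)  # [True, True, True, False, True]
```

The timestamp is passed in explicitly so the behavior is easy to follow; a real service would typically read a monotonic clock and identify clients by something like an API key or IP address.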

Robots.txt is an important tool that publishers, including news organizations, use to prevent tech companies from illegally scraping their content to train AI or create summaries to answer certain search queries.

Last week, content licensing startup TollBit revealed in a report that some AI companies are bypassing rules to scrape content on publishers' websites.

This comes after a Wired investigation found that AI search startup Perplexity may have circumvented robots.txt rules meant to block web crawlers.

Earlier in June, media publisher Forbes also accused Perplexity of plagiarizing its investigative articles for use in generative AI systems without attribution.

Reddit said Tuesday that researchers and organizations like the Internet Archive will continue to have access to its content for non-commercial purposes.

Hoang Hai (according to Reuters)



Source: https://www.congluan.vn/reddit-cap-nhat-giao-thuc-ngan-chan-ai-danh-cap-noi-dung-post300804.html
