In a move seen as protecting its intellectual property, OpenAI has implemented a government ID verification requirement for developers seeking access to its most advanced artificial intelligence (AI) models.
OpenAI has not responded to a request for comment.
However, in its announcement, the maker of ChatGPT explained that it introduced the new verification process because some developers had intentionally misused OpenAI's application programming interfaces (APIs) in violation of the company's usage policies.
Although the company officially cited preventing misuse, the move appears to stem from a deeper concern: that output from OpenAI's models is being harvested by competitors to train their own AI systems.
That concern is supported by a new study from Copyleaks, a company specializing in AI content detection.
Using a "fingerprint" recognition system similar to that of large AI models, Copyleaks discovered that approximately 74% of the output from the rival model DeepSeek-R1 (China) could be classified as written by OpenAI. This figure not only indicates duplication but also imitation.
Copyleaks also examined other AI models, including Microsoft's phi-4 and xAI's Grok-1. Both showed almost no similarity to OpenAI, with "disagreement" rates of 99.3% and 100%, respectively, while Mistral's Mixtral model did show some similarity.
The study highlights a key point: even when models are asked to write in different tones or formats, they still leave behind detectable stylistic signatures, much like linguistic fingerprints. These fingerprints persist across tasks, topics, and prompts, and can be traced back to their source model with a measurable degree of accuracy.
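Copyleaks has not published the details of its detector, but the general idea behind stylometric attribution can be sketched. The Python example below (a minimal illustration, not Copyleaks' actual system; the sample texts and model labels are invented placeholders) trains a simple classifier on character n-grams, one common proxy for stylistic fingerprints, and scores an unseen text against each known source:

```python
# A minimal, illustrative sketch (not Copyleaks' actual system): attribute
# short texts to a source model using character n-gram "fingerprints".
# All sample texts and labels below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training corpus: outputs labeled by the model that produced them.
texts = [
    "Certainly! Here's a concise summary of the key points...",
    "Sure, let me break that down into a few clear steps.",
    "The answer depends on several factors, outlined below.",
    "In short: yes, with two important caveats to note.",
]
labels = ["model_a", "model_a", "model_b", "model_b"]

# Character n-grams capture low-level stylistic habits (punctuation,
# phrasing, word shapes) that persist across topics and prompts.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

# Score an unseen text: the class probabilities act as a crude
# stylistic-fingerprint match against each known source model.
unseen = "Sure! Here's a quick breakdown of the main ideas."
for label, prob in zip(clf.classes_, clf.predict_proba([unseen])[0]):
    print(f"{label}: {prob:.2f}")
```

A production detector would rely on far larger corpora and richer features, but the principle is the same: stylistic signal, rather than topic, drives the attribution.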
Meanwhile, some critics point out that OpenAI itself built its initial models by gathering data from the web, including content from news publishers, authors, and creators—often without their consent.
Copyleaks CEO Alon Yamin pointed to two distinct problems: training AI models on copyrighted human-created content without permission, and using the output of proprietary AI systems to train competing models, which amounts to reverse-engineering a competitor's product.
Yamin argues that while both practices are ethically contentious, training on OpenAI's output poses a direct competitive risk, since it effectively exploits hard-won innovations without the original developer's consent or compensation.
As AI companies race to build increasingly powerful models, the debate over who owns what and who can train on what data is becoming more intense.
Tools like Copyleaks' digital fingerprinting system offer a potential method for tracing outputs back to their source models and verifying ownership claims.
Source: https://www.vietnamplus.vn/openai-siet-chat-kiem-soat-de-ngan-cac-doi-thu-sao-chep-mo-hinh-tri-tue-nhan-tao-post1033664.vnp