AI is secretly rating humans.

Instead of humans evaluating AI as before, Anthropic has reversed the process. Claude will analyze users' chat history to score their "level" of AI usage.

ZNews•31/05/2026

Chatbot Claude is assessing user proficiency based on interactions. Image: VectorStock.

Anthropic's latest research, titled "AI Fluency Index," flips the usual arrangement by having the chatbot Claude rate humans. By analyzing the structure of conversations, the AI ranks users' proficiency on an 11-point scale.

To develop the competency framework comprising 24 standards, Anthropic used analytical tools to scan 9,830 real-life user conversations.

Of these, 13 criteria concern behavior that happens away from the chat window, such as whether users conceal their AI usage from their superiors. The remaining 11 criteria are observable user behavior metrics, divided into three major aspects: description, delegation, and recognition.

The prevalence of each behavioral indicator in AI interactions across 9,830 conversations with Claude. Image: Anthropic.

The first aspect is how the request is described, where users must demonstrate a genuine understanding of what they want. Instead of giving vague commands, high-scoring individuals clearly state the ultimate goal and explain the context in detail. They also give specific requirements for presentation, such as asking the AI to create tables or setting a word limit. Notably, this group often includes several sample texts for the AI to "mimic," so that it gets the style right from the start.

The second aspect is how tasks are delegated. The research shows that skilled users treat AI as a discussion partner, not a mindless machine. The biggest difference lies in persistence. Instead of issuing a single command and accepting whatever comes back, they go through multiple rounds of back-and-forth, refining the request and having the AI revise its answers until they are satisfied. This behavior occurs in 85.7% of high-quality conversations.

The final aspect is recognition, acting as a filter to prevent humans from being misled by the information provided by chatbots. Users need to constantly question the logic of the reasoning, ask the AI to explain each line of code, or request clear citations. They also need to be perceptive enough to identify missing context in the AI's solution in order to make timely assessments and adjustments to the conclusions.
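To make the three aspects concrete, here is a minimal Python sketch of how observed behaviors could be tallied into a score out of 11. It is an illustration only: the article does not explain how Claude computes the score, so the rule used here (one point per distinct behavior observed) is an assumption. The indicator names are hypothetical labels paraphrased from the behaviors described above, and they do not cover all 11 indicators.

```python
from typing import Dict, Iterable, Set

# Hypothetical indicator names, paraphrased from behaviors the article describes.
# The article mentions 11 observable indicators but does not list them all,
# so this mapping is deliberately incomplete.
INDICATORS_BY_ASPECT: Dict[str, Set[str]] = {
    "description": {
        "states_goal",
        "explains_context",
        "specifies_format",
        "provides_examples",
    },
    "delegation": {
        "iterates_and_refines",
    },
    "recognition": {
        "questions_reasoning",
        "requests_explanation_or_citations",
        "identifies_missing_context",
    },
}

TOTAL_INDICATORS = 11  # from the article: 11 behavior metrics, 11-point scale


def fluency_score(observed: Iterable[str]) -> int:
    """Count distinct known indicators seen in a conversation.

    Assumed scoring rule (one point per indicator); not confirmed by the source.
    """
    known = set().union(*INDICATORS_BY_ASPECT.values())
    return len(known & set(observed))


def breakdown(observed: Iterable[str]) -> Dict[str, int]:
    """Return how many indicators were observed under each aspect."""
    seen = set(observed)
    return {aspect: len(names & seen) for aspect, names in INDICATORS_BY_ASPECT.items()}


def format_score(observed: Iterable[str]) -> str:
    """Render the score in the 'x/11' form used in the article."""
    return f"{fluency_score(observed)}/{TOTAL_INDICATORS}"
```

For example, a conversation in which the user states a goal, iterates on the answer, and questions the reasoning would score 3/11 under this assumed rule, with one point in each aspect.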

Experienced users typically receive a score of around 7-8 from Claude. Photo: X.

However, the research also points to a worrying psychological trap, known as the "Beautiful Interface Paradox." When Claude's Artifacts feature produces something visually appealing, such as a clean piece of code or a polished diagram, our brains tend to turn "lazy" and stop thinking critically.

The study's statistics show that when users see a polished interface, the share of them actively searching for flaws decreases by 5.2%. The share who verify the accuracy of the information also decreases by 3.7%, and the share who question its logic decreases by 3.1%.

"If something looks perfect, users will automatically assume it's correct," experts at Anthropic noted.

This complacency is extremely dangerous. In fact, the more complex the task, the higher the chance that AI will make mistakes or "fabricate" information. If humans judge the underlying quality solely by appearances, we will be very easily deceived by AI.

According to the report, those who regularly engage in back-and-forth conversations and point out AI flaws are rated 5-6 times higher than average users. They are also more likely to spot shortcomings and inconsistencies compared to the rest of the user group. These "experts" typically achieve scores of around 7-8/11 from Claude.

Source: https://znews.vn/ai-dang-ngam-cham-diem-con-nguoi-post1655559.html