Hackers used AI to attack Google's Gemini.

According to BGR , a new research report has just revealed an alarming technique called 'Fun-Tuning,' which uses AI (artificial intelligence) to automatically generate highly effective prompt injection attacks targeting other advanced AI models, including Google's Gemini.

This method makes 'hacking' AI faster, cheaper, and easier than ever before, marking a new escalation in the cybersecurity battle involving AI.

The danger of malicious actors using AI to break AI.

Prompt injection is a technique where malicious actors stealthily insert harmful instructions into the input data of an AI model (e.g., through comments in source code, hidden text on the web). The goal is to 'trick' the AI, forcing it to ignore pre-programmed safety rules, leading to serious consequences such as leaking sensitive data, providing misinformation, or performing other dangerous actions.

Hacker đang dùng chính AI để tấn công Gemini của Google - Ảnh 1. — Hackers are using AI to attack AI.

Previously, successfully executing these attacks, especially on 'closed' models like Gemini or GPT-4, often required a great deal of complex and time-consuming manual testing.

But Fun-Tuning completely changed the landscape. Developed by a team of researchers from multiple universities, this method cleverly exploits the refined application programming interface (API) that Google provides free of charge to Gemini users.

By analyzing the subtle responses of the Gemini model during the tuning process (for example, how it responds to errors in the data), Fun-Tuning can automatically identify the most effective 'prefixes' and 'suffixes' to mask a malicious statement. This significantly increases the likelihood that the AI will comply with the attacker's malicious intent.

Test results show that Fun-Tuning achieved a success rate of up to 82% on some versions of Gemini, a figure far superior to the less than 30% achieved by traditional attack methods.

What adds to the danger of Fun-Tuning is its extremely low execution cost. Because Google's tuning API is provided free of charge, the computational cost to create an effective attack can be as low as $10. Furthermore, researchers have found that an attack designed for one version of Gemini can easily be successfully applied to other versions, opening up the risk of widespread attacks.

Google has confirmed it is aware of the threat posed by the Fun-Tuning technique but has not yet commented on whether it will change how the tuning API works. The research team also pointed out the difficulty in defending against this: if the information exploited by Fun-Tuning is removed from the tuning process, the API will become less useful to legitimate developers. Conversely, if it remains unchanged, it will continue to be a springboard for malicious actors to exploit.

The emergence of Fun-Tuning is a clear warning, indicating that the confrontation in cyberspace has entered a new, more complex phase. AI is now not only a target but also a tool and weapon in the hands of malicious actors.

Source: https://thanhnien.vn/hacker-dung-ai-de-tan-cong-gemini-cua-google-18525033010473121.htm