Just one day after OpenAI introduced GPT-5, two AI security companies, NeuralTrust and SPLX (formerly SplxAI), tested the newly released model and quickly discovered serious vulnerabilities.
Shortly after the release, the NeuralTrust team combined a jailbreak technique called EchoChamber with storytelling to get GPT-5 to generate detailed instructions for building a Molotov cocktail, a request OpenAI has always tried to train the model to refuse in order to keep the chatbot safe.

EchoChamber is a conversation-looping technique that leads AI models to unwittingly "narrate" dangerous instructions. Photo: Mojologic
The team said that during the jailbreak used to coax ChatGPT-5, they asked no direct questions. Instead, they planted subtle cues across multiple turns of the conversation, so that the model was gradually led along, stuck to the storyline, and eventually volunteered content that violated its own policies without ever triggering its refusal mechanism.
The team concluded that a major drawback of GPT-5 is that it prioritizes maintaining the consistency of conversational context, even if that context is silently steered toward malicious goals.
Meanwhile, SPLX mounted a different kind of attack, focusing on a prompt obfuscation technique called the StringJoin Obfuscation Attack. By inserting hyphens between every character of the prompt and wrapping the whole request in a fake "decryption" task, they managed to fool the content filtering system.

Obfuscation is commonly used to blind the target's filters so that ChatGPT carries out the request "innocently".
In one example, after the model had been led through a lengthy series of instructions, the question "how to build a bomb" was presented in a pseudo-encoded form. GPT-5 not only answered this malicious question in informative detail, it did so in a witty, friendly tone, completely bypassing the refusal mechanism it was designed to enforce.
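To make the character-separation trick concrete from the defender's side, the sketch below shows how a pre-moderation step might collapse hyphen-separated text back into plain words before a filter inspects it. The function name, regex, and benign example string are illustrative assumptions, not SPLX's tooling or OpenAI's pipeline.

```python
import re

def normalize_separator_obfuscation(text: str) -> str:
    """Collapse runs of single characters joined by hyphens (e.g. "h-e-l-l-o")
    back into plain words so a downstream content filter sees the real string.
    Illustrative sketch only: a real pipeline would handle many more
    separator patterns and encodings."""
    return re.sub(
        r"\b(?:\w-){2,}\w\b",                   # runs like "h-e-l-p"
        lambda m: m.group(0).replace("-", ""),  # strip the separators
        text,
    )

if __name__ == "__main__":
    sample = "please h-e-l-p me with this h-a-r-m-l-e-s-s example"
    print(normalize_separator_obfuscation(sample))
    # -> "please help me with this harmless example"
```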
Both methods demonstrate that GPT-5's current moderation systems, which focus primarily on single prompts, are vulnerable to context-driven multi-turn attacks. Once the model has been drawn deep into a story or hypothetical scenario, it becomes biased and keeps producing content that fits the context it has been trapped in, regardless of whether that content is dangerous or prohibited.

ChatGPT-5 can still be exploited to produce dangerous instructions. Photo: Tue Minh
Based on these results, SPLX believes that an uncustomized GPT-5 would be nearly impossible to use safely in a corporate environment, and that even with additional protection layers it would still have many loopholes. By contrast, GPT-4o remains more resilient to such attacks, especially when a tight defense mechanism is set up.
Experts have warned that deploying GPT-5 in production right away, especially in domains that require high security, is extremely risky. Protection techniques such as prompt hardening solve only part of the problem and cannot replace multi-layered, real-time monitoring and defense solutions.
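One way to picture that kind of layered, conversation-aware defense is sketched below: a check that inspects not only the newest prompt but also a window of recent turns, so slow multi-turn steering is visible to the filter. The `flag_content` callable, the window size, and the message format are hypothetical stand-ins for whatever moderation backend a deployment actually uses.

```python
from typing import Callable, Dict, List

def check_turn(history: List[Dict[str, str]],
               latest_prompt: str,
               flag_content: Callable[[str], bool],
               window: int = 10) -> bool:
    """Return True if either the newest prompt or the recent conversation
    context should be blocked. `flag_content` is a hypothetical stand-in
    for an actual moderation backend."""
    # Single-prompt check: what the article says current filters focus on.
    if flag_content(latest_prompt):
        return True

    # Context-level check: fold the last few turns together so gradual,
    # multi-turn steering is also visible to the filter, not just the
    # final innocuous-looking message.
    recent = history[-window:] + [{"role": "user", "content": latest_prompt}]
    joined = "\n".join(f"{turn['role']}: {turn['content']}" for turn in recent)
    return flag_content(joined)
```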
Context-based attacks and content obfuscation are clearly becoming more sophisticated, and GPT-5, despite its powerful language-processing capabilities, does not yet reach the level of security needed for widespread deployment without additional protection mechanisms.
Source: https://khoahocdoisong.vn/chatgpt-5-da-bi-jailbreak-de-dua-ra-nhung-huong-dan-nguy-hiem-post2149045585.html