Majority of generative AIs are bugged: Potential info leak to hackers

Photo: Tada Images /

A significant vulnerability in most major AI assistants allows attackers to infer the content of their encrypted responses with alarming accuracy. The revelation comes at a time when AI assistants have become ubiquitous, handling sensitive conversations about everything from personal health issues to business secrets.

The attack, devised by Ben-Gurion University researchers Yisroel Mirsky, Roy Weiss, Daniel Ayzenshtyen, and Guy Amit, exploits a side channel present in most AI assistants, with Google Gemini the notable exception. This side channel, termed the ‘token-length sequence,’ arises because AI assistants stream their responses in real time, sending each token (a word or word fragment) as soon as it is generated. The result is a previously unknown vulnerability.

By analysing the lengths of the tokens and the order in which they arrive, attackers can infer specific phrases and sentences with remarkable accuracy, even though the traffic is encrypted. This attack, also known as a token inference attack, uses large language models (LLMs) to turn the raw data obtained from the side channel into readable text, effectively reconstructing AI assistant responses without breaking the encryption itself.

“Currently, anybody can read private chats sent from ChatGPT and other services,” Yisroel Mirsky told Ars Technica.  “This includes malicious actors on the same Wi-Fi or LAN as a client (e.g., same coffee shop), or even a malicious actor on the Internet—anyone who can observe the traffic. The passive attack can happen without OpenAI or the client’s knowledge. OpenAI encrypts their traffic to prevent these kinds of eavesdropping attacks, but our research shows that the way OpenAI uses encryption is flawed. Thus the content of the messages are exposed.”

The attack explained.

Mirsky likened the process to solving a complex puzzle, a task at which LLMs excel because they are good at identifying patterns and inferring context from token sequences. The researchers trained LLMs on example chats found online, enabling them to reconstruct responses accurately and, in the manner of a known-plaintext attack, even deduce the prompts behind them.

“It’s like trying to solve a puzzle on Wheel of Fortune, but instead of a short phrase, it’s a whole paragraph of phrases, and none of the characters have been revealed,” explained Mirsky.
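To make the puzzle analogy concrete, here is a toy Python sketch. It assumes a hypothetical tokenizer that emits one token per word (keeping the leading space, as many LLM tokenizers do) and a hypothetical list of candidate phrases, and it simply keeps the candidates whose token-length sequence matches what was observed. It is not the researchers' method, which uses trained LLMs to pick the most plausible text.

```python
import re
from typing import List


def toy_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for a real tokenizer: one token per word,
    # with the preceding space kept as part of the token.
    return re.findall(r" ?\S+", text)


def length_signature(text: str) -> List[int]:
    # The only thing an eavesdropper sees: the character length of each token.
    return [len(tok) for tok in toy_tokenize(text)]


def matching_candidates(observed: List[int], candidates: List[str]) -> List[str]:
    # Keep only candidate phrases whose token-length sequence matches exactly.
    return [c for c in candidates if length_signature(c) == observed]


if __name__ == "__main__":
    observed = length_signature("I have a rash")  # [1, 5, 2, 5]
    candidates = ["I have a rash", "I need a loan", "You have a cold", "I have a cold"]
    print(matching_candidates(observed, candidates))
    # ['I have a rash', 'I need a loan', 'I have a cold']
```

Even in this tiny example, three different sentences share the same length signature, which is why the attack relies on LLMs to choose among candidates using context.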

While the attack’s accuracy may initially seem limited, reconstructing 29% of responses with perfect word accuracy and inferring the topic of 55% of them, the researchers emphasise its real-world impact.

“In a real-time communication setting, AI services transmit the next token rᵢ immediately after it is generated. Our observations of several AI assistant services indicate that the token rᵢ is sent either as an individual message or as part of a cumulative message (e.g., [r₁, r₂, …, rᵢ]). Crucially, in both scenarios, the packet’s payload length directly correlates to the number of characters in rᵢ. In the case of cumulative messages, the length of each token can be inferred by calculating the difference in payload length between successive packets. Consequently, for each response message, it is possible to discern the lengths of every single token, even when the traffic is encrypted,” explained the researchers.
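The arithmetic the researchers describe can be sketched in a few lines of Python. This minimal sketch assumes that every message carries a fixed, known amount of non-token overhead (framing, headers, encryption tag) and that encryption preserves the plaintext length plus that fixed overhead, as modern stream-style TLS ciphers do; the overhead value and tokens are hypothetical.

```python
from typing import List


def token_lengths_individual(payload_lengths: List[int], overhead: int) -> List[int]:
    # Each packet carries a single token r_i: subtract the fixed overhead.
    return [n - overhead for n in payload_lengths]


def token_lengths_cumulative(payload_lengths: List[int], overhead: int) -> List[int]:
    # Each packet carries [r_1, ..., r_i]: the i-th token's length is how much
    # the payload grew compared with the previous packet.
    lengths = []
    previous = overhead  # an empty cumulative message would be just the overhead
    for n in payload_lengths:
        lengths.append(n - previous)
        previous = n
    return lengths


if __name__ == "__main__":
    tokens = ["The", " cat", " sat"]  # hypothetical response tokens
    overhead = 40                      # hypothetical fixed per-message overhead
    individual = [overhead + len(t) for t in tokens]
    cumulative = [overhead + len("".join(tokens[: i + 1])) for i in range(len(tokens))]
    print(token_lengths_individual(individual, overhead))  # [3, 4, 4]
    print(token_lengths_cumulative(cumulative, overhead))  # [3, 4, 4]
```

In the cumulative case, the fixed overhead cancels out of every difference except the first, so an eavesdropper does not need to know it precisely to recover most token lengths.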

Measured by cosine similarity, which captures how close in meaning a reconstructed response is to the original, the attack can breach confidentiality even when the exact words are not recovered, highlighting the severity of the vulnerability. However, the attack works well only on the common questions people tend to ask on these platforms; it is far less effective against arbitrary queries.
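For readers unfamiliar with the metric, the Python sketch below computes cosine similarity between two texts. It uses simple word-count vectors as a stand-in for the semantic embeddings a real evaluation would use, and the example sentences are hypothetical; it only shows why a reconstruction with a wrong word can still score close to 1.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    # Bag-of-words vectors: a simple stand-in for sentence embeddings.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # empty text: treat as no similarity
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    original = "you should see a doctor about the rash"
    reconstructed = "you should see a doctor about that rash"  # one word wrong
    print(round(cosine_similarity(original, reconstructed), 3))  # 0.875
```

A score near 1 means the eavesdropper has learned essentially what the response said, even without a word-perfect reconstruction.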

Another major finding was that an attack model trained on one AI assistant, such as ChatGPT, also works against other services. Once such a tool has been built, attackers could turn it against almost any vulnerable generative AI assistant.

Almost all LLM-based assistants are vulnerable to this attack.

“In the paper, we show how one model (trained, e.g., on ChatGPT) works on other services as well,” wrote the researchers. “This means that once a tool has been built, it could be shared (like other hacking tools) and used across the board with no additional effort.”

The third finding is that the attack is relatively easy to carry out when the attacker and the victim are on the same network. When they are on different networks, it becomes much harder, unless the attacker is a nation-state or has intricate knowledge of the victim’s ISP.

In response to these findings, some providers, such as OpenAI and Cloudflare, have implemented padding to mitigate the attack. However, such measures can degrade the user experience by adding delay and increasing network traffic.
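As an illustration of why padding helps and what it costs, the Python sketch below pads every token to a fixed size before it is sent. The fixed size, the one-byte length prefix, and the function names are hypothetical; the sketch does not reproduce OpenAI’s or Cloudflare’s actual schemes.

```python
PAD_TO = 32  # hypothetical fixed frame size; must stay <= 255 for the 1-byte prefix


def pad_token(token: str, pad_to: int = PAD_TO) -> bytes:
    data = token.encode("utf-8")
    if len(data) > pad_to:
        raise ValueError("token longer than padding size")
    # One byte records the real length so the receiver can strip the padding;
    # zero bytes fill the rest, so every frame has the same size.
    return bytes([len(data)]) + data + b"\x00" * (pad_to - len(data))


def unpad_token(frame: bytes) -> str:
    n = frame[0]  # real token length stored in the prefix byte
    return frame[1 : 1 + n].decode("utf-8")


if __name__ == "__main__":
    tokens = ["The", " cat", " sat"]
    frames = [pad_token(t) for t in tokens]
    print([len(f) for f in frames])              # [33, 33, 33]
    print([unpad_token(f) for f in frames])      # ['The', ' cat', ' sat']
    print(sum(len(t) for t in tokens), sum(len(f) for f in frames))  # 11 99
```

Every frame now has the same length, so packet sizes no longer reveal token lengths, but the three tokens grow from 11 bytes to 99, a concrete example of the extra traffic padding introduces.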

This research underscores the importance of robust security measures in chat-based LLMs as they proliferate. As AI assistants handle increasingly sensitive conversations, keeping those conversations confidential remains a critical challenge for the cybersecurity community.

Kumar Hemant

Deputy Editor at Candid.Technology. Hemant writes at the intersection of tech and culture and has a keen interest in science, social issues and international relations.

Exit mobile version