AI chatbots are vulnerable to data theft via hidden Unicode characters

  • by Kumar Hemant
  • 4 min read

Photo: Tada Images / Shutterstock.com

A security loophole in AI chatbots such as Claude and Microsoft 365 Copilot allows invisible characters, undetectable to human users, to smuggle malicious instructions into these systems and extract sensitive data. The risk stems from a quirk in the Unicode text encoding standard: certain characters are recognised by large language models (LLMs) but are not displayed in typical browsers or user interfaces.

The core of this vulnerability lies in ASCII smuggling, a technique where attackers embed invisible characters into text that an LLM can read but human users cannot. These characters, part of the vast Unicode standard, can be embedded within user prompts or appended to chatbot outputs, making it possible for sensitive information — like passwords or financial data — to be concealed within seemingly innocent strings of text.
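The mechanics are simple to illustrate. A minimal Python sketch (not any researcher's actual tooling) shows how printable ASCII can be shifted into the invisible Unicode Tags block, which starts at U+E0000, and recovered again:

```python
# Sketch of ASCII smuggling: each ASCII character is mapped to its
# counterpart in the Unicode Tags block (U+E0000-U+E007F). These code
# points render as nothing in most UIs, but a model's tokenizer can
# still see them.

TAG_OFFSET = 0xE0000

def smuggle(text: str) -> str:
    """Encode printable ASCII as invisible Tags-block characters."""
    return "".join(chr(TAG_OFFSET + ord(c)) for c in text if ord(c) < 128)

def reveal(text: str) -> str:
    """Decode any Tags-block characters hidden in a string."""
    return "".join(
        chr(ord(c) - TAG_OFFSET)
        for c in text
        if 0xE0000 <= ord(c) <= 0xE007F
    )

payload = "Hello" + smuggle("secret instruction")
print(len("Hello"), len(payload))  # the payload is longer than it looks
print(reveal(payload))
```

Pasted into a terminal or chat box, `payload` displays only "Hello", yet every smuggled character is still present in the string.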

Independent researcher Johann Rehberger demonstrated this attack by embedding hidden characters in prompts aimed at Microsoft 365 Copilot. His proof-of-concept attacks prompted the AI to sift through a user’s inbox for sensitive information, such as sales figures or one-time passwords.

The stolen data was then hidden within a URL, invisible to users; when the link was clicked, the concealed information was sent to Rehberger’s server.

“The fact that GPT 4.0 and Claude Opus were able to really understand those invisible tags was mind-blowing,” Joseph Thacker, an AI engineer at AppOmni, told Ars Technica. He emphasised that the discovery exposes a covert channel that attackers could exploit against AI systems.

Photo: Vicki Hamilton | Pixabay

While ASCII smuggling facilitated the concealment of data, the real exploit lay in a technique called ‘prompt injection.’ In these attacks, malicious prompts embedded in emails or documents trick the chatbot into executing unintended commands, such as extracting sensitive data from a user’s inbox and appending it to a URL.

Rehberger’s proof-of-concept demonstrated how users interacting unknowingly with Copilot could be led to follow links containing hidden payloads of stolen information.

For instance, while a link might appear as ‘https://wuzzi.net/copirate/,’ the URL could secretly contain a string of invisible characters holding confidential data, such as ‘The sales for Seattle were USD 120000.’ This hidden content would be processed by the browser and sent to a remote server when clicked.
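Using the article's example values, a short Python sketch (an illustration, not the actual exploit code) shows how such a link can look identical to the clean URL while still carrying the hidden payload:

```python
# A seemingly normal link carrying hidden data: the visible URL and the
# booby-trapped one differ only by invisible Tags-block characters.

TAG_LO, TAG_HI = 0xE0000, 0xE007F

def hide(data: str) -> str:
    """Shift printable ASCII into the invisible Tags block."""
    return "".join(chr(TAG_LO + ord(c)) for c in data if ord(c) < 128)

def extract_hidden(url: str) -> str:
    """Recover any Tags-block characters smuggled into a URL."""
    return "".join(
        chr(ord(c) - TAG_LO) for c in url if TAG_LO <= ord(c) <= TAG_HI
    )

visible = "https://wuzzi.net/copirate/"
link = visible + hide("The sales for Seattle were USD 120000")

print(link)                  # renders the same as the visible URL
print(extract_hidden(link))
```

Because the hidden characters survive copy-and-paste and HTTP requests, a remote server receiving the click can decode them exactly as `extract_hidden` does.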

A little-known block of Unicode characters called the Tags block is at the heart of the issue. Originally intended for language tagging, the block was later deprecated, but its invisibility in user interfaces has made it a perfect tool for covert communication in text-based systems like LLMs.

As Ars Technica reports, Riley Goodside, a prompt engineer at Scale AI, discovered the potential of these invisible characters when experimenting with AI systems. His work led to a series of high-profile prompt injection attacks, which exploited the ability of LLMs to interpret hidden text while remaining undetected by human users.

Despite initial mitigations by companies like Microsoft and OpenAI, this vulnerability still poses risks. For example, OpenAI’s ChatGPT web app no longer processes these invisible characters, but its APIs continued to do so until recently. Copilot was also vulnerable until late September 2024, when Microsoft implemented changes to strip hidden characters from user inputs. Google’s Gemini AI, while capable of reading and writing hidden characters, has shown inconsistent behaviour when interpreting them, suggesting the exploit could still be developed further.
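A defence in the spirit of Microsoft's fix is straightforward: strip Tags-block code points from input before it ever reaches the model. This is a minimal sketch of that idea, not the vendors' actual implementation:

```python
import re

# Match every code point in the Unicode Tags block (U+E0000-U+E007F),
# the range that ASCII smuggling relies on.
TAGS_BLOCK = re.compile(r"[\U000E0000-\U000E007F]")

def strip_tags_block(text: str) -> str:
    """Remove invisible Tags-block characters from untrusted input."""
    return TAGS_BLOCK.sub("", text)

prompt = "Summarise my inbox" + "\U000E0041\U000E0042"  # two hidden chars
clean = strip_tags_block(prompt)
print(len(prompt) - len(clean))  # number of hidden characters removed
```

Applying such a filter to user prompts, retrieved documents, and model outputs alike closes both directions of the covert channel, though other invisible Unicode characters would need similar handling.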

Kumar Hemant

Deputy Editor at Candid.Technology. Hemant writes at the intersection of tech and culture and has a keen interest in science, social issues and international relations. You can contact him here: kumarhemant@pm.me