Two researchers from the University of Cambridge have published a paper detailing a bug dubbed “Trojan Source” that impacts most compilers and software development environments. The issue lies in Unicode — an encoding standard that allows computers to exchange information regardless of the language used.
More specifically, the weakness lies in Unicode’s bidirectional algorithm, also called ‘Bidi.’ The algorithm governs how text that mixes right-to-left and left-to-right languages is displayed. It includes “Bidi override” control characters that can force left-to-right text to be displayed right-to-left and vice versa.
As this override allows characters from a single script to be displayed in an order different from the one in which they are encoded, it can potentially be abused. As the researchers pointed out, it can be used to disguise the file extensions of malware distributed through email.
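The file-extension trick can be sketched in a few lines of Python. This is only an illustration: the file name is hypothetical, and the "displayed" string is a rough simulation that simply reverses the text after the override, whereas a real renderer applies the full Bidi algorithm.

```python
import os

RLO = "\u202e"  # U+202E RIGHT-TO-LEFT OVERRIDE

# Hypothetical file name: the characters after the override are stored
# in logical order as "fdp.exe".
filename = "invoice_" + RLO + "fdp.exe"

# Logical order is what the operating system acts on.
extension = os.path.splitext(filename)[1]
print(extension)  # .exe

# Rough simulation of the display order: text after the override is
# shown right-to-left, so it appears reversed to the reader.
prefix, _, overridden = filename.partition(RLO)
displayed = prefix + overridden[::-1]
print(displayed)  # invoice_exe.pdf
```

The stored name still ends in ".exe", while a reader would see something that looks like a PDF.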
One bug kills all?
Most programming languages let programmers put these overrides in comments and strings. This is especially bad because compilers generally ignore whatever is written in comments, including control characters. The problem is compounded by the fact that most programming languages allow string literals to contain arbitrary characters, control characters included.
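As a small illustration, assuming CPython's standard behavior, the snippet below builds a one-line program whose comment and string literal both contain Bidi control characters, then compiles and runs it. The variable names are made up for the example.

```python
RLO = "\u202e"  # U+202E RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING

# A hypothetical one-line program with Bidi controls in both a string
# literal and a trailing comment.
source = f'value = "a{RLO}b{PDF}"  # note {RLO}hidden{PDF}\n'

# The source compiles and runs without any complaint about the controls.
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)

# The control characters are kept inside the string: a, RLO, b, PDF.
print(len(namespace["value"]))  # 4
```

The comment is discarded and the string keeps its invisible characters, so nothing in the build step hints that they are there.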
In a nutshell, by placing Bidi override characters exclusively in comments and strings, an attacker can smuggle them into a program’s source code without the compiler raising any objection. To make matters worse, attackers can also reorder source code characters so that the resulting display order still looks like syntactically correct source code.
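One defensive response is to scan source text for these characters before it is reviewed or merged. The following is a minimal sketch, not a complete tool: the function name find_bidi_controls is hypothetical, and the set covers only the Unicode embedding, override and isolate controls.

```python
import unicodedata

# Bidi embedding, override and isolate controls defined by Unicode.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

def find_bidi_controls(source):
    """Return (line, column, character name) for each Bidi control found."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, unicodedata.name(ch)))
    return findings

# Hypothetical snippet with an override hidden in a comment on line 2.
sample = "x = 1\ny = 2  # \u202e looks harmless\n"
for line_no, col, name in find_bidi_controls(sample):
    print(f"line {line_no}, column {col}: {name}")
```

A check like this could run in a pre-commit hook or a CI job, so that flagged lines are inspected in an editor that shows invisible characters.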
This makes the problem even harder to spot, especially for a human code reviewer, because the code as displayed looks perfectly acceptable, according to Ross Anderson, co-author of the research paper on the subject. He further adds that “if the change in logic is subtle enough to go undetected in subsequent testing, an adversary could introduce targeted vulnerabilities without being detected.”