
1681 HuggingFace tokens exposed posing supply chain threats to organisations


A major security breach exposed 1,681 valid tokens on HuggingFace and GitHub, posing grave risks to 723 organisations, including Meta, VMware, Microsoft, and Google.

HuggingFace is a central resource for developers working on large language model (LLM) projects. The platform also maintains Transformers, an open-source library widely used for LLM development.

Cybersecurity researchers from Lasso Security delved into HuggingFace and GitHub repositories and found many exposed tokens that can lead to supply chain attacks. The breach has granted unauthorised access to repositories such as Meta-Llama, Bloom, and Pythia, raising concerns about the potential misuse of proprietary information, economic losses, and compromised competitive advantage.

Among the exposed tokens, researchers found that 655 had write permissions, 77 of which belonged to multi-billion-dollar organisations. With these tokens, the researchers gained complete control over those organisations' repositories.

The exposed tokens revealed vulnerabilities across the LLM application lifecycle, from training data to the models themselves. The impact of this exposure is immense, potentially leading to economic losses, compromised competitive advantage, and even the spread of malicious models.

Exposed tokens of several high-value organisations. | Source: Lasso Security

Researchers were able to obtain the following information:

  • Token validity.
  • Owner of the token/HuggingFace user.
  • Email of the user.
  • Organisations for which the user is working.
  • Permissions and privileges of the token.
  • Some specific token-related information.
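The information above can be recovered because HuggingFace exposes an identity ("whoami") endpoint that answers to any valid token. The sketch below, using only the standard library, shows roughly how a found token could be checked; the response field names are illustrative assumptions about the general shape of the JSON, not a guaranteed schema.

```python
import json
import urllib.error
import urllib.request

# Public HuggingFace identity endpoint; a valid token in the Authorization
# header returns details about the token's owner.
WHOAMI_URL = "https://huggingface.co/api/whoami-v2"


def summarize_whoami(data: dict) -> dict:
    """Reduce a whoami-style JSON response to the fields the researchers list.

    The keys read here ("name", "email", "orgs", "auth") are assumptions
    about the response shape, kept deliberately defensive with .get().
    """
    auth = data.get("auth", {}).get("accessToken", {}) or {}
    return {
        "user": data.get("name"),
        "email": data.get("email"),
        "orgs": [org.get("name") for org in data.get("orgs", [])],
        "role": auth.get("role"),  # e.g. "read" vs "write" permissions
    }


def inspect_token(token: str) -> dict:
    """Query the whoami endpoint with a candidate token.

    Returns {"valid": False} if the server rejects the token.
    """
    req = urllib.request.Request(
        WHOAMI_URL, headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return {"valid": True, **summarize_whoami(json.load(resp))}
    except urllib.error.HTTPError:
        return {"valid": False}
```

A defender can run the same check against tokens found in their own repositories to triage which leaks are still live and how much access each one grants.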

Beyond manipulating existing models, the research granted write access to datasets, raising concerns about training data poisoning. This malicious technique could compromise the integrity of machine learning models, posing a threat to the reliability of widely used datasets.

The research team ‘stole’ over ten thousand private models associated with more than 2500 datasets, revealing a potential gap in the discussion around OWASP’s Model Theft vulnerability.

Researchers also found that the read functionality of the deprecated org_api HuggingFace tokens can still be activated by making small changes to the library's login function. Malicious actors could use previously exposed org_api tokens to download the private models of the organisations they belong to.

“To fellow developers, we advise you to avoid working with hard-coded tokens and follow best practices. Doing so will help you avoid verifying every commit that no tokens or sensitive information is pushed to the repositories,” recommended Bar Lanyado, a security researcher at Lasso Security.
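Following that advice, a simple pre-commit scan can catch hard-coded tokens before they reach a repository. The sketch below assumes, as a heuristic, that HuggingFace user tokens start with an `hf_` prefix followed by a long alphanumeric run; treat the regex as an illustration, not an official token specification.

```python
import re

# Heuristic pattern for HuggingFace-style access tokens: the "hf_" prefix
# followed by a long alphanumeric run. Tune the length threshold as needed.
HF_TOKEN_RE = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")


def find_token_leaks(text: str) -> list[str]:
    """Return candidate hard-coded HuggingFace tokens found in source text."""
    return HF_TOKEN_RE.findall(text)
```

Running this over staged files in a pre-commit hook flags literals like `token = "hf_..."`, while code that reads the token from the environment (e.g. `os.environ["HF_TOKEN"]`) passes cleanly, which is the pattern the best-practice advice points toward.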


Kumar Hemant

Deputy Editor at Candid.Technology. Hemant writes at the intersection of tech and culture and has a keen interest in science, social issues and international relations. You can contact him here: kumarhemant@pm.me
