
LLM-driven robots jailbroken with 100% success rate, raising safety alarms

  • by Kumar Hemant
  • 3 min read

Photo: Vicki Hamilton | Pixabay

The tech industry is increasingly integrating large language models (LLMs) like ChatGPT into robotics. However, researchers have now shown that AI-powered robots can be ‘jailbroken’ with a 100% success rate, raising serious questions about the security of LLM-driven automation in the real world.

Scientists have developed a novel attack framework called RoboPAIR, which can override safety protocols in various LLM-controlled robots. The research demonstrated how even robots equipped with LLM-driven safety features, such as the Boston Dynamics Spot or self-driving vehicles in simulations, can be manipulated into executing potentially dangerous actions, reports IEEE Spectrum.

This discovery highlights an urgent need for tighter safeguards as the industry races to integrate LLMs into physical systems.

The researchers focused on three diverse robotic systems to test RoboPAIR’s versatility and power. They targeted Unitree’s Go2 robot dog, Clearpath Robotics’ ChatGPT-powered Jackal, and Nvidia’s Dolphins LLM self-driving simulator.

Despite the systems offering different levels of access and transparency, RoboPAIR achieved a 100% jailbreak success rate, demonstrating the generality of its method. RoboPAIR employs an ‘attacker’ LLM, which tests and refines prompts to bypass a target robot’s safety barriers.

Photo: Tada Images / Shutterstock.com

Equipped with knowledge of each robot’s application programming interface (API), the attacker LLM manipulates its prompts to craft executable commands. A ‘judge’ LLM also plays a role, ensuring that the commands are achievable within the robot’s physical limitations and environmental context.

This strategic setup allowed RoboPAIR to exploit each system’s vulnerabilities in a few days.
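In pseudocode terms, the attacker-judge loop described above might look something like the minimal Python sketch below. Everything here, including the function names, signatures, and the compliance check, is an illustrative assumption for readability, not RoboPAIR’s actual code.

```python
from typing import Callable, Optional

def jailbreak_loop(
    goal: str,                                  # harmful task the attacker wants executed
    api_doc: str,                               # target robot's API documentation
    attacker: Callable[[str, str, str], str],   # (goal, api_doc, last_prompt) -> new prompt
    judge: Callable[[str, str], bool],          # (prompt, api_doc) -> physically feasible?
    target: Callable[[str], str],               # prompt -> target robot LLM's response
    complied: Callable[[str], bool],            # response -> did the robot comply?
    max_rounds: int = 20,
) -> Optional[str]:
    """Hypothetical attacker/judge refinement loop (not the authors' code)."""
    prompt = goal
    for _ in range(max_rounds):
        # Attacker LLM rewrites the prompt, informed by the robot's API docs.
        prompt = attacker(goal, api_doc, prompt)

        # Judge LLM filters out commands the robot cannot physically execute
        # in its current environment.
        if not judge(prompt, api_doc):
            continue

        # Query the target robot's LLM and check whether it complied.
        if complied(target(prompt)):
            return prompt  # successful jailbreak prompt
    return None  # no working prompt found within the round budget
```

Passing the attacker, judge, and target in as callables reflects the black-box nature of the approach: the same refinement loop can run whether the target exposes full API access or only a chat interface, which is consistent with the varying levels of access the researchers reported.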

“The combination of LLM-based jailbreaking and robotic control represents a new level of potential harm, where automated systems could be convinced to carry out harmful real-world actions,” said Alexander Robey, a lead researcher from Carnegie Mellon University.

Robey’s team has submitted the study to the 2025 IEEE International Conference on Robotics and Automation.

The research revealed vulnerabilities and illustrated how jailbroken LLMs often went beyond mere compliance. In one case, researchers observed that a compromised robot suggested ordinary items like desks and chairs as improvised weapons. Such responses underscore the importance of human oversight in AI-driven robots, especially in safety-critical applications.


Kumar Hemant

Deputy Editor at Candid.Technology. Hemant writes at the intersection of tech and culture and has a keen interest in science, social issues and international relations. You can contact him here: kumarhemant@pm.me
