
Nvidia’s new Blackwell AI chips hit overheating snag


The highly anticipated Nvidia Blackwell AI processors, already delayed earlier this year, are now reported to suffer from overheating issues in server setups, sparking concerns across the industry.

According to a report by The Information, overheating arises when Blackwell GPUs are connected in server racks designed to hold up to 72 chips. These overheating issues have led to operational disruptions and design reevaluations.

In response, Nvidia has reportedly asked suppliers to modify rack designs multiple times to address the issue.

The problem has raised concerns among cloud service providers and enterprise customers. The delays and design modifications could jeopardise timelines for critical AI projects and data centre rollouts, particularly for Nvidia’s high-profile clients like Meta, Google, and Microsoft.

These companies are heavily reliant on Nvidia’s technology to power advanced AI applications, such as chatbots, recommendation engines, and more.

First unveiled in March, Nvidia’s Blackwell chips represent a significant leap in AI hardware. Featuring a novel design that fuses two silicon dies into a single component, the chips are up to 30 times faster than Nvidia’s previous GPUs for tasks such as generating chatbot responses.

Big tech giants like Meta, Google, and Microsoft rely on Nvidia AI chips for most of their AI assistants.

This breakthrough has made the Blackwell series a critical component in the next wave of AI innovation.

However, the overheating issues are casting a shadow over this technological marvel. Cooling is paramount for a chip designed to handle demanding AI workloads. If the problem remains unresolved, customers could be forced to invest in costly infrastructure upgrades or alternative solutions.

Nvidia downplayed the issue. In a statement to Reuters, the company described the engineering adjustments, made in collaboration with leading cloud service providers, as “normal and expected” and emphasised that iterations are part of the development process.

While Nvidia appears confident in its ability to resolve the overheating issues, the disruptions come at a critical time for the company. With the surging demand for AI solutions, Nvidia’s ability to deliver reliable and scalable hardware is under scrutiny.

The overheating issues have caused ripples through the AI ecosystem, with Nvidia’s competitors like Intel and AMD stepping up their efforts to capture the lucrative AI chip market. However, the Blackwell chips remain one of the most advanced AI hardware solutions available, and once optimised, they are expected to retain their market share.


Kumar Hemant


Deputy Editor at Candid.Technology. Hemant writes at the intersection of tech and culture and has a keen interest in science, social issues and international relations. You can contact him here: kumarhemant@pm.me
