GPT-4o is ‘unlikely’ to cause catastrophe; gets ‘medium’ risk score

On Thursday, OpenAI released the GPT-4o System Card, a research document detailing the model's pre-release safety work and risk assessments. Before the model launched in May 2024, the company carried out safety assessments with the help of external red teamers, or cybersecurity experts, who attempted to expose weaknesses and key risks in the model.

They examined risks such as unauthorised voice cloning, violent and erotic content, and reproduced snippets of copyrighted audio. Under OpenAI’s Preparedness framework, the model's overall risk score is categorised as “medium.”

The overall risk score is the highest score across four categories: cybersecurity, biological threats, persuasion, and model autonomy. Every category was scored low except persuasion, which marginally crossed into medium.
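The "highest score wins" rule described above can be sketched in a few lines of Python. This is only an illustration, not OpenAI's tooling: the `Risk` enum, the `overall_risk` helper and the dictionary keys are hypothetical names, and the `HIGH` and `CRITICAL` levels are included on the assumption that the scale extends beyond the two levels the article mentions.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Ordered risk levels (hypothetical encoding for illustration)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3       # assumed level, not mentioned in the article
    CRITICAL = 4   # assumed level, not mentioned in the article


# Category scores as reported in the article.
category_scores = {
    "cybersecurity": Risk.LOW,
    "biological threats": Risk.LOW,
    "persuasion": Risk.MEDIUM,
    "model autonomy": Risk.LOW,
}


def overall_risk(scores: dict[str, Risk]) -> Risk:
    """Return the highest risk level found across all categories."""
    return max(scores.values())


print(overall_risk(category_scores).name)
```

Because the overall score is a maximum rather than an average, a single category at medium is enough to put the whole model at medium, even with the other three at low.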

While GPT-4o's samples were not found to be more persuasive than human-written text overall, the researchers observed that a few of its writing samples did outperform human-written text at influencing users’ opinions.

OpenAI has previously tested and published research on earlier models such as GPT-4, GPT-4 with vision, and DALL-E 3. For GPT-4o, the startup also worked with two independent third-party labs, Model Evaluation and Threat Research (METR) and Apollo Research, for additional evaluation of possible critical risks from the model's general autonomous capabilities.

METR found GPT-4o to perform better than Claude 3 Sonnet and GPT-4 Turbo. At the same time, the lab noted many agent failures, 150 of which it reviewed.

Apollo Research measured the model’s scheming capabilities by testing whether it can model itself (self-awareness) and others (theory of mind) across 14 agent and question-answering tasks. Based on its findings, Apollo Research concluded that GPT-4o is unlikely to be capable of catastrophic scheming.

OpenAI used a combination of methods to mitigate the model’s potential risks. Through post-training, the model was trained to adhere to behaviour that reduces risk, and classifiers were integrated into the system to block specific generations.

GPT-4o also shows better reading comprehension and reasoning across a selected sample of historically underrepresented languages, narrowing the performance gap between English and those languages. The five African languages selected were Amharic, Hausa, Northern Sotho, Swahili and Yoruba. The initial test involved translating two language benchmarks and generating short, novel, language-specific reading comprehension evaluations.

OpenAI implemented several safety evaluations and mitigations throughout the development and deployment process and plans to continue monitoring and updating mitigations.

Arun Maity

Arun Maity is a journalist from Kolkata who graduated from the Asian College of Journalism. He has an avid interest in music, videogames and anime. When he's not working, you can find him practicing and recording his drum covers, watching anime or playing games. You can contact him here: arunmaity23@proton.me
