Researchers have discovered that computer programs that detect essays, job applications, and other work generated by artificial intelligence (AI) can exhibit bias against non-native English speakers.
The study tested seven popular AI text detectors and found that essays written by people who did not speak English as their first language were often mistakenly flagged as AI-generated. This bias could have significant implications for students, academics, and job applicants, The Guardian reported.
The rise of ChatGPT, a generative AI capable of writing essays, solving problems, and creating computer code, has led many educators to consider AI detection a crucial tool for combating a modern form of cheating. However, the researchers caution that some detectors’ claims of 99% accuracy are misleading at best.
Led by James Zou, an assistant professor of biomedical data science at Stanford University, the team ran 91 English essays written by non-native English speakers through seven popular GPT detectors. Over half of the essays, written for the Test of English as a Foreign Language (TOEFL), were flagged as AI-generated; one program even labelled 98% of them as AI-generated. In contrast, when essays by native English-speaking eighth graders in the US were tested, more than 90% were classified as human-written.
The researchers traced the discrimination to the detectors’ reliance on ‘text perplexity’, a metric that measures how surprised or confused a generative language model is when predicting the next word in a sentence. Since large language models like ChatGPT are trained to produce low-perplexity text, human writing that uses common words and familiar patterns risks being mistaken for AI-generated content. Non-native English speakers are particularly vulnerable to this bias because they tend to use simpler word choices.
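To make the metric concrete, here is a minimal sketch (not the detectors’ actual code) of how perplexity can be computed from the probabilities a language model assigns to each word in a text; the example probability values are illustrative assumptions:

```python
import math

def perplexity(token_probs):
    """Perplexity of a text, given the model's predicted probability
    for each of its tokens: exp of the average negative log-probability.
    Lower perplexity means the model found the text more predictable."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# A text of common, highly predictable words gets low perplexity...
predictable = perplexity([0.9, 0.8, 0.85, 0.9])
# ...while rarer, more surprising word choices raise it.
surprising = perplexity([0.2, 0.1, 0.15, 0.05])
print(predictable < surprising)  # True
```

A detector built on this signal would treat the first, simpler text as more likely AI-generated, which is exactly why plainer non-native writing is penalised.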
To probe this bias, the scientists asked ChatGPT to rewrite the TOEFL essays using more sophisticated language. When the AI detectors reevaluated the revised essays, they were correctly identified as human-written. The researchers noted that this paradoxically incentivises non-native writers to use GPT more, in order to evade detection.
The study’s authors warned that AI detectors could falsely flag college and job applications as GPT-generated, marginalising non-native English speakers online. They also highlighted the risks of false accusations of cheating faced by non-native students, which can harm their academic careers and psychological well-being.
Jahna Otterbacher, from the Cyprus Center for Algorithmic Transparency at the Open University of Cyprus, emphasised the need to develop an academic culture that promotes the ethical and creative use of generative AI, rather than engaging in an AI arms race. The researchers suggested that as ChatGPT continues to learn from the user interface, it will eventually surpass existing detectors, making it imperative to address the implications of AI detectors for non-native writers before discrimination becomes more widespread.