OpenAI, the company behind the DALL-E image generator and the ChatGPT chatbot, has released a classifier that it claims can detect whether a piece of text was generated by ChatGPT or written by a human. The tool requires at least 1,000 characters (approximately 150-200 words) of input and labels each text on a five-point scale: very unlikely, unlikely, unclear if it is, possibly, or likely AI-generated.
The model behind the tool was trained on pairs of human-written and AI-written text covering the same topics. Although it can distinguish between the two sources, the tool has some limitations. Most notably, it isn't always accurate and can mislabel both AI-written and human-written text. The tool also requires a free OpenAI account.
It is also likely to get things wrong on text written by children and on text not originally written in English, because the model was trained primarily on English content written by adults. It can sometimes incorrectly label human-written text as AI-generated as well, especially when that text differs significantly from the training data.
OpenAI has warned that the “classifier is not fully reliable”, and it can easily be evaded by editing AI-generated text. In OpenAI's own evaluations on a “challenge set” of English texts, the tool correctly labeled 26% of AI-written text as ‘likely AI-written’ (true positives), while incorrectly labeling human-written text as AI-written 9% of the time (false positives).
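A rough worked example helps show what these two figures mean together. Assume, purely for illustration, that a batch contains equal numbers of AI-written and human-written texts, and treat the 9% false-positive rate as applying to the same ‘likely AI-written’ label (the 50/50 mix is an assumption, not part of OpenAI's evaluation). The share of flagged texts that are actually AI-written would then be:

```latex
\text{share of flagged texts that are AI-written}
  = \frac{0.26}{0.26 + 0.09}
  = \frac{0.26}{0.35}
  \approx 0.74
```

Under that assumption, roughly one in four flagged texts would be human-written, while 74% of the AI-written texts (1 - 0.26) would not receive the ‘likely’ label at all.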
The classifier's reliability does improve as the input gets longer, however. Although it remains a work in progress, OpenAI says it is significantly more reliable than the company's previous classifier on text generated by more recent AI systems.
OpenAI's classifier isn't the only attempt to detect AI-generated text in the wild. Tools such as Originality.ai can also detect content produced by several different models, including GPT-3, GPT-2, GPT-Neo, GPT-J and ChatGPT.
Limitations of OpenAI’s classifier
Here are the classifier's limitations, verbatim, as stated by OpenAI:
- The classifier is very unreliable on short texts (below 1,000 characters). Even longer texts are sometimes incorrectly labeled by the classifier.
- Sometimes human-written text will be incorrectly but confidently labeled as AI-written by our classifier.
- We recommend using the classifier only for English text. It performs significantly worse in other languages and it is unreliable on code.
- Text that is very predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or humans, because the correct answer is always the same.
- AI-written text can be edited to evade the classifier. Classifiers like ours can be updated and retrained based on successful attacks, but it is unclear whether detection has an advantage in the long-term.
- Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction.