In the rapidly advancing field of artificial intelligence, one of the most exciting developments has been AI text generation. AI models like GPT-3, BLOOM, BERT, AlexaTM, and other large language models can produce remarkably human-like text. This is both exciting and concerning. Such advances let us be creative in ways we couldn’t before, but they also open the door to deception. And the better these models get, the harder it will be to distinguish human-written text from AI-generated text.
Since the release of ChatGPT, people all over the globe have been testing the limits of such AI models, using them to gain knowledge and, in the case of some students, to solve homework and exams, which raises ethical questions about the technology. Although these models have become sophisticated enough to mimic human writing styles and maintain context over multiple passages, they still make errors, even if minor ones.
That raises an important question, one my friends and family members have asked me many times since ChatGPT was released:
How can we know if a text is human-written or AI-generated?
This question is not new to the research world, where detecting AI-generated text is known as “deep fake text detection.” Today, there are different tools you can use to check whether a text is human-written or AI-generated, such as the GPT-2 Output Detector by OpenAI. But how do such tools work?
Several approaches are currently used to detect AI-generated text, and new techniques are being researched and implemented as the models that generate these texts become more advanced.
This article will explore five statistical approaches that can be used to detect AI-generated text.
Let’s get right to it…
An N-gram is a sequence of N words or tokens from a given text sample. The “N” in N-gram is how many words are in the N-gram. For example:
- New York (2-gram).
- The Three Musketeers (3-gram).
- The group met regularly (4-gram).
Analyzing the frequency of different N-grams in a text makes it possible to find patterns. For example, among the three N-gram examples we just went through, the first is the most common, and the third is the least common. By tracking different N-grams, we can determine whether they are more or less common in AI-generated text than in human-written text. For instance, an AI might use specific phrases or word combinations more frequently than a human writer. We can learn how N-gram frequencies differ between AI and human writing by training a model on text produced by both.
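To make the idea concrete, here is a minimal sketch of N-gram frequency counting. The function names, the whitespace tokenization, and the sample sentence are all assumptions made for illustration; a real detector would use a proper tokenizer and far more data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequencies(text, n):
    """Relative frequency of each N-gram in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical sample text, used only to show the shape of the output.
sample = "the group met regularly and the group met again"
bigram_freqs = ngram_frequencies(sample, 2)

# List the bigrams from most to least frequent.
for gram, freq in sorted(bigram_freqs.items(), key=lambda item: -item[1]):
    print(" ".join(gram), round(freq, 3))
```

Running the same function over a collection of human-written texts and a collection of AI-generated texts gives two frequency tables whose differences can then be used as features for a classifier.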
If you look up the word perplexed in an English dictionary, it will be defined as confused or puzzled. In the context of AI, and NLP in particular, perplexity measures how confidently a language model predicts a text. It is estimated from the probabilities the model assigns to each token of a new text (formally, the exponential of the average negative log-probability per token), or in other words, from how “surprised” the model is by that text. The better the model predicts the text, the lower the perplexity, so an AI-generated text tends to score lower than a human-written one. Perplexity is fast to calculate, which gives it an advantage over other approaches.
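Perplexity is commonly computed as the exponential of the average negative log-probability a model assigns to each token. The sketch below illustrates that calculation with a toy unigram model and add-one smoothing. This toy model is an assumption chosen to keep the example self-contained; real detectors score text with a large neural language model, and the function names and sample corpus here are hypothetical.

```python
import math
from collections import Counter


def train_unigram(corpus_tokens):
    """Build a unigram model with add-one smoothing and return its probability function."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab_size)

    return prob


def perplexity(prob, tokens):
    """exp of the average negative log-probability per token."""
    if not tokens:
        raise ValueError("cannot compute perplexity of an empty text")
    log_prob_sum = sum(math.log(prob(token)) for token in tokens)
    return math.exp(-log_prob_sum / len(tokens))


# Hypothetical training corpus and test texts.
model = train_unigram("the cat sat on the mat".split())
familiar = perplexity(model, "the cat sat".split())
unfamiliar = perplexity(model, "dogs bark loudly".split())

# Text that resembles the training data gets the lower score.
print(familiar < unfamiliar)
```

The text the model has effectively seen before scores lower than the text made of unseen words, which is the same signal a detector looks for at a much larger scale.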
In NLP, Slava Katz defines burstiness as the phenomenon where certain words appear in “bursts” within a document or a set of documents. The idea is that when a word is used once in a document, it’s likely to be used again in the same document. AI-generated texts can exhibit different patterns of burstiness than those written by humans, as the models lack the cognitive processes a human writer uses to choose synonyms.
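One rough way to put a number on this idea is to ask: among the documents that contain a word at all, in how many does it appear at least twice? The sketch below computes that ratio. It is a simplified estimate inspired by the description above, not Katz’s full model, and the function name and sample documents are hypothetical.

```python
from collections import Counter


def repeat_probability(documents, word):
    """Among documents containing `word`, the fraction where it occurs two or more times."""
    counts = [Counter(doc.lower().split())[word] for doc in documents]
    containing = [count for count in counts if count >= 1]
    if not containing:
        return 0.0
    repeated = sum(1 for count in containing if count >= 2)
    return repeated / len(containing)


# Hypothetical mini-corpus.
docs = [
    "perplexity is low when perplexity is measured",
    "the weather is nice",
    "perplexity matters",
]

print(repeat_probability(docs, "perplexity"))
print(repeat_probability(docs, "weather"))
```

Comparing this ratio for the same words across a human-written corpus and an AI-generated corpus is one simple way to check whether the two differ in how words cluster.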
Stylometry is the study of linguistic style, and it can be used to identify authors or, in this case, the source of a text (human vs. AI). Everyone uses language differently: some prefer short sentences, and some prefer long, connected ones. People use semicolons, em dashes, and other distinctive punctuation differently from one person to another. Moreover, some people use the passive voice more than the active one or use more complex vocabulary. An AI-generated text might exhibit different stylistic features, even when writing about the same topic more than once. And since an AI doesn’t have a style of its own, these differences can be used to detect whether an AI wrote a text.
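As a minimal sketch of what stylometric features can look like in practice, the function below measures three of them: average sentence length, semicolon rate, and vocabulary variety (type-token ratio). The choice of features, the regex-based sentence splitting, and the function name are assumptions for illustration; real stylometry uses many more features and more careful parsing.

```python
import re


def stylometric_features(text):
    """Compute a few simple style measurements for a piece of text."""
    # Naive sentence split on ., ! and ?; abbreviations will be miscounted.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "semicolons_per_100_words": 100 * text.count(";") / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }


# Hypothetical sample text.
features = stylometric_features("I came; I saw. I conquered.")
for name, value in features.items():
    print(name, round(value, 2))
```

Feature vectors like this one, computed for many texts of known origin, can be fed to any standard classifier to separate human from AI writing.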
Following up on stylometry, since AI models don’t have their own style, the text they generate sometimes lacks consistency and long-term coherence. For example, an AI might contradict itself or change topic and style abruptly in the middle of a text, making the flow of ideas harder to follow.
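A very crude proxy for abrupt topic changes is the share of words that neighbouring sentences have in common. The sketch below computes the Jaccard overlap between each pair of adjacent sentences; a sudden drop hints at a shift in topic. This is only an assumed stand-in for coherence, it cannot detect contradictions, and the function names and sample text are hypothetical.

```python
import re


def sentence_word_sets(text):
    """Split text into sentences and return the set of words in each one."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]


def adjacent_overlap(text):
    """Jaccard word overlap between each pair of neighbouring sentences."""
    word_sets = sentence_word_sets(text)
    scores = []
    for current, following in zip(word_sets, word_sets[1:]):
        union = current | following
        scores.append(len(current & following) / len(union) if union else 0.0)
    return scores


# Hypothetical sample: the last sentence changes topic abruptly.
sample_text = "The model writes text. The model writes code. Bananas are yellow."
overlap_scores = adjacent_overlap(sample_text)
print(overlap_scores)
```

Here the first two sentences share most of their words while the third shares none with its neighbour, so the second score falls to zero.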