Perplexity
What Perplexity Measures
In natural language processing, perplexity quantifies how well a probability model predicts a sample of text. Technically, it is the exponentiated average negative log-likelihood per token of a sequence under a language model. A lower perplexity score means the model found the text more predictable.
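The definition above can be sketched directly from the formula: average the negative log of the probability the model assigned to each token, then exponentiate. This is a minimal illustration with made-up token probabilities rather than output from a real language model.

```python
import math

def perplexity(token_probs):
    """Exponentiated average negative log-likelihood.

    token_probs: the probability the model assigned to each
    token of the sequence, in order (illustrative values here,
    not real model output).
    """
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Tokens the model found predictable vs. surprising.
predictable = [0.9, 0.8, 0.95, 0.85]
surprising = [0.2, 0.1, 0.3, 0.15]

print(perplexity(predictable))  # low: text was easy to predict
print(perplexity(surprising))   # high: text was hard to predict
```

A useful sanity check: if the model assigns uniform probability 1/N to every token, perplexity is exactly N, which is why perplexity is often read as the model's effective branching factor.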
AI-generated text tends to have low perplexity because AI models select words that are statistically likely given the context. The model is essentially predicting its own output, so the text is inherently predictable to the model. Human writing has higher perplexity because humans use idioms, make creative word choices, include cultural references, and vary their style in ways that are harder for models to predict.
Perplexity as a Detection Signal
GPTZero and other detectors use perplexity as a key input to their classification systems. The logic: if a text has unusually low perplexity (is very predictable), it is more likely to be AI-generated. This works well on average but has important limitations:
- Short texts do not provide enough data for reliable perplexity estimates
- Technical writing, legal text, and formulaic content have naturally low perplexity even when human-written
- Creative, informal AI text may have higher perplexity than expected
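A perplexity-based classifier of the kind described above can be sketched as a simple threshold rule. The threshold and minimum-length cutoff here are illustrative assumptions, not values from GPTZero or any real detector; the length check reflects the first limitation in the list, since short texts do not support a reliable estimate.

```python
import math

def sequence_perplexity(logprobs):
    """Perplexity from per-token log-probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))

def classify(logprobs, threshold=20.0, min_tokens=50):
    """Toy threshold classifier. `threshold` and `min_tokens`
    are hypothetical values chosen for illustration only."""
    if len(logprobs) < min_tokens:
        return "inconclusive"  # too short for a stable estimate
    ppl = sequence_perplexity(logprobs)
    return "likely AI" if ppl < threshold else "likely human"

# Toy log-probabilities: uniformly confident vs. frequently surprised.
print(classify([-1.0] * 60))  # low perplexity -> "likely AI"
print(classify([-3.5] * 60))  # high perplexity -> "likely human"
print(classify([-1.0] * 10))  # too short -> "inconclusive"
```

The other limitations in the list show why a single fixed threshold fails: formulaic human writing falls below it, and informal AI text can rise above it, which is why real detectors do not stop here.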
Perplexity vs Other Detection Methods
Perplexity-based detection is one of several approaches. Transformer-based classifiers (used by Pangram, Originality.ai) analyze text patterns at a deeper level and do not rely solely on perplexity. Most modern tools combine multiple signals, using perplexity alongside burstiness, stylistic analysis, and learned patterns.
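The idea of combining perplexity with burstiness can be sketched as follows. Burstiness is modeled here as the spread of sentence-level perplexities, and the additive combination is a simplifying assumption for illustration; real tools like those named above use learned classifiers, not this formula.

```python
import math
import statistics

def ppl(logprobs):
    """Perplexity from per-token log-probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))

def combined_score(sentence_logprobs):
    """Combine overall perplexity with burstiness (spread of
    per-sentence perplexity). The additive weighting is a
    hypothetical choice, not any detector's actual formula.
    Lower scores mean more AI-like: low, uniform perplexity."""
    sentence_ppls = [ppl(lp) for lp in sentence_logprobs]
    overall = ppl([l for lp in sentence_logprobs for l in lp])
    burstiness = statistics.pstdev(sentence_ppls)
    return overall + burstiness

# Uniformly predictable sentences (AI-like) vs. varied ones (human-like).
uniform = [[-1.0] * 10 for _ in range(5)]
varied = [[-0.5] * 10, [-3.0] * 10, [-1.0] * 10, [-2.5] * 10, [-0.8] * 10]

print(combined_score(uniform))  # low: predictable and even
print(combined_score(varied))   # high: less predictable, more bursty
```

The design point is that the two signals are complementary: a formulaic human text may score low on perplexity alone but still show sentence-to-sentence variation that a pure perplexity threshold would miss.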