
AI Detection Tools Explained: How Turnitin & GPTZero Work

AI detection tools are everywhere in academia. But how accurate are they really? Learn how Turnitin, GPTZero, and others detect AI writing.

Hemmi Team · 9 min read


The rise of large language models like ChatGPT, Claude, and Gemini has fundamentally changed the landscape of academic writing. In response, universities and institutions worldwide have turned to AI detection tools to identify text that may have been generated by artificial intelligence. Tools like Turnitin's AI writing detector and GPTZero are now integrated into submission workflows at thousands of institutions.

But how do these tools actually work? Are they reliable? And what do students need to understand about the technology that is being used to evaluate their writing?

In this guide, we break down the science behind AI writing detection, compare the major platforms on the market, examine their accuracy and limitations, and explain what all of this means for students who want to write with integrity.

How AI Detection Tools Work

To understand how AI detectors work, you first need to understand what makes AI-generated text different from human-written text. AI detection tools rely on a set of statistical and linguistic signals to make their predictions. The three most important concepts are perplexity, burstiness, and statistical pattern analysis.

Perplexity

Perplexity is a measurement of how "surprising" or unpredictable a piece of text is. In technical terms, it quantifies how well a language model can predict the next word in a sequence.

Human writing tends to have higher perplexity. We make unexpected word choices, use idiomatic expressions, switch registers mid-paragraph, and occasionally produce sentences that are grammatically unusual but perfectly meaningful. AI-generated text, by contrast, tends to have lower perplexity because language models are trained to predict the most statistically likely next token. The result is text that flows smoothly and predictably, almost too perfectly.

AI detection tools feed submitted text through their own language models and calculate a perplexity score. If the perplexity is consistently low across the document, the tool flags it as potentially AI-generated.
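
As a heavily simplified sketch: given the probability a language model assigned to each token in a document, perplexity is just the exponential of the average negative log-probability of those tokens. The probability values below are invented for illustration; real detectors run full language models over the submission to obtain them.

```python
import math

def perplexity(token_probs):
    """Perplexity from per-token probabilities assigned by a language model.

    Lower perplexity means the model found the text more predictable --
    the core signal detectors associate with AI-generated prose.
    """
    # Perplexity = exp of the average negative log-probability per token.
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# A model that is confident about every next token -> low perplexity (AI-like).
predictable = [0.9, 0.8, 0.95, 0.85]
# A model that is frequently surprised -> high perplexity (more human-like).
surprising = [0.1, 0.4, 0.05, 0.2]

print(perplexity(predictable))  # roughly 1.15
print(perplexity(surprising))   # roughly 7.1
```

A document whose perplexity stays near the bottom of this scale across every sentence is exactly the profile that triggers a flag.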

Burstiness

Burstiness refers to the variation in sentence structure and length throughout a piece of writing. Human writers are naturally "bursty." We write a long, complex sentence, then follow it with a short, punchy one. We start paragraphs with questions, switch to declarative statements, and occasionally throw in a fragment for emphasis.

AI-generated text tends to be far more uniform. Sentence lengths cluster around a narrow range. Paragraph structures follow predictable patterns. Transitions between ideas are smooth to the point of being formulaic. This lack of burstiness is one of the strongest signals that AI detection tools look for.
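
A crude burstiness proxy can be computed with nothing more than the standard deviation of sentence lengths. This is far simpler than what commercial detectors actually do, but it illustrates the signal:

```python
import re
import statistics

def burstiness(text):
    """Standard deviation of sentence lengths (in words) -- a rough proxy
    for burstiness. Uniform sentence lengths (values near zero) are
    characteristic of AI-generated text; human writing varies more."""
    # Naive sentence split on terminal punctuation, good enough for a demo.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths)

uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = ("Stop. The storm rolled in fast, flooding every street in town "
          "before anyone could react. We waited.")

print(burstiness(uniform))  # 0.0 -- every sentence is four words long
print(burstiness(varied))   # much larger -- lengths of 1, 14, and 2 words
```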

Statistical Pattern Analysis

Beyond perplexity and burstiness, modern detectors analyze a range of statistical features in the text:

  • Token probability distributions: AI models tend to select high-probability tokens, creating text where word choices are statistically "safe." Detectors measure how closely the token distribution in a submission matches what a language model would produce.
  • N-gram frequency analysis: Certain sequences of words (bigrams, trigrams, and beyond) appear at predictable frequencies in AI-generated text. Detectors compare these frequencies against known baselines.
  • Stylometric features: These include vocabulary diversity (type-token ratio), average word length, use of passive voice, and the distribution of function words like "the," "of," and "however." AI text tends to exhibit specific stylometric fingerprints that differ from natural human writing.
  • Entropy analysis: Related to perplexity, entropy measures the overall information density of a text. AI-generated content often has a characteristic entropy profile that detectors can identify.
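
A few of the stylometric features above can be sketched in plain Python. The function-word list here is a small invented sample; production detectors track hundreds of features at once:

```python
from collections import Counter

# A tiny illustrative sample -- real stylometric models use long curated lists.
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "however"}

def stylometric_features(text):
    """Three simple stylometric features of the kind detectors combine:
    type-token ratio (vocabulary diversity), average word length, and
    the share of common function words."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter(words)
    return {
        "type_token_ratio": len(counts) / len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "function_word_share": sum(counts[w] for w in FUNCTION_WORDS) / len(words),
    }

print(stylometric_features(
    "However, the results of the study were mixed, and the authors urged caution."
))
```

No single feature is decisive; it is the combined fingerprint across many such measurements that a detector scores.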

Some newer ai detection tools also use classifier models, which are neural networks trained specifically on large datasets of human-written and AI-generated text. These classifiers learn to distinguish between the two categories based on hundreds of subtle features simultaneously.
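
To make the classifier idea concrete, here is a toy logistic-regression classifier trained by gradient descent on two invented feature values per document (a perplexity score and a burstiness score). Commercial detectors are far larger neural networks trained on millions of labeled documents; this only illustrates the principle of learning a decision boundary from labeled examples:

```python
import math

def train_logistic(samples, labels, lr=0.1, epochs=1000):
    """Train a tiny logistic-regression classifier by stochastic gradient
    descent. Each sample is a feature vector; labels are 1 (AI) or 0 (human)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1 / (1 + math.exp(-z))        # sigmoid
            error = pred - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(weights, bias, x):
    """Return 1 (AI-generated) or 0 (human) for a feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if 1 / (1 + math.exp(-z)) >= 0.5 else 0

# Toy training data: [perplexity, burstiness]. Label 1 = AI-generated.
# AI samples cluster at low values, human samples at high values.
X = [[1.2, 0.5], [1.3, 0.8], [1.1, 0.4], [6.0, 5.5], [5.2, 6.1], [7.1, 4.9]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)

print(predict(w, b, [1.0, 0.6]))  # 1: low perplexity and burstiness -> flagged
print(predict(w, b, [6.5, 5.0]))  # 0: high values -> human-like
```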

Major AI Detection Tools Compared

The market for AI writing detection has grown rapidly. Here is an overview of the most widely used platforms and how they differ.

Turnitin AI Detection

Turnitin AI detection is arguably the most consequential tool in this space because of Turnitin's existing dominance in academic plagiarism checking. Turnitin's AI detection feature was integrated into its Similarity Report in 2023 and is now available to institutions that subscribe to the platform.

How it works: Turnitin uses a proprietary AI detection model that analyzes text at the sentence level. Each sentence receives a score indicating the likelihood that it was AI-generated. The tool then produces an overall AI writing percentage for the entire document. Turnitin claims its model was trained on a massive corpus of both academic writing and AI-generated text.

Key features:

  • Sentence-level highlighting showing which parts are flagged
  • Overall AI writing percentage score
  • Integration with existing Turnitin submission workflows
  • Designed specifically for academic contexts

Reported accuracy: Turnitin claims a false positive rate of less than 1% when the AI detection score exceeds a certain threshold. However, independent evaluations have produced more varied results, particularly with non-native English speakers and heavily edited text.

GPTZero

GPTZero was one of the first dedicated AI detection tools, created by Edward Tian, a Princeton University student, in early 2023. It quickly became one of the most popular free tools for checking whether text was AI-generated.

How it works: GPTZero analyzes text using both perplexity and burstiness scores. It provides a document-level prediction as well as sentence-level highlighting. The tool categorizes text as "human," "mixed," or "AI-generated" and provides probability scores for each classification.

Key features:

  • Free tier available for individual checks
  • Batch file scanning for educators
  • API access for institutional integration
  • Supports multiple languages
  • Provides detailed perplexity and burstiness breakdowns

Reported accuracy: GPTZero reports detection rates above 85% for fully AI-generated text, but accuracy drops significantly for text that has been paraphrased, edited, or generated with specific prompting techniques.

Originality.ai

Originality.ai positions itself as a premium tool for content creators and publishers, though it has gained traction in academic settings as well.

How it works: Originality.ai uses an ensemble of AI classification models trained on text from GPT-3, GPT-3.5, GPT-4, Claude, Gemini, and other language models. It provides a percentage score indicating the likelihood that text is AI-generated.

Key features:

  • Detects content from multiple AI models
  • Combined plagiarism and AI detection scanning
  • Team management features for organizations
  • Chrome extension for on-the-fly checking
  • Regular model updates as new AI systems are released

Reported accuracy: Originality.ai claims high detection rates, particularly for unedited AI text. However, like all detectors, its performance degrades with heavily edited or mixed-origin content.

Copyleaks

Copyleaks offers AI content detection as part of its broader plagiarism detection platform. It is used by educational institutions, enterprises, and publishers.

How it works: Copyleaks uses a multi-layered detection approach that combines statistical analysis with machine learning classifiers. It provides both document-level and sentence-level results.

Key features:

  • Supports over 30 languages
  • LMS integrations (Canvas, Blackboard, Moodle)
  • API access for custom workflows
  • Detects AI-generated code in addition to prose
  • Source code AI detection for computer science programs

Reported accuracy: Copyleaks claims a 99.1% accuracy rate with a 0.2% false positive rate, though these figures come from the company's own testing and have not been widely replicated by independent researchers.

How Accurate Are AI Detectors?

This is the central question, and the honest answer is: it depends.

Several independent studies and investigations have tested the accuracy of AI detection tools under real-world conditions:

  • A 2023 study published in Nature found that AI detectors had an average accuracy of around 70-80% for fully AI-generated text but dropped to near-random performance (around 50%) when the AI text had been lightly paraphrased.
  • Research from Stanford University found that AI detectors disproportionately flagged writing by non-native English speakers as AI-generated, raising serious equity concerns.
  • A 2024 study from the University of Reading found that AI-generated exam responses submitted through Turnitin went undetected in a significant number of cases, with detection rates varying by subject area.
  • Testing by journalists and academics has repeatedly shown that simple techniques like asking the AI to "write in a more casual tone" or manually editing 20-30% of the output can dramatically reduce detection rates.

The fundamental challenge is an arms race. As detectors improve, language models also improve, producing text that is increasingly indistinguishable from human writing. Each new generation of AI models produces text with higher perplexity and more natural burstiness, directly undermining the signals that detectors rely on.

For a deeper look at institutional detection capabilities, see our guide on whether universities can detect AI writing.

Known Limitations and False Positives

AI detection tools have several well-documented limitations that students, educators, and administrators should understand:

False Positives with Non-Native English Speakers

This is perhaps the most troubling limitation. Non-native English speakers often write with lower perplexity because they rely on common vocabulary and familiar sentence structures. This writing pattern closely resembles AI-generated text in the eyes of detection algorithms. Multiple studies have confirmed that ESL students are flagged at disproportionately higher rates, creating a serious fairness problem.

Heavily Formulaic Writing

Academic writing in certain disciplines, such as legal analysis, scientific methods sections, and standardized reports, follows rigid conventions. This formulaic quality can trigger AI detectors because the writing exhibits low burstiness and predictable token distributions, even when it is entirely human-written.

Edited and Mixed Content

When students use AI to generate a draft and then substantially edit it, or when they use AI for specific paragraphs and write the rest themselves, detection tools struggle to provide accurate assessments. The mixed-origin nature of the text confuses classifiers that are trained on purely human or purely AI examples.

Paraphrased AI Content

Simple paraphrasing, either manual or through dedicated paraphrasing tools, can significantly reduce detection rates. Changing word order, substituting synonyms, and restructuring sentences are often enough to bring an AI detection score below the flagging threshold.

No Ground Truth

Unlike plagiarism detection, where a detector can point to a specific source that was copied, AI detection provides only a probability estimate. There is no definitive way to prove that a piece of text was or was not generated by AI. This probabilistic nature means that AI detection results should never be used as the sole basis for academic misconduct charges.

Model-Specific Blind Spots

Most detectors are trained primarily on text from popular models like GPT-3.5 and GPT-4. They may perform poorly on text generated by less common models, open-source models, or future models that they were not trained on.

What This Means for Students

Understanding how AI detection tools work is not about learning to evade them. It is about making informed decisions about how you use AI in your academic work.

Use AI as a Research and Learning Tool, Not a Ghostwriter

The safest and most ethical approach is to use AI tools to support your research and writing process rather than to generate finished text. Use AI to brainstorm ideas, find research directions, understand complex concepts, and get feedback on your drafts. Then write your submissions in your own words.

Tools like Hemmi are designed with this philosophy in mind. Rather than generating generic AI text that detectors can flag, Hemmi helps you produce authentic, human-quality writing by supporting your research process with properly sourced references and structured guidance. The result is work that is genuinely yours, informed by AI assistance but written in your own voice.

Understand Your Institution's Policies

AI use policies vary dramatically between institutions and even between courses. Some professors encourage AI use with proper disclosure. Others prohibit it entirely. Before using any AI tool in your academic work, make sure you understand the specific rules that apply to your assignment.

Maintain Your Authentic Voice

One of the best ways to ensure your writing is recognized as human-written is simply to write like yourself. Develop your own style, use vocabulary that feels natural to you, and do not try to sound like a textbook. Authentic human writing has quirks, personality, and specificity that AI detectors are designed to recognize as genuinely human.

For practical strategies on using AI responsibly, see our guide on how to use AI ethically in academic writing. You may also want to review our article on AI plagiarism and how to avoid it.

Keep Records of Your Writing Process

If your institution uses AI detectors, consider keeping records that demonstrate your writing process. Save drafts, outlines, notes, and revision histories. If you are ever questioned about the authenticity of your work, these records can serve as evidence of your genuine effort.

Key Takeaways

  • AI detection tools work primarily by analyzing perplexity, burstiness, and statistical patterns in text to determine how closely it resembles AI-generated output.
  • Turnitin AI detection and GPTZero are the two most widely used detectors in academic settings, but several other tools like Originality.ai and Copyleaks also compete in this space.
  • Accuracy varies significantly depending on the context. Detectors perform best on fully unedited AI text and worst on paraphrased, edited, or mixed-origin content.
  • False positive rates are a serious concern, particularly for non-native English speakers and writers in highly formulaic disciplines.
  • AI detection results are probabilistic, not definitive. They should be treated as one data point among many, not as proof of misconduct.
  • The most reliable way to avoid detection issues is to use AI responsibly as a support tool and to write your final submissions in your own voice. Hemmi can help you do exactly that by guiding you through research and writing without replacing your original thinking.

Frequently Asked Questions

Can AI detection tools tell which AI model was used?

Most AI detection tools cannot reliably identify which specific AI model generated a piece of text. They can generally determine that text is likely AI-generated, but distinguishing between GPT-4, Claude, Gemini, or other models is beyond the capability of current commercial detectors. Some research tools have shown limited ability to attribute text to specific model families, but this technology is not yet reliable enough for practical use.

Are AI detection tools legally admissible as evidence?

As of now, there is no established legal precedent treating AI detection results as definitive evidence. The probabilistic nature of these tools means they indicate likelihood, not certainty. Most academic integrity experts recommend that AI detection results be used as a starting point for investigation, not as standalone proof. Several universities have updated their policies to reflect this, requiring additional evidence before pursuing academic misconduct charges based on AI detection flags.

Can I check my own writing before submitting it?

Yes, many AI detection tools offer free or affordable options for individual users. GPTZero provides a free tier, and several other tools allow limited free scans. However, be aware that checking your own human-written work might sometimes produce surprising results. If your writing is flagged despite being entirely your own, do not panic. Focus on ensuring your writing demonstrates the kind of personal voice and specificity that reflects genuine understanding of your topic.

Do AI detectors work on languages other than English?

Detection accuracy is highest for English-language text, as most detectors were primarily trained on English datasets. Tools like Copyleaks and GPTZero claim to support multiple languages, but independent testing suggests that detection accuracy in non-English languages is significantly lower. Students writing in less commonly tested languages should be aware that detection results may be less reliable.

Will AI detectors become more accurate over time?

The relationship between AI generators and AI detectors is best understood as an ongoing arms race. As detectors improve, AI models also advance, producing text that is harder to distinguish from human writing. While detection technology will certainly improve, so will the quality of AI-generated text. The long-term trend suggests that purely technical detection will become increasingly difficult, pushing institutions toward more holistic approaches to academic integrity that rely less on automated tools.

Conclusion

AI detection tools represent a significant technological response to the challenges posed by AI writing in academia. Understanding how these tools work, including their reliance on perplexity, burstiness, and statistical analysis, empowers you to make better decisions about how you approach your academic writing.

The key insight is this: AI detectors are imperfect. They are useful as screening tools but unreliable as definitive judges of authorship. Rather than trying to game these systems, the smarter approach is to use AI responsibly and to develop your own writing skills.

If you want to leverage AI in your academic work without compromising your integrity, Hemmi offers a better path. Instead of generating text for you to pass off as your own, Hemmi supports your research and writing process with real sources and structured guidance, helping you produce work that is authentically yours. Give it a try at hemmi.app and experience the difference between AI-generated writing and AI-assisted writing.
