Last Updated: December 3, 2024

GPTZero vs ZeroGPT AI Detector Mega Test: Best Accuracy & False Positives?

Article at-a-glance

ZeroGPT is more accurate than GPTZero at assessing AI-generated casual blog content: it gave AI copy a 95% average AI probability (vs. 84% for GPTZero) and human blog copy a 0% average AI score (vs. 3% for GPTZero).

For other types of human-written content (fiction, news reports, political speeches), ZeroGPT is significantly less accurate than GPTZero in correctly identifying human content (ZeroGPT assigns 30% AI likelihood, on average, to human content – versus just over 4% for GPTZero).

Both testers are unreliable and have substantial risks.

– ZeroGPT has a high chance of false positives (30% average AI probability for human copy, with a 50% rate of false positives).

– GPTZero has a high chance of false negatives (41% AI probability on average for AI copy, with a 35% rate of false negatives).

AI content checkers are increasingly being used by everyone from teachers assessing student work (and the students writing said work) to Google itself.

We selected two of the most popular ones – GPTZero and ZeroGPT – and tested them on both AI and human copy to see just how accurate they are.

The results are occasionally funny but dispiriting overall.

Before we get to those results, though, you’re probably wondering:

How’s that different from other similar tests? 

For one thing, we’re directly comparing two of the most popular AI checkers against a large-ish database of content. 

Most (if not all) AI checker tests were done using the opposite approach – comparing a large number of testers on a limited number of samples. 

That’s not ideal, though – too little data tends to yield unreliable results (and we’ll show later on how testing on a single piece of content with a single checker is virtually useless). 

Yes, testing the same copy with multiple checkers will show you which of those checkers performs well on that particular copy, but drawing conclusions beyond that isn’t warranted – there’s simply too much variation from test to test.

We tried to fix that, so we went for a larger database (40 pieces of content in total).

Why GPTZero and ZeroGPT?

Simply put, because they’re the most popular dedicated AI checkers based on traffic and search visibility (Quillbot’s AI detector is up there too, but it’s part of the Quillbot AI-assisted writing package – still worth a look, and we’ll probably cover it in a future piece!).

They’re also two of the more accurate options based on recent testing: a recent ZDNet test rated ZeroGPT and GPTZero as 80% and 100% accurate, respectively, while the competition did considerably worse.

Those numbers seem promising, but they don’t really paint the full picture – and the ZDNet test itself warns against relying on these checkers too much, since their results aren’t reliable and will vary from test to test.

But how much do they vary? Being right occasionally or even “in general” doesn’t work – we wanted to know exactly how reliable (or unreliable) these popular testers are, so we ran a test on just the two of them, with enough copy to help us draw meaningful conclusions.

This is part 1 of a two-part series looking at AI checkers and testers. In part 2, we test AI humanizers, DeepL, and SurferSEO.
You can also check out the in-depth reviews we did for various AI tools like Shortly AI, Wordtune, Jasper AI, and Outwrite AI.

Here’s what we did:

  1. Generated the most AI-ish casual blog samples out there: asked ChatGPT-4o for 10 popular blog niches, then asked it to come up with short blog samples on topics of its choice. We gave it no style prompt, no nothing, fingers crossed that the resulting copy would be as painfully AI as possible (this is our AI control sample, which we know is 100% AI; a scripted version of this step is sketched right after the list);
  2. Selected human-written copy, 10 samples in each of three categories (this is our human-copy control sample, which we know is 100% human):
    – short stories from the 1840s-1920s,
    – news reports from the 1990s, and
    – political speeches from 1980-2013;
  3. Tested GPTZero and ZeroGPT on both sample sets and compared the scores.
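
If you want to reproduce step 1 programmatically, here’s a rough sketch of the same prompt flow using the OpenAI Python client (we worked interactively, so the prompts, model name, and batching below are illustrative assumptions rather than our exact wording):

```python
# Rough sketch of step 1: generate deliberately generic, "painfully AI" blog samples.
# Assumes the OpenAI Python client (pip install openai) and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """Send a single user prompt to GPT-4o and return the text reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1a: ask for 10 popular blog niches (one per line, so they're easy to split).
niches = [line.strip("-• ").strip()
          for line in ask("List 10 popular blog niches, one per line.").splitlines()
          if line.strip()]

# Step 1b: for each niche, request a short blog sample with no style guidance at all,
# so the output is as generically "AI" as possible.
ai_control_samples = [
    ask(f"Write a short blog post in the {niche} niche on a topic of your choice.")
    for niche in niches
]

print(f"Generated {len(ai_control_samples)} AI control samples.")
```

From there, each sample just needs to be run through GPTZero and ZeroGPT and the reported AI probability recorded.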

And here’s what we found out…

1. ZeroGPT is overall more accurate in assessing AI in blog posts than GPTZero…

  • Both GPTZero and ZeroGPT correctly identified our AI blog control group as AI, but ZeroGPT was more accurate, giving a 95% average probability, while GPTZero was only 84% certain on average that the in-your-face AI output we produced was, in fact, AI;
  • Both testers identified our 100%-human, 0%-AI blog samples as non-AI, with a minor difference once more in favor of ZeroGPT (which was positive there was a 0% chance that our unhinged blog-like babble was AI, versus a slightly higher but still respectable 3% probability from GPTZero);
  • ZeroGPT also had a lower rate of false negatives (AI content incorrectly rated as 30% or less likely to be AI), at just 10% (vs. 35% for GPTZero).

2. …but GPTZero is more accurate overall than ZeroGPT, with fewer false positives that incorrectly label human content as AI

The popular tale The Little Match Girl by Hans Christian Andersen was rated as almost 60% likely to be AI-generated by ZeroGPT – one of several howlers we found during our tests.

  • Once we moved past casual blogging, ZeroGPT showed its limitations, assigning human copy an average 30% probability of being AI (including some downright funny-if-they-weren’t-sad numbers like 76% AI for Arthur Conan Doyle’s 1891 short story A Scandal in Bohemia or a whopping 93% for George W. Bush’s 2008 State of the Union Address!), with a 50% rate of false positives (human content rated as 20% or more likely to be AI);
  • GPTZero did much better on human non-blog content, with an average AI probability of 4.3% and a false positive rate of 3.3% (only one of the 30 samples tested was rated above 20% likely to be AI: a 1987 speech by Jimmy Carter).

The final rates of false negatives and false positives are as follows:

  • False Positive:
    • GPTZero: 3.3%
    • ZeroGPT: 50%
  • False Negative:
    • GPTZero: 35%
    • ZeroGPT: 10%

Both ZeroGPT and GPTZero have high rates of false results – and while GPTZero isn’t very likely to rate your human content as AI, there’s a serious chance your AI content will not be accurately identified as such.
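
For transparency, here’s the simple math behind those percentages: a human sample counts as a false positive if a checker scores it at 20% or more likely to be AI, and an AI sample counts as a false negative if it’s scored at 30% or less. A quick sketch (the score lists below are made-up placeholders, not our actual data):

```python
# How the false positive / false negative rates above are calculated.
# Thresholds restated from this article; the example scores are hypothetical.

def false_positive_rate(human_scores: list[float]) -> float:
    """Share of human-written samples scored at 20% or more likely to be AI."""
    return sum(score >= 0.20 for score in human_scores) / len(human_scores)

def false_negative_rate(ai_scores: list[float]) -> float:
    """Share of AI-written samples scored at 30% or less likely to be AI."""
    return sum(score <= 0.30 for score in ai_scores) / len(ai_scores)

# Hypothetical detector output: 30 human samples and 10 AI samples.
human_scores = [0.00, 0.05, 0.76, 0.93] + [0.10] * 26
ai_scores = [0.95, 0.25, 0.88, 0.10] + [0.90] * 6

print(f"False positive rate: {false_positive_rate(human_scores):.1%}")  # 6.7%
print(f"False negative rate: {false_negative_rate(ai_scores):.1%}")     # 20.0%
```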

What Does This Mean?

For one, ZeroGPT performed significantly better than GPTZero when assessing casual blogs (an important caveat, since it did worse everywhere else): it identified AI copy as AI with higher certainty, it was less likely to be fooled by AI humanizers, and it maintained a flawless 0% AI score when checking legit human copy (GPTZero gave the same copy a slightly more cautious 3% probability of being AI). For a free tool, it did pretty well with casual blog content.


But when we looked at other types of human writing, ZeroGPT showed its limitations (and remained true to its infamous reputation for making up AI scores for undeniably non-AI copy). It rated clearly human copy (19th and early 20th century short stories, news reports from the 90s, and political speeches from the late 20th and early 21st century) as likely to be AI, giving it a 30% AI probability on average, with a 50% rate of false positives.

Let’s hope those scores are wrong, or our history comes into question and maybe aliens really did bring AI to Earth hundreds of years ago!

So ultimately, neither of these testers is very accurate; and while GPTZero has lower rates of false results overall, it’s still ridiculously risky, with a 35% rate of false negatives – and a not-insignificant chance of showing the odd false positive, too.

Essentially, these results mean that if you use either tester on a single text, you’re almost as likely to be fed a fake result as a good one; ZeroGPT has similar chances of rating human copy as 0% AI as it does of rating it 60% AI, and GPTZero might very well rate every third AI article as “probably human”.

AI testers aren’t really working, especially if you use them on a single text – the risk of a false positive or a false negative result is simply too high.

If you had to use one though, we’d probably recommend GPTZero simply because it’s less likely to punish human writers; there’s really nothing worse for a writer (or editor!) than having your copy labeled as “likely AI” when you know very well you didn’t touch an AI tool. 

Now there may be some merit in looking at why human copy was labeled as likely AI by ZeroGPT – or why AI copy was labeled as likely human by GPTZero – but that’s for another article.

Until then, AmpiFire can help you drive more visibility to your business with quality content development and distribution – get in touch today to see what we can do for you!
