What happens when genAI vendors kill off their best sources?

opinion

May 21, 2024


You don’t really think you can depend on answers pulled from the likes of self-appointed Reddit experts, do you?

[Image: gravestone. Rob van der Meijden, CC0 via Pixabay]

If you think the latest generative AI (genAI) tools such as Google AI Overviews and OpenAI GPT-4o will change the world, you’re right. They will. But will they change it for the better? That’s another question.

I’ve been playing with both tools (and other genAI programs, as well). I’ve found they’re still prone to hallucinations, but sound more convincing than ever. That’s not a good thing.

One of the reasons I’m still making a living as a tech journalist is that I’m very good at discerning fact from fantasy. Part of that skill set comes from being an excellent researcher. The large language models (LLMs) that underpin genAI chatbots? Not so much. Today, and for the foreseeable future, genAI at its best is really just very good at copying and pasting from the work of others.

That means the results they spit out are only as good as their sources. Look at it this way: if I want to know about the latest news, I go to The New York Times, the Washington Post, and the Wall Street Journal. Not only do I trust their reporters, but I also know what their biases are.

For example, I know I can believe what the Journal has to say about financial news, but I take its columnists with a huge grain of salt. (That’s just me; you might love them.)

As for the Times, remember that it claims OpenAI stole its stories to train ChatGPT. If it wins its case, genAI is in trouble, because other publishers will follow in quick succession. When that happens, all the genAI engines will have to steal (uhm, learn) their content from the likes of Reddit; your “private” Slack messages; and Stack Overflow, where users are sabotaging their answers to screw up OpenAI.

That’s not going to go well. There’s a reason genAI engines often spew garbage: it’s what they were trained on. For instance, about 80% of the tokens used to train OpenAI’s GPT-3 came from Common Crawl. As the name suggests, these petabytes of data are scraped from everywhere and anywhere on the web. As a Mozilla Foundation study found, the result is not trustworthy AI.

Worse still, this will eventually lead to a time when those genAI tools start consuming their own garbage. This is a known problem, and it leads to what researchers call model collapse. Or, as neuroscientist Erik Hoel pithily describes the end result: “synthetic garbage.” He’s not alone; many AI engineers think even a little AI-generated data can poison their LLMs.

At the same time, genAI companies aren’t doing us (or themselves, in the long run) any favors. For example, Google’s AI-powered Overviews feature provides concise AI summaries at the top of search results. This move promises quicker access to information, and Google’s Liz Reid claims it will drive more clicks to websites by piquing users’ interest.

Reid, who oversees search operations, maintains that AI Overviews really will encourage more searches and clicks to websites as users seek to “dig deeper” after getting the initial synthesized summary.

Publishers know better. Who will bother to go to the real story, which might require a subscription or (horrors!) seeing an ad?

Danielle Coffee, CEO of the News Media Alliance, which represents more than 2,200 publishers, warns that the change could be “catastrophic” for an industry already struggling with declining ad revenue. “It’s offensive and potentially unlawful for a dominant monopoly like Google to dictate the rules in a way that sacrifices the interests of publishers and creators,” she said.

Google has never been a friend to publishers. Just ask leaders in countries like Spain or Canada, where governments tried to get Google to pay publishers for access to their news sites.

If Google, Microsoft, and other genAI companies keep all those search visitors (and ad revenues) to themselves, as I expect will be the case, publications will die at an even faster rate. And there goes any authoritative information Google and the other AI services need for their LLMs.

OpenAI co-founder and CEO Sam Altman recently said, “GPT-4 is the dumbest model any of you will ever have to use again by a lot” and that “GPT-5 is going to be a lot smarter.”

I’m sure it will be. GPT-4o is clearly superior to its predecessor, and GPT-5 will continue the trend. But GPT-6 and beyond? Simple greed may ensure that, as reliable human-created stories disappear, AI will only get dumber and dumber.

In short, we’re looking at a future filled with AI GIGO: Garbage In, Garbage Out. No one wants that. The time to stop it is now.

by Steven Vaughan-Nichols


Steven J. Vaughan-Nichols has been writing about technology and the business of technology since CP/M-80 was the cutting-edge PC operating system, 300bps was a fast Internet connection, WordStar was the state-of-the-art word processor, and we liked it!
