AI is gobbling up the internet

Training data is dwindling and developers are trying some pretty experimental methods

Most of us have realised AI is not the single answer to our content problems. It’s a great starting point or supplementary tool. But its quirks render it unreliable – especially as it’s supposed to work from idea to finished product.

Today there’s another hazard: the threat AI poses to itself. We know the internet is colossal. Most of us have no idea of its outer limits. AI, however, has found them and is craving more.

It’s like the beloved Nintendo character Kirby. This round, pink creature’s stomach functions a bit like the Tardis: no matter how much is put into it, there is always room for more. Kirby has gobbled up everything and is still hungry.

Artificial intelligence consumes massive amounts of images, video and text. It can also churn out synthetic content increasingly quickly. And as AI builders run out of human-made data, they are knowingly (and sometimes unwittingly) using bot-made data. AI is eating itself.

What if DALL-E et al began supplementing their training data with content from rivals such as Midjourney or Stable Diffusion? An AI image or two in a million selfies would be harmless – but if a third or half of them were fake, things might begin to get shaky. Some would call this a “doom loop”, but it’s more accurately described as an autophagous or self-consuming circle.

So AI’s gene pool is shrinking and jokes were bound to follow. Data researcher Jathan Sadowski has drawn comparisons to the Habsburg royal dynasty. Look at the family tree for the Habsburgs, once a mighty power in Europe, and you will see a string of marriages between first and second cousins.

The outcome was premature deaths, madness, and peculiar jaws, lips and noses. Sadowski claims AI is quickly becoming “a system that is so heavily trained on the outputs of other generative AIs that it becomes an inbred mutant, likely with exaggerated, grotesque features”. We can see where he’s coming from.

Depending on the generative model, a delicate ratio of real to synthetic inputs must be maintained to stop destructive side-effects setting in. The trouble is: most companies aren’t sure what that ratio is.
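For the curious, here is a toy sketch of that self-consuming loop. It is not how any real image or language model is trained: the “model” is just a bell curve refitted each generation to a mix of fresh real data and its own previous output. The function name `simulate` and the parameters `generations`, `n` and `real_fraction` are our own inventions for illustration.

```python
import random
import statistics


def simulate(generations, n, real_fraction, seed=0):
    """Toy self-consuming loop (an illustration, not a real training setup).

    Each generation refits a Gaussian "model" to n data points: a share of
    fresh real data drawn from N(0, 1) and the rest sampled from the model
    fitted in the previous generation. Returns the fitted spread (standard
    deviation) at every generation, starting with the real value of 1.0.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    n_real = round(n * real_fraction)
    mean, std = 0.0, 1.0  # generation 0 matches the real distribution
    history = [std]
    for _ in range(generations):
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        synthetic = [rng.gauss(mean, std) for _ in range(n - n_real)]
        data = real + synthetic
        # "Train" the next model on the mixed data set.
        mean = statistics.fmean(data)
        std = statistics.pstdev(data)
        history.append(std)
    return history


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        spread = simulate(generations=200, n=10, real_fraction=fraction)
        print(f"real share {fraction:.0%}: final spread {spread[-1]:.4f}")
```

With no real data in the mix, the fitted spread tends to shrink generation after generation, a crude stand-in for a model losing variety. Feeding in real data each round keeps it anchored. Where the safe ratio sits for a real system is exactly what this toy cannot tell you.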

At Highbrook, we could foresee that an internet polluted by sub-par AI content would be irritating for humans. But we didn’t imagine such material would derail the models themselves.

Novel solutions are being devised. It’s been suggested that OpenAI is seeking to transcribe audio from YouTube videos. Questions have been raised (and sidestepped) about whether it’s already using YouTube in training its forthcoming AI model Sora.

Good writers are selective and take pride in sourcing information. If there are doubts about accuracy, they’ll often contact sources directly. Similarly, photographers travel miles to seek the real thing and understand the story. Where AI retreats into the familiar, humans relish a challenge. Let’s keep it human.
