Artificial intelligence is everywhere, and so is the content it generates. From auto-generated news articles to social media posts to entire websites, AI already produces more online articles than humans, and much of that output is low-quality and misleading.
This is a problem, and not just for humans trying to discern what is true. It is also a problem for the machines themselves, which are auto-generating synthetic information that is sometimes disconnected from the real world.
As researchers at the University of Washington’s Center for an Informed Public, we study both of these challenges: first, the increasingly difficult-to-navigate information environments, and second, the effects of those environments on AI itself. With the latter challenge, one of our big concerns resembles something many dog owners experience at least once: the not-so-pleasant sight of their dog eating its own poop. AI researchers call this autophagy: when these algorithms train on their own output.
The consequences for dogs are not great, but they are potentially worse for large AI systems. Autophagy, over multiple generations, can lead to “model collapse.” One can observe this phenomenon visually as images degrade over time, but we can also see it as AI’s responses become less reliable and accurate.
However, model collapse itself is not our primary concern when it comes to autophagy. Our primary concern is the loss of information diversity and a collapse of knowledge.
Models, the core engines of AI systems, learn patterns from the data they are trained on. Modern AI systems such as Claude, Gemini and ChatGPT are trained on, essentially, the whole of the internet. Modeling the entire internet is attractive because it is seemingly comprehensive and a shortcut to human knowledge, but, like any modeling approach, it removes detail. Just as a world map is a simplified, compact representation of the world, an AI model is a compressed version of the internet and the knowledge it contains. Because the internet is now both the input and the output of these large systems, autophagy becomes a vicious cycle: the model compresses the internet during training, then reduces its diversity by populating it with AI-generated content. Some information and ideas are lost at each training step. As we adopt AI more widely in our work and personal lives, this loss of information diversity becomes more and more of a problem.
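A toy simulation can build intuition for how this loss compounds. This is a sketch we constructed purely for illustration, not how real models are trained: a pretend “model” learns word frequencies from a corpus, then regenerates the corpus while keeping only the most common ideas, the way sampling tends to favor high-probability outputs. Each generation trains only on the previous generation’s output, and the long tail of rarer ideas steadily disappears.

```python
from collections import Counter

def train_and_generate(corpus, top_k):
    """Toy 'model': learn word frequencies, then regenerate the corpus
    using only the top_k most common words. The rare tail is lost,
    mimicking how sampling favors high-probability outputs."""
    counts = Counter(corpus)
    return [word for word, count in counts.most_common(top_k)
            for _ in range(count)]

# Generation 0: a diverse "human" corpus with a long tail of rare ideas.
# idea_0 appears 100 times, idea_99 appears just once.
corpus = []
for i in range(100):
    corpus.extend([f"idea_{i}"] * (100 - i))

print("gen 0 distinct ideas:", len(set(corpus)))  # 100

# Each new generation trains only on the previous generation's output,
# keeping 80% of the distinct ideas it sees.
for gen in range(1, 6):
    corpus = train_and_generate(corpus, top_k=len(set(corpus)) * 4 // 5)
    print(f"gen {gen} distinct ideas:", len(set(corpus)))
```

After five generations, the corpus retains only 32 of the original 100 distinct ideas, even though each individual step looked like a modest trim.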
The success of institutions of discovery and understanding, such as science, depends on a wide diversity of methods, theories and approaches. When diversity is lacking, discovery is delayed or even buried. For example, the heretical idea that bacteria cause stomach ulcers was shunned by the research community for a half-century.
Fortunately, this “out there” idea resurfaced. Robin Warren and Barry Marshall received the Nobel Prize in 2005 for discovering that Helicobacter pylori is, indeed, the primary cause of peptic ulcers.
Diverse perspectives have also proved invaluable in our own field of research. For example, we would not have such a deep understanding of the social and ethical failures of AI without the pioneering work of many Black women, who have shown that these systems can be as biased and stereotypical as society, or even worse. Excluding non-mainstream ideas in science not only slows progress but also harms the people who hold them.
A model is not necessarily wrong, but it is never the only truth. Consider the many different ways you could draw a map of the world, and which aspects each would prioritize. As autophagy gradually compresses the internet and our knowledge, it also concentrates control over which information is prioritized into the hands of private tech companies.
To escape this vicious cycle, we can learn from biology.
Farmers, for example, understand well the role of biodiversity. The Irish Potato Famine of the 1840s led to a million starvation deaths and more than a million people displaced. This was due in large part to the monoculture crop of the time, the Irish Lumper potato, and a mold, Phytophthora infestans, that devastated it. This harsh lesson in resilience, or the lack thereof, taught future farmers the importance of genetic diversity and crop rotation.
We see these near-monoculture lessons playing out in our own state. Although Washington farmers grow a wide variety of crops, production is highly concentrated in a few, like apples, wheat, potatoes and hops. The benefits of this concentration are efficiency, shortcuts and economies of scale, but it also leaves us more vulnerable to especially damaging outbreaks of pests like the codling moth for apples, to soil degradation, and to price volatility.
Likewise, today’s AI landscape consists of a relatively small number of general-purpose foundation models that sit underneath widely used applications like ChatGPT and Gemini. These foundation models are built and trained similarly, even though they are housed at different companies. Taken together, they are less diverse than the internet itself and far less diverse than the knowledge represented in society. In other words, we may already be experiencing AI monoculture. An alternative would be a more diverse ecosystem: many systems, built from different construction plans and trained on different portions of the internet.
We tested this idea in small, controlled environments and were able to mitigate model collapse. It took only a few cycles for a diverse swarm of models, each trained on a portion of the data, to outperform the large single system. While this kind of approach might help us avoid a knowledge collapse, AI companies are unlikely to engage in discussions about the long-term effects of monoculture, given the intense race for eyeballs and the sought-after goal of artificial general intelligence.
Until regulators take notice, there are important steps we can take as individual consumers.
First, don’t get too engrossed with any one model or mode of interaction. Diversify your information-gathering tools, from news to social media platforms to conversational agents. Second, remember that AI represents a highly compressed version of the internet, which is itself a skewed sample of human knowledge; if we are to escape an AI-monoculture future, we can’t forget this. And third, be aware of the dynamic researchers call anthropomorphic seduction. Although AI agents possess no true human traits like empathy, they can be even more persuasive than humans. This kind of seduction can make us more susceptible to believing and trusting the falsehoods AI inevitably produces.
And, finally, don’t forget the value of a real human expert or a well-vetted source. Recommender systems, and now large AI models, already mediate most of our interactions online, but we can at least diversify our information gathering by prioritizing real human conversations and interactions.
