After the 2016 U.S. election, the term “fake news” became common vernacular, and the issue has permeated everyday life thanks mainly to social media use.
In 2024, 54% of U.S. adults said they got at least some of their news from social media, according to a Pew Research Center study. Neighbors often turn to Facebook groups or apps like Nextdoor to keep up with what is happening. A single post about schools, traffic or public safety can spread quickly across the city.
However, social content gets no professional journalistic check for truth, so misinformation spreads easily and is hard to stop. This is especially true now with deepfakes: images, audio and video created or edited by artificial intelligence that mimic real people and depict them posting, saying or doing things they never did.
Deepfakes are all around us. A report from Frontiers in Artificial Intelligence estimated that around half a million deepfake videos were shared on social media in 2023 and projected that the number would climb to 8 million this year. Research found that posts on Twitter (now X) containing false information spread up to 10 times faster than accurate news, in part because fake posts were written to get readers to react first and check later.
But what happens when users take it upon themselves to discern what is real? Most people turn to Google or another search engine, tools that now offer AI-generated results.
While the intention to verify news is commendable, we should be wary of using AI-generated content to check if something is true. Not all search engines that provide AI-generated answers perform the same or provide reliable results.
I know this firsthand. I studied how well four AI tools detected the truthfulness of news items: OpenAI’s ChatGPT 3.5 and 4.0, Google’s Bard/LaMDA (now Gemini) and Microsoft’s Bing AI. One hundred news items, all sourced from independent fact-checking agencies, were presented to each of these large language models under controlled conditions. Effectiveness was gauged by how closely each model’s verdicts matched the reports provided by the fact-checkers.
This yielded two measures: accuracy, or the number of correct detections among the presented items, and speed, or the time each model took to finish its assessment, typically measured in seconds. The results showed only moderate proficiency in detecting deceptive information across all models, with an average score of 65.25 out of 100, a D.
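For readers who want to see how the two measures described above could be tallied, here is a minimal Python sketch. It is not the code used in the study. The `Assessment` record, the `score` function, the true/false labels and the four sample rows are all hypothetical, and real fact-checkers often use graded ratings rather than a simple yes or no.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Assessment:
    """One model verdict on one news item (all fields are illustrative)."""
    fact_checker_label: bool  # True if fact-checkers rated the item accurate
    model_label: bool         # True if the model rated the item accurate
    seconds: float            # time the model took to answer


def score(assessments: list[Assessment]) -> tuple[int, float]:
    """Return (number of correct detections, mean response time in seconds)."""
    # A detection counts as correct when the model agrees with the fact-checkers.
    correct = sum(a.model_label == a.fact_checker_label for a in assessments)
    avg_seconds = mean(a.seconds for a in assessments)
    return correct, avg_seconds


# Hypothetical toy data: four items rather than the study's 100.
sample = [
    Assessment(True, True, 2.0),
    Assessment(False, True, 3.0),   # the model misses a false item
    Assessment(False, False, 4.0),
    Assessment(True, True, 3.0),
]

correct, avg_seconds = score(sample)
print(f"{correct} of {len(sample)} correct, {avg_seconds:.1f} s on average")
```

With 100 items per model, the count of correct detections reads directly as a score out of 100.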
OpenAI’s GPT-4.0 stood out with a score of 71, suggesting an edge in newer LLMs’ abilities to differentiate fact from deception. However, compared with human fact-checkers, the AI models lagged in comprehending the subtleties and context inherent in news and information.
One striking distinction among the models is the training data each uses to arrive at a response. There is no common training design among search engines, and they can draw only on publicly available information. They don’t have access to research from credible firms, medical journals or economic and industry analysts that restrict access or keep their work behind a paywall. As such, fact-checking by search engine can be misleading.
Established fact-checking agencies such as PolitiFact and Snopes delve deeper into the context and nuances of claims, corroborating information from multiple sources. Their processes typically involve expert knowledge, meticulous research and the human ability to discern subtleties that AI struggles with. Human-led fact-checking agencies continue to provide a more reliable check than AI.
The juxtaposition of AI capabilities and human expertise highlights the irreplaceable value of human cognition, judgment and emotional intelligence. The growth of AI, therefore, should not be perceived as a journey toward human redundancy, but rather as an opportunity for collaboration. Through that collaboration, we can foster a robust defense against the relentless onslaught of misinformation, ensuring a future where truth triumphs over deception.
