5-minute read
At the end of last year (December 27), The New York Times filed a lawsuit against OpenAI and Microsoft, accusing the two companies of using its published content to train the language models behind their AI chatbots. The lawsuit also cites "AI hallucinations" that could damage the paper's brand reputation: when answering a question, for example, ChatGPT or Bing Chat can sometimes return outright nonsense and then attribute that nonsense to The New York Times.
The New York Times is not alone in its concerns.
The Cambridge Dictionary's Word of the Year for 2023 is "Hallucinate," stemming from the craze for large language models (LLMs) like ChatGPT. The dictionary notes, "When an artificial intelligence (= a computer system that has some of the qualities that the human brain has, such as the ability to produce language in a way that seems human) hallucinates, it produces false information."
Famous examples abound: Google's Bard stumbled during its debut by attributing "the first photo of an exoplanet" to the James Webb Space Telescope (the correct answer is the Very Large Telescope of the European Southern Observatory); and last May in New York, a lawyer used ChatGPT to draft court filings that contained fabricated case citations and was sanctioned by the court as a result.
In my own testing, when I asked GPT-3.5 to name a few experts on AI hallucinations, four of the five names in its response were made up; only the third, Ian Goodfellow, is a real person.
Ask GPT-3.5 to cite what several artificial intelligence experts have said about AI hallucination, and the result is itself an AI hallucination: GPT-3.5 illustrates the concept by committing it. Does that count as two negatives making a positive?
In fact, according to an evaluation by Vectara, all of the major language models currently suffer from "hallucination" to some degree. As shown in the table below, GPT-4 has the lowest "hallucination rate" at 3%, while Google Palm 2 Chat is as high as 27.2% (notably, it also gives the longest answers).
Vectara's statistics on the hallucination rates of major language models.
The spread of AI hallucinations brings to mind the essay that the 95-year-old linguistics legend Noam Chomsky published in The New York Times in March last year: "The False Promise of ChatGPT."
In it, Chomsky harshly criticizes large language models, arguing that they betray the essence of language and produce nothing but falsehood, mediocrity, and evil; he even turns Hannah Arendt's concept of "the banality of evil" against ChatGPT, a measure of just how angry he is.
Chomsky insists that the value of human language lies in its power to explain from minimal information, whereas large language models merely describe and predict text, lacking both counterfactual thinking and moral reasoning. Counterfactual thinking (imagining and reasoning about situations other than the facts at hand) lets us extend our thought beyond the available clues, while morality reminds us that seemingly boundless thought is still bound by worldly principles.
As Chomsky puts it: "Here’s an example. Suppose you are holding an apple in your hand. Now you let the apple go. You observe the result and say, 'The apple falls.' That is a description. A prediction might have been the statement 'The apple will fall if I open my hand.' Both are valuable, and both can be correct. But an explanation is something more: It includes not only descriptions and predictions but also counterfactual conjectures like 'Any such object would fall,' plus the additional clause 'because of the force of gravity' or 'because of the curvature of space-time' or whatever. That is a causal explanation: 'The apple would not have fallen but for the force of gravity.' That is thinking."
And, "In 2016, for example, Microsoft’s Tay chatbot (a precursor to ChatGPT) flooded the internet with misogynistic and racist content, having been polluted by online trolls who filled it with offensive training data.” In the article, Chomsky disdainfully believes that the language predictions of large language models are always dubious and superficial.
Why does AI spout nonsense?
From another perspective, things might be entirely different, and hallucinations might not be hallucinations at all.
Andrej Karpathy, a founding member of OpenAI, can help us see more clearly what these so-called hallucinations are. In his musings on X (Twitter) at the end of last year, written with a touch of exasperation of his own, he argues that dreaming is the essence of how large language models work: it is not a problem but the way they operate, not a flaw but a feature.
Andrej Karpathy compares the operation of large language models to dreaming. "We direct their dreams with prompts. The prompts start the dream, and based on the LLM's hazy recollection of its training documents, most of the time the result goes someplace useful. It's only when the dreams go into deemed factually incorrect territory that we label it a 'hallucination'. It looks like a bug, but it's just the LLM doing what it always does."
He also contrasts them with search engines: a search engine does not dream at all; it retrieves existing information that matches the input, so there are no hallucinations, but there is also no ability to generate new content. Should we then complain that search engines have a 'lack of creativity' problem? (Although Karpathy did not actually pose this rhetorical question, the implication is strong.)
Even so, Karpathy's slightly exasperated post still offers everyone an out: he distinguishes between "large language model assistants" (like ChatGPT) and "large language models" themselves, and acknowledges that the hallucinations people generally complain about concern the former. He also suggests several mitigations: Retrieval-Augmented Generation (RAG); comparing multiple responses generated by the model to find contradictions or inconsistencies; having the model reflect on its own answers and adding verification steps that check the information it generates; and evaluating the correctness of a particular output from the model's neural network activations (its internal learning patterns). A rough sketch of one of these ideas follows below.
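To make the second of those ideas concrete (comparing multiple responses to spot contradictions), here is a minimal Python sketch of a self-consistency check. It is not Karpathy's code: `ask_llm` is a hypothetical wrapper around whichever model API you happen to use, and the crude string normalization stands in for a real answer-comparison step.

```python
import re
from collections import Counter
from typing import Callable, List, Tuple


def self_consistency_answer(
    ask_llm: Callable[[str], str],  # hypothetical: wraps whatever LLM API you use
    question: str,
    n_samples: int = 5,
) -> Tuple[str, float]:
    """Ask the same question several times and keep the majority answer.

    Low agreement across samples is a warning sign that the model may be
    "dreaming up" the answer rather than recalling it.
    """
    answers: List[str] = [ask_llm(question) for _ in range(n_samples)]
    # Normalize lightly so trivial wording differences don't split the vote.
    normalized = [re.sub(r"\W+", " ", a).strip().lower() for a in answers]
    counts = Counter(normalized)
    best, votes = counts.most_common(1)[0]
    agreement = votes / n_samples  # 1.0 means every sample agreed
    return best, agreement


# Usage sketch: flag low-agreement answers for retrieval (RAG) or human review.
# answer, agreement = self_consistency_answer(ask_llm, "Who took the first photo of an exoplanet?")
# if agreement < 0.6:
#     print("Low consistency: treat this answer as a possible hallucination.")
```

The same skeleton extends naturally to Karpathy's other suggestions: swap the majority vote for a retrieval step that grounds the answer in documents, or for a second pass in which the model is asked to verify its own claims.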
From Chomsky to Karpathy: the former criticizes from the position of a linguist, while the latter responds and defends from the practical, operational level. Chomsky shows us the true essence and spirit of human language and thought (counterfactual thinking and moral principles, for instance); Karpathy helps the general public understand the true nature of large language models, in which dreaming corresponds to the way they operate and hazy recollection corresponds to their training data.
Once we understand that AI hallucinations are a normal part of how AI operates, perhaps we should ask a different question: if AI is still dreaming, generating answers from a pool of hazy data that only sometimes turn out to be accurate, what happens when it wakes up? Will that be the moment of convergence? Is that when a 'superintelligence' will reign over humanity? Will that be the birth of a consciousness beyond our own?
As for AI hallucinations... no matter the era, telling truth from falsehood still depends on our own ability to discern, however firmly we seem to have a grip on reality. As Wendalyn Nichols, Cambridge Dictionary's publishing manager, put it: "The fact that AIs can 'hallucinate' reminds us that humans still need to bring their critical thinking skills to the use of these tools."