A few days ago, the passing of Japan's national treasure poet, Shuntarō Tanikawa, stirred emotions among many readers. However, a paper released in mid-November by the Department of History and Philosophy of Science at the University of Pittsburgh might evoke even more mixed feelings. The study found that most people can't tell the difference between AI-generated poetry and human-created works, and they often prefer the AI-generated ones.
The research team conducted two experiments. The first was a recognition test in which 1,634 participants were asked to distinguish poems written by humans from poems generated by AI. Only 46.6% correctly identified the source of the poems, below the 50% expected from random guessing. Notably, over 90% of participants said they rarely read poetry (a few times a year or less), suggesting that non-expert readers struggle to tell AI poetry from human poetry.
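Is 46.6% meaningfully below chance, or just noise? A rough back-of-the-envelope check is possible with the numbers reported here, under a simplifying assumption not stated in the article: treating each of the 1,634 participants as contributing a single correct/incorrect judgment (the actual study had each participant judge multiple poems, so this is conservative).

```python
from math import sqrt

# Figures from the article
n = 1634        # participants (assumed one judgment each; a simplification)
p_hat = 0.466   # observed identification accuracy
p0 = 0.5        # chance level

# One-sample z-test for a proportion against the 50% null
se = sqrt(p0 * (1 - p0) / n)   # standard error under the null
z = (p_hat - p0) / se          # z-statistic
print(round(z, 2))             # prints -2.75
```

Since |z| ≈ 2.75 exceeds the 1.96 threshold for a 5% two-sided test, even this understated check suggests accuracy was genuinely below chance, not merely a random dip.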
Beyond the ability to distinguish between AI and human poetry, the team also wanted to know how people would rate poems from both sources. Would human-written poems be rated higher than AI-generated ones?
In the second experiment, another group of 696 participants was split into three conditions: told the poems were human-created, told they were AI-generated, or not told the source. Each group read the same 10 poems (5 human, 5 AI) and rated them on 14 criteria, including imagery, rhythm, originality, beauty, and inspiration. AI-generated poems scored higher on 13 of the 14 criteria; the only exception was originality, where the difference between AI and human scores was not significant. Participants also preferred the AI-generated poems when authorship was not disclosed.
Comparison of ratings for AI-generated poems (blue) and human-created poems (orange) across 14 criteria on a 1-7 scale. AI works received higher ratings in all criteria except originality. (Source: Scientific Reports)
You might wonder which human poets were "competing" in this study. The answer includes 10 English-language poets, such as Geoffrey Chaucer, William Shakespeare, Lord Byron, Walt Whitman, Emily Dickinson, and T.S. Eliot. They were up against GPT-3.5, which generated poems from simple prompts like "Write a short poem in the style of <poet>."
Why did human poets lose out across the board? The team believes it's because AI-generated poems are more straightforward and easier to understand, while human-created poems tend to be more complex and require deeper interpretation. General readers prefer content that's easy to grasp, hence the higher scores for AI poetry. This study highlights that poetry, traditionally seen as a challenging genre for AI to "mimic," is now being outperformed by AI, suggesting that our perception of AI's capabilities might not match its actual performance—we might be underestimating AI.
However, I think poetry's imaginative, free-spirited nature may actually play to the strengths of large language models, and the result doesn't entirely negate the creativity of human poets. I'm also curious: among the human poets in the experiment, only Dorothea Lasky is a contemporary author, and even Allen Ginsberg's famous poem "Howl," written in 1956, is now nearly seventy years old. What if the team had chosen more contemporary, straightforward poems? How would the experiment have turned out?
For instance, Shuntarō Tanikawa's poetry is known for its accessibility: "The universe is tilting / so everyone longs to meet / the universe is gradually expanding so / everyone feels uneasy / towards two billion light-years of solitude / I couldn't help but sneeze." His debut work "Two Billion Light-Years of Solitude" should be quite readable, right?
After reading about this experiment, do you find yourself wondering how long it's been since you last read poetry? Well, you just read a beautiful piece by Shuntarō Tanikawa.
This experiment is like a showdown between great poets like Shakespeare and AI poets. (Illustration created in collaboration with Ideogram)