Google’s indexing of AI-generated books has stirred controversy in the research community, raising questions about the reliability and authenticity of content on platforms like Google Books and affecting tools such as the Google Ngram Viewer.
Emanuel Maiberg, writing for 404 Media, discovered that Google Books’ index contains a significant number of AI-generated books. He identified these low-quality books using methods similar to those previously employed to detect such content on various platforms.
By searching for phrases characteristic of AI chatbot output, such as ‘As of my last knowledge update’, he found numerous books containing them.
While some books legitimately discuss AI-related topics, a substantial portion appears to be AI-generated and unrelated to their purported subject matter.
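The detection approach described above is simple enough to sketch in a few lines of Python. This is a minimal illustration, not the exact method 404 Media used, and the marker phrases below are examples rather than an exhaustive list:

```python
# Telltale phrases that frequently appear in unedited chatbot output.
# Illustrative only; real AI-generated text may use many variants.
AI_MARKER_PHRASES = [
    "as of my last knowledge update",
    "as an ai language model",
    "i cannot browse the internet",
]

def looks_ai_generated(text: str) -> bool:
    """Return True if the text contains any telltale chatbot phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in AI_MARKER_PHRASES)

excerpt = ("As of my last knowledge update in January 2022, "
           "Facebook had evolved into Meta Platforms, Inc.")
print(looks_ai_generated(excerpt))  # True
```

A substring match like this catches only the most blatant copy-pasting; books where the phrases were edited out would slip through, which is why such searches give a lower bound on the problem.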
One example cited is Tristin McIver’s ‘Bears, Bulls, and Wolves: Stock Trading for the Twenty-Year-Old’. This book, priced at Rs. 1,579 in India, presents itself as a comprehensive guide to stock trading but lacks in-depth analysis and often provides surface-level information reminiscent of AI-generated text.
Here’s an excerpt from the book, as noted by 404 Media: “Despite the initial hiccups, Facebook’s stock eventually found its footing in the market. Over the years following the IPO, the company’s share price experienced fluctuations but also demonstrated resilience, reflecting the dynamic nature of the tech industry. As of my last knowledge update in January 2022, Facebook had evolved into Meta Platforms, Inc., reflecting its expansion beyond social media into virtual reality and the metaverse.”
Look closely at the last sentence: the phrase ‘As of my last knowledge update in January 2022’ indicates the paragraph was copied directly from a generative AI tool, probably ChatGPT.
We also searched for other telltale phrases, such as ‘Please note that as an AI language model…’, and the results were unsurprising. Here’s a screenshot of a book containing this exact phrase:

Another problem arising from such blatant copy-pasting of AI-generated text is that it undermines a book’s relevance. For example, Shu Chen Hou’s book ‘Maximising Your Twitter Presence: 101 Strategies for Marketing Success’ relies on outdated information about Twitter’s verification process, raising questions about the accuracy and reliability of such publications.
A third problem, raised by many researchers, is the potential impact of AI-generated content on tools like the Google Ngram Viewer, which researchers use to analyse language trends in published books. As Google Books continues to index low-quality content, the tool’s reliability for studying cultural shifts and linguistic evolution could suffer in the long run.
Although generative AI tools are helpful in some fields, they are wreaking havoc in publishing. Last year, content farm websites were reported to be generating thousands of low-quality AI articles, including plagiarised AI rewrites of news articles, gaming search engine algorithms and prompting several changes that have incrementally worsened the situation for independent publishers, including us.
Furthermore, academic publishing, a highly technical and respected field, is also facing the brunt of generative AI. Undisclosed AI content in academic publishing not only lowers the value of research but also raises authorship and intellectual property issues.
This led Google to update its search algorithm in March this year to demote low-quality AI-generated content in search results. However, Google has yet to develop specific policies for AI-generated books.
