
Google’s updated privacy policy allows scraping of public user data


Google’s recent update to its privacy policy has caused concern among users as the company explicitly states its right to scrape and utilise online content for its AI tools.

The updated policy clarifies that Google uses publicly available information to train its AI models and develop products like Google Translate, Bard, and Cloud AI capabilities.

“Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public,” the new Google policy says. “For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

While privacy policies typically describe how businesses use information posted on their own platforms, Google’s privacy policy appears to extend to data from anywhere on the public web. This raises new questions about privacy, as users now need to consider not just who can see their information, but how it can be used.

The issue of data sourcing for chatbots is also a concern. Companies like Google and OpenAI have scraped vast amounts of internet data to fuel their AI systems, raising potential copyright and legal issues. The legal landscape will likely grapple with these questions in the coming years.

Twitter and Reddit have taken steps to protect their platforms from data harvesting. Both companies restricted access to their APIs, disrupting third-party tools and causing difficulties for users. Twitter briefly considered charging public entities for the ability to tweet, but faced significant backlash and reversed the decision.

Generative AI tools scrape data from the web for training. Photo: Tada Images / Shutterstock.com

Web scraping has become a focal point for Elon Musk, who has blamed recent Twitter issues on the need to prevent data extraction. However, IT experts argue that rate limiting and system issues are more likely causes. Reddit also faced protests from moderators who rely on now-inaccessible APIs for their work, leading to potential long-term consequences as some moderators consider stepping away.

It is still too early to draw conclusions about the legality and ethics of data harvesting. However, some signs are already emerging. For instance, the EU has ordered Google and other tech giants to label AI-generated content. This is certainly a start and gives a sense of the direction in which things are likely to proceed.

OpenAI has already faced severe backlash and, at one point, an uncertain future in Europe. This led the company to announce that it would no longer use users’ data to train its AI tools.

In the News: Meta’s Threads enters market to capitalise on Twitter’s mishaps

Kumar Hemant

Deputy Editor at Candid.Technology. Hemant writes at the intersection of tech and culture and has a keen interest in science, social issues and international relations. You can contact him here: kumarhemant@pm.me
