
Amazon investigates Perplexity over alleged violation of AWS web scraping rules


Perplexity is making waves across the AI industry, and not only because it’s taking on search giant Google or because it just raised $10 million to $20 million in funding from SoftBank. The AI startup also seems intent on getting away with as many unethical practices as possible.

Amazon Web Services (AWS) has launched an investigation to determine whether Perplexity is violating its web scraping rules. Perplexity has been accused of using web crawlers hosted on AWS that don’t honour websites’ Robots Exclusion Protocol. Under that protocol, developers place a simple text file called robots.txt on their servers to tell web crawlers which pages they may or may not access.
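For readers curious about the mechanics, here is a minimal Python sketch of how a well-behaved crawler might check robots.txt before fetching a page, using the standard library’s urllib.robotparser. The robots.txt contents, the example.com URLs and the SomeOtherBot user agent are made-up assumptions for illustration, not any real publisher’s rules. Note that the check runs entirely on the crawler’s side: nothing on the server forces a scraper to perform it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. These rules are an
# illustrative assumption, not any real site's file.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    # can_fetch() picks the group matching the user agent (falling back to
    # the "*" group) and checks the URL path against its Disallow rules.
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("PerplexityBot", "https://example.com/news/story.html"),
        ("SomeOtherBot", "https://example.com/news/story.html"),
        ("SomeOtherBot", "https://example.com/private/draft.html"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(agent, url))
```

With these rules, PerplexityBot is refused every page, while other crawlers are refused only the /private/ section. A crawler that never calls a check like this, or that identifies itself under a different name, simply isn’t bound by it.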

Perplexity is doing everything it can to build an “answer engine” that can compete with Google Search. The protocol stands in its way, so Perplexity decided to ignore it altogether. Or rather, it used third-party scrapers that ignore it, according to CEO Aravind Srinivas, as reported by Fast Company. Srinivas also declined to name the crawler or to say whether he would ask its operator to start honouring the protocol.

Amazon’s chatbot Metis is in the works | Photo: Sundry Photography / Shutterstock.com

Wired reported earlier that it had found a virtual machine running on an AWS server, “certainly operated by Perplexity,” that was bypassing its website’s robots.txt instructions. This confirmed findings from Robb Knight that Perplexity has been ignoring robots.txt directives asking crawlers not to access a page. The same server was also found scraping other notable publications such as The Guardian, Forbes, and The New York Times.

Robots.txt isn’t a law but rather a good-faith agreement that has held up since the mid-1990s. Perplexity isn’t the only AI company scraping the web to power its answers, either. Other “answer engines,” such as Arc Search, and even Google itself are also actively scraping the internet, answering users’ questions directly while robbing websites of visits and, perhaps more importantly, ad revenue.


Yadullah Abidi


Yadullah is a Computer Science graduate who writes/edits/shoots/codes all things cybersecurity, gaming, and tech hardware. When he's not, he streams himself racing virtual cars. He's been writing and reporting on tech and cybersecurity with websites like Candid.Technology and MakeUseOf since 2018. You can contact him here: yadullahabidi@pm.me.
