
Anthropic continues stealing content using ClaudeBot

  • by Kumar Hemant
  • 3 min read

Photo: Tada Images / Shutterstock.com

Even after many websites, including large ones like Reuters and Condé Nast, blocked Anthropic’s scraping bots, the AI company is now scraping their content using its new crawler, ClaudeBot.

Many websites have already blocked two AI scraper bots, ‘Anthropic-AI’ and ‘Claude-Web.’ However, these are Anthropic’s old bots, since replaced by a new one, ClaudeBot — leaving robots.txt configurations outdated and unable to stop the company’s current crawler.

This issue emerged partly from the habit of copy-pasting robots.txt lists without verifying their accuracy, and partly from considerable misinformation and confusion in the community. For instance, researchers found that websites under News Corp are blocking a bot named ‘Perplexity-ai,’ which likely doesn’t exist.

Experts agree that while the current landscape is confusing, webmasters should adopt a more aggressive blocking strategy. Blocking a non-existent bot causes no harm, while failing to block a real one can have significant repercussions.
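Since blocking a non-existent bot is harmless, a defensive robots.txt can simply list both the retired agent names and the current crawler. A minimal sketch along those lines — the user-agent strings below are based on the names reported in this article, so verify them against each AI company’s current documentation before relying on them:

```
# Block Anthropic's current crawler
User-agent: ClaudeBot
Disallow: /

# Also block the retired agent names, in case any crawler still uses them
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /
```

Note that robots.txt is purely advisory — it works only if the crawler chooses to honor it, which is precisely the concern this article raises.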

The consequences of ineffective blocking are severe. Websites like iFixit and Read the Docs have reported massive server hits from AI crawlers, resulting in substantial financial burdens, 404 Media reports.

iFixit experienced nearly a million hits from Anthropic’s crawlers in a single day, and Read the Docs incurred over $5,000 in bandwidth charges due to a single bot accessing 10 TB of data in one day.

Anthropic’s new bot isn’t respecting the wishes of website owners and is scraping their data.

While Anthropic confirmed that Anthropic-AI and Claude-Web are old crawlers, the company did not answer whether its new crawler, ClaudeBot, respects websites that don’t want their data scraped.

“These inconsistencies and omissions across AI agents suggest that a significant burden is placed on the domain creator to understand evolving agent specifications across (a growing number of) developers,” experts noted.

Such incidents underline the urgent need for AI companies to respect web scraping norms and for website owners to enhance their defensive measures. For now, the onus of blocking the ever-growing list of AI scrapers lies with website owners, as new crawlers keep appearing and AI companies show little regard for business ethics or owners’ wishes.

“Blocking agents from AI companies relies on the companies respecting robots.txt files and knowing about all of the AI scraping agents out there. The combined likelihood of this happening is pretty low for most organisations,” Walter Haydock, CEO of StackAware, told 404 Media.

Despite attempts to block AI scrapers, AI companies just don’t seem to give up.


Kumar Hemant


Deputy Editor at Candid.Technology. Hemant writes at the intersection of tech and culture and has a keen interest in science, social issues and international relations. You can contact him here: kumarhemant@pm.me
