Following its $60 million deal that lets search giant Google train its AI models on Reddit’s data and show the platform’s results more prominently in Google Search, Reddit now appears to be blocking search engines such as Bing, DuckDuckGo, Mojeek, and Qwant. In effect, any search engine that doesn’t rely on Google’s index can no longer access Reddit data.
If you run a search on any of these search engines, you’ll see no Reddit results from the past week. Older results are still visible, but that’s likely because each search engine’s bots had already crawled those pages before the block took effect. According to 404 Media, which first reported Reddit’s new blocking, the only other search engine still able to show current Reddit results is Kagi, an independent, paid search engine that buys part of its search index from Google.
These changes are also reflected in Reddit’s robots.txt file, which tells bots which parts of the website they may crawl. At the moment, the file is very simple and much stricter than it used to be: it allows no user-agent (bot) to scrape any part of the site, and it carries the following comment: “Reddit believes in an open internet, but not the misuse of public content.”
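To make the effect of such a file concrete, here is a minimal sketch in Python using the standard library’s `urllib.robotparser`. The robots.txt text, the `is_allowed` helper, the bot names, and the example URL are all assumptions made for illustration; this is not a copy of Reddit’s actual file. Note also that robots.txt is only a request that well-behaved crawlers choose to honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt showing a blanket "disallow everything" policy.
# Assumption for illustration only; not Reddit's actual file.
BLANKET_BLOCK = """\
# Comment lines like this one are ignored by parsers
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user-agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Every bot name here falls under the wildcard rule and is refused.
    for agent in ("bingbot", "DuckDuckBot", "ExampleBot"):
        allowed = is_allowed(BLANKET_BLOCK, agent, "https://www.example.com/r/some_page")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Because the only rule group targets `*` and disallows `/`, every crawler that follows the convention is turned away from every path, whatever name it announces.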
Reddit’s stance on scraping isn’t an overnight change. The site has long been upset about corporations scraping its data and doing whatever they want with it, including training AI models. Reddit data is invaluable for both AI training and regular search because the site is one of the most active corners of the internet, where millions of real people engage with each other, give advice, write about personal experiences, and share their thoughts on just about everything.
Although Reddit hasn’t explicitly said so, it’s clear that AI training on the site’s content has been one of the major driving forces behind this policy change. As a direct result, search engines other than Google haven’t been able to show Reddit results as they previously did. With Google holding almost exclusive access to Reddit at a time when search results are getting progressively worse and adding “site:reddit.com” to a query is one of the best ways of fixing that, this blocking could land Google in antitrust trouble.