
2,500 pages of Google Search’s internal algorithm docs leaked


In what could be one of the most consequential leaks in recent history, nearly 2,500 pages of Google Search’s internal API documentation have been leaked. This information offers novel insight into how Google Search works and breaks down some big mysteries about how the most popular search engine presents the web to its users.

Rand Fishkin, an SEO expert with over a decade of experience in the field, shared the documentation. Fishkin reports he got the API documentation from a source who claimed that the leaked documents were confirmed as authentic by ex-Google employees who also shared additional, private information about Google’s search operations with the source.

Fishkin reviewed the leaked documents in a video call with the source, who initially requested anonymity. The source has since revealed their identity as Erfan Azimi, an SEO practitioner and founder of EA Eagle Digital. To verify the documents, Fishkin also consulted some ex-Google employee friends and Mike King, founder of iPullRank, and says they appear to be a “legitimate set of documents from inside Google’s Search division.”

According to Fishkin, the documents include details on how Google’s search API works and what information is available to employees. This includes what kind of data Google collects and uses, which sites Google promotes for sensitive topics (including elections), how small websites are handled, and more. What’s more alarming is that the information appearing in the documents seems to conflict with what Google employees have been sharing with the rest of the world.


API documentation reveals Google’s dishonesty

“‘Lied’ is harsh, but it’s the only accurate word. While I don’t necessarily fault Google’s public representatives for protecting their proprietary information, I do take issue with their efforts to actively discredit people in the marketing, tech, and journalism worlds who have presented reproducible discoveries,” says King in his breakdown of the leaked documents.

His report adds that 2,596 modules are represented in the API documentation, with 14,014 attributes (ranking features). These modules are related to components of YouTube, Assistant, Books, video search, links, web documents, crawl infrastructure, an internal calendar system, and the People API.

As mentioned before, the API documentation suggests that much of what Google spokespersons have said publicly over the years is incorrect. For example, the claim that Google doesn’t have anything like domain authority, made repeatedly by multiple Google spokespersons, seems false. The documents reveal a feature that Google computes dubbed “siteAuthority,” which is stored as part of the Compressed Quality Signals on a per-document basis.

Another popular claim by Google employees that the leaked documentation refutes is that clicks don’t affect rankings. After providing several examples of Google employees repeatedly saying so, King draws attention to the Glue and NavBoost ranking systems, which came to light during Pandu Nayak’s testimony in the Google DoJ antitrust trial.

NavBoost has a specific module that focuses entirely on click signals. The module is summarised as “click and impression signals for Craps”, Craps being one of Google’s ranking systems. It tracks bad clicks, good clicks, last longest clicks, unsquashed clicks, and unsquashed longest clicks, all of which are considered as metrics. Other major claims that the leaked documents refute include Google’s claims that there’s no sandboxing for new sites and that data from Chrome isn’t used for ranking.

It’s important to note that the leak’s contents don’t necessarily prove that Google uses the specific data and signals mentioned to rank results. That said, the company has not yet responded to the leak, including to The Verge’s direct request to refute its legitimacy.


Yadullah Abidi


Yadullah is a Computer Science graduate who writes/edits/shoots/codes all things cybersecurity, gaming, and tech hardware. When he's not, he streams himself racing virtual cars. He's been writing and reporting on tech and cybersecurity with websites like Candid.Technology and MakeUseOf since 2018.