New developments have come to light in an ongoing copyright case brought by book authors against Meta, alleging that the social media giant trained its AI models on pirated books. Meta had already admitted to torrenting LibGen, a large dataset of tens of millions of pirated books totalling around 80 TB.
New evidence emerged on February 6, when Meta’s redacted emails were made public. These emails show that Meta torrented at least 81.7 TB of data from Anna’s Archive, a site with a vast collection of “shadow libraries” containing pirated books. This includes at least 35.7 TB from Z-Library and LibGen.
Meta’s torrenting came to light because the company seeded the files, thereby distributing the pirated books at issue in the dispute. Book authors had previously pressed Meta to reveal more information about the torrenting, but the company has so far been protected by a court order that denied the authors’ requests to review Meta’s torrenting and seeding data.

Meta did try to conceal the seeding: to avoid being tracked, it did not use Facebook servers when downloading the dataset. In the now-public emails, Nikolay Bashlykov, a Meta research engineer, wrote in an April 2023 message that “torrenting from a corporate laptop doesn’t feel right,” and went as far as to express concerns about using Meta IP addresses to download pirated content through torrents.
By September 2023, Bashlykov had again emphasised in an email that using torrents would involve seeding, which, because of how torrents work, shares the downloaded content with others outside the company. He added that “this could be legally not OK.”
These emails have blown the case open, and the authors now allege that Meta knew the torrenting was illegal. Given Bashlykov’s warnings, the authors also allege that Meta hid its seeding to the best of its abilities while downloading terabytes of data as recently as April 2024.
The authors want the Meta staff involved in LibGen’s torrenting to be deposed again, as the new evidence contradicts the prior testimony. On the other hand, Meta is not fighting the seeding aspect of the copyright infringement claims and has maintained that training its AI models on LibGen was fair use.