
170 personal photos of Brazilian children exploited for AI training: Research


Over 170 personal photos of Brazilian children, along with associated metadata, have been used to build LAION-5B, a large dataset for training artificial intelligence (AI) models, raising serious concerns about privacy violations and potential exploitation. Researchers uncovered these photos after reviewing only 0.0001 percent of the 5.85 billion images in the dataset, suggesting the finding is a significant undercount and that many more such images likely exist.

This dataset is extensively used for training AI algorithms and includes recognisable images of Brazilian children, often accompanied by detailed metadata such as names, captions, and original image URLs. Because the dataset was compiled through web scraping, it raises grave concerns about the privacy and security of these children, who have unwittingly become subjects in the digital domain.

Human Rights Watch’s analysis identified 170 photos of children from at least 10 states across Brazil. These photos, capturing various stages of childhood from births to school events to family gatherings, paint an intimate portrait of these young lives. Compounding the gravity of the situation, many of these photos were not accessible through regular online searches, indicating they were scraped from personal blogs and photo-sharing platforms in breach of the children’s privacy.

“Children should not have to live in fear that their photos might be stolen and weaponized against them,” said Hye Jung Han, children’s rights and technology researcher and advocate at Human Rights Watch.

The ramifications of incorporating these personal photos into AI systems are profound. AI models trained on such datasets are prone to leaking sensitive information, including medical records and personal photographs. This creates a perilous situation in which malicious actors can exploit these models to generate convincing deepfakes, manipulating the children’s images and potentially subjecting them to exploitation and harm.

Researchers found an alarming use of the AI tools trained on LAION-5B to generate explicit deepfakes depicting girls from different Brazilian states. This action infringes upon their privacy and subjects them to potential harassment and online exploitation, illustrating the tangible consequences of misusing such technology.

As AI tools are rapidly deployed across workflows, the use of these tools to generate deepfakes of children is on the rise, threatening their security and privacy.

According to Human Rights Watch, at least 85 girls from different Brazilian cities have reported harassment by classmates who took their photos and used AI tools to create deepfakes of them.

In response to these revelations, LAION, the organisation overseeing the dataset, has pledged to remove the children’s photos. However, questions linger over whether existing data protection laws, such as Brazil’s General Personal Data Protection Law, are adequate to safeguard children’s privacy in the digital age.

Researchers urge the Brazilian government to pass strict laws explicitly prohibiting nonconsensual digital replication or manipulation of children’s likenesses.

“Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable,” Han said. “Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”

In October, reports revealed that AI-generated images of children are proliferating on the internet. Furthermore, facial recognition search engine PimEyes banned underage searches.

In April 2024, reports emerged detailing AI NSFW advertisements on Meta platforms including Facebook, Instagram, and Messenger.

Moreover, privacy advocacy group NYOB sued Microsoft for tracking student behaviour via its 365 Education service.

In the News: Massive security breach exposes 3.6 million GitHub files of NYT

Kumar Hemant


Deputy Editor at Candid.Technology. Hemant writes at the intersection of tech and culture and has a keen interest in science, social issues and international relations. You can contact him here: kumarhemant@pm.me
