Photo: Robert Way / Shutterstock.com

About 190 personal photos of Australian children from the LAION-5B dataset are being used to train artificial intelligence (AI) tools without the knowledge or consent of the children or their families. This misuse of data has created malicious deepfakes, posing significant risks to children’s privacy and safety.

Just last month, 170 personal photos of Brazilian children from the same dataset were reported to have been exploited for AI training.

Human Rights Watch (HRW) research reveals that images are being collected from online sources to create a comprehensive database. This collection includes photographs of minors from different regions across Australia. The database is subsequently employed to develop and enhance widely used artificial intelligence systems.

These AI tools can produce convincing but potentially dangerous visual content, including images and videos. As a result, children may be vulnerable to various forms of exploitation and misuse through this technology.

Human Rights Watch discovered 190 photos of Australian children within LAION-5B. However, this number likely represents a significant undercount, as the review covered less than 0.0001 per cent of the 5.85 billion images in the dataset, or fewer than about 5,850 images.

“Children should not have to live in fear that their photos might be stolen and weaponised against them,” said Hye Jung Han, children’s rights and technology researcher at HRW.

The photos span various stages of childhood, capturing intimate and private moments originally shared on personal blogs, school websites, and photo-sharing platforms, often with privacy settings that should have prevented such exposure.

Researchers found one alarming example involving a photo captured from YouTube of two young boys from a Perth preschool, identified by their full names and school in the caption. Such detailed information raises serious concerns about how easily children’s identities can be traced.

*With the rise of generative AI, privacy concerns are also increasing.*

YouTube has been notified that its videos are being scraped without authorisation. “We have been clear that the unauthorised scraping of YouTube content is a violation of our Terms of Service, and we continue to take action against this type of abuse,” Jack Malon, YouTube spokesperson, told Ars Technica.

HRW also noticed the presence of First Nations children, highlighting cultural sensitivities and the potential for cultural harm. These images, some of which feature children with traditional body paint or engaging in cultural activities, underscore the broader implications of data misuse.

The misuse of these images extends beyond their initial unauthorised collection. Experts have shown that AI models trained on LAION-5B can leak private information, reproducing identical copies of the training data, including sensitive personal photos. This flaw poses ongoing risks, as these models cannot ‘forget’ the data they have ingested, even if it is later removed from the dataset.

In response to these findings, HRW has called on the Australian government to enact stringent laws to protect children’s data from AI misuse. Although LAION, the organisation managing the dataset, has pledged to remove the identified photos, it also placed the responsibility on children and their guardians to remove personal photos from the internet.

“Generative AI is still a nascent technology, and the associated harm children are already experiencing is not inevitable,” explained Han. “Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”
