Photo: Bluecat_stock / Shutterstock.com
Using publicly available data to train AI models isn't new. The most notable example is ChatGPT creator OpenAI, which trained its GPT-3.5 and GPT-4 models this way. However, learning from past mistakes, OpenAI did license at least a portion of its datasets to avoid copyright infringement claims down the line.
While publicly available data is free for everyone to access and use, training AI and ML models on it poses a few challenges that can be tricky to solve. With X already going through a barrage of changes in the year since Elon Musk's controversial $44 billion takeover, collecting information and using it to train AI models might not be the best way forward.
Based on your consent, we may collect and use your biometric information for safety, security, and identification purposes.
The policy also doesn't explain why X wants this data, but one possibility is passwordless sign-in. The platform plans to roll out passkey support, letting users sign in to their accounts with their device's fingerprint reader, facial recognition, or a PIN code. The biometric data collection has already raised privacy concerns among users.
And here’s what X collects on you regarding job applications, recommendations and employment history in general.
We may collect and use your personal information (such as your employment history, educational history, employment preferences, skills and abilities, job search activity and engagement, and so on) to recommend potential jobs for you, to share with potential employers when you apply for a job, to enable employers to find potential candidates, and to show you more relevant advertising.
Shortly after, Alex Ivanovs of Stackdiary found another policy change, in section 2.1, stating:
We may use the information we collect and publicly available information to help train our machine learning or artificial intelligence models for the purposes outlined in this policy.
He also pointed to a tweet (now called a post) in which Musk tells journalists that if they want "more freedom to write and a higher income," they should publish directly on X. This is essentially a call for creators to publish useful information exclusively on X, where it can then be used to train models for X and its related companies.
Will other Musk-headed companies benefit from X’s dataset?
Most likely, yes.
Musk's latest startup, an AI company called xAI, will use X's data to train its "maximally curious" AI systems and products, something the multibillionaire confirmed in a Twitter Space while sharing more details about the venture.
Interestingly, xAI will also collaborate with Tesla on both hardware and AI software. This isn't new: several Musk-led companies, including Tesla, SpaceX and The Boring Company, have done business with each other before, and some of these transactions came to light in Tesla's 2020 filing with the US Securities and Exchange Commission.
Musk has also accused “every AI organisation on Earth” of using Twitter’s data for training and “in all cases illegally.” He hasn’t cited any evidence for his claims, but X has since been implementing rate limits to prevent the platform from “being scraped like crazy.”
All of this was more or less used to justify xAI also training on data from the microblogging platform (soon to be the "everything app"). Musk has been careful to state that the company will use only public tweets, nothing private, "just like everyone else has."
As the pieces fall into place, a clear link emerges between all the companies under Musk's purview. X will use its users' data to train AI models, and so will xAI, which is already working with Tesla. By extension, it wouldn't be difficult for any of Musk's companies to gain access to this training data, and likely to other, more sensitive information collected by X.
Should you be worried?
How you interact with others on the platform, such as people you follow and people who follow you, metadata related to Encrypted Messages, and when you use Direct Messages, including the contents of the messages, the recipients, and date and time of messages.
Collecting some data is unavoidable, but will this metadata be used to train X’s AI models? Well, Musk says no.
Meanwhile, on Friday, X updated its terms of service to ban scraping and crawling, in an effort to stop AI models from being trained on its data.
Ivanovs also highlighted that, under the updated terms, users' access to certain content might be limited or even cut off, and their own content might be harder for a wider audience to see.
If this sounds like a shadow ban, you're right. It's basically the same concept; however, with AI and automation in the picture, it will become much more complex and prevalent.
Musk has done pretty much whatever he wanted with the platform: pushing changes, slashing jobs, and even changing its name and logo regardless of opposition. With the company now training AI models on your posts, sharing your data with more third parties and collecting more information in the first place, it's time to be more careful than ever about what data you put in X's hands.