To combat fake news, a team of Massachusetts Institute of Technology (MIT) researchers has developed a new machine learning (ML) system that helps determine whether a news source is accurate or biased.
The researchers believe that the best approach is to focus not on the factuality of individual claims but on the news sources themselves.
“If a website has published fake news before, there’s a good chance they’ll do it again. By automatically scraping data about these sites, the hope is that our system can help figure out which ones are likely to do it in the first place,” said lead author Ramy Baly from MIT’s Computer Science and Artificial Intelligence Lab (CSAIL).
The system needs only about 150 articles to detect whether a news source can be trusted, according to the study, to be presented at the 2018 Empirical Methods in Natural Language Processing (EMNLP) conference in Brussels.
For the study, the researchers from MIT and the Qatar Computing Research Institute (QCRI) took data from Media Bias/Fact Check (MBFC), a website whose human fact-checkers analyse the accuracy and biases of more than 2,000 news sites, from MSNBC and Fox News to low-traffic content farms.
The team then fed that data to an ML algorithm known as a Support Vector Machine (SVM) classifier, training it to classify news sites the same way MBFC does.
When given a new news outlet, the system was 65 per cent accurate at detecting whether it had a high, medium or low level of “factuality”, and roughly 70 per cent accurate at detecting whether it was left-leaning, right-leaning or moderate.
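To make the SVM step concrete: the classifier learns a boundary between sources from numeric features of their articles. The study's actual system used many rich feature groups; the sketch below is only an illustration, training a minimal linear SVM (hinge loss plus L2 regularisation, fitted by sub-gradient descent) on two invented per-source style features that are not taken from the paper.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM in the primal: minimise hinge loss + L2 penalty
    by sub-gradient descent. X: feature vectors, y: labels in {-1, +1}."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point violates the margin: step toward classifying it.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correctly classified with margin: only shrink the weights.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Invented per-source features: [hyperbole rate, subjectivity rate].
# Label +1 = low factuality, -1 = high factuality (toy data, not the study's).
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7],
     [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

Once trained, `predict(w, b, features)` labels a previously unseen source, which mirrors how the study scores new outlets from their article statistics.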
The team determined that the most reliable ways to detect both fake news and biased reporting were to look at the common linguistic features across the source’s stories, including sentiment, complexity and structure.
For example, fake news outlets were found to be more likely to use language that is hyperbolic, subjective and emotional.
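As a rough illustration of how such stylistic signals can be turned into numbers, the sketch below computes word-rate features from a tiny hand-picked lexicon, plus average sentence length as a crude complexity proxy. The word lists and feature names here are invented for illustration; the study drew on far richer linguistic resources.

```python
import re

# Illustrative mini-lexicons (not the study's actual word lists).
HYPERBOLIC = {"shocking", "unbelievable", "outrageous", "devastating", "explosive"}
EMOTIONAL = {"furious", "terrified", "heartbreaking", "disgusting", "amazing"}

def style_features(text):
    """Crude stylistic profile of one article: rates of hyperbolic and
    emotional words, and average sentence length as a complexity proxy."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = max(len(words), 1)
    return {
        "hyperbole_rate": sum(w in HYPERBOLIC for w in words) / n,
        "emotion_rate": sum(w in EMOTIONAL for w in words) / n,
        "avg_sentence_len": len(words) / max(len(sentences), 1),
    }
```

Averaging such per-article profiles over a source's output gives the kind of per-source feature vector a classifier can learn from.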
Concerning bias, left-leaning outlets were more likely to use language related to the moral concepts of harm/care and fairness/reciprocity than to concepts such as loyalty, authority and sanctity.