How data poisoning attacks can corrupt machine learning models
Bohitesh Misra
Member at Digital Futurists Angels Private Limited
Data poisoning can render machine learning models inaccurate, possibly resulting in poor decisions based on faulty outputs. With no easy fixes available, security professionals must focus on prevention and detection.
Machine learning adoption has exploded recently, driven in part by the rise of cloud computing, which has made high-performance computing, networking and storage more accessible to all businesses. As data scientists and product companies integrate machine learning into various products across industries, and users rely on the output of its algorithms in their decision making, security experts warn of adversarial attacks designed to abuse the technology.
Most social networking platforms, online video platforms, large e-commerce sites, search engines and other services have some sort of recommendation system based on machine learning algorithms. The movies and shows that people like on Netflix, the content that people like or share on Facebook, the hashtags and likes on Twitter, the products consumers buy or view on Amazon, and the queries users type into Google Search are all fed back into these sites' machine learning models to make better and more accurate recommendations. Various attack methods exist against such models, including fooling a model into incorrectly classifying an input and obtaining information about the data that was used to train a model.
Recommendation algorithms and other similar systems can be easily hijacked and manipulated by performing actions that pollute the input to the next model update. For instance, if you want to attack an online shopping site to recommend product B to a shopper who viewed or purchased product A, all you have to do is view A and then B multiple times or add both A and B to a wish list or shopping basket. If you want a hashtag to trend on a social network, simply post and/or retweet that hashtag a great deal. If you want that new fake political account to get noticed, simply have a bunch of other fake accounts follow it and continually engage with its content.
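The shopping-site scenario can be illustrated with a toy recommender. The following is a minimal sketch, assuming a hypothetical model that simply counts which products are viewed in the same session; the function names, the session data and the traffic volumes are invented for illustration, and real recommenders are far more sophisticated.

```python
from collections import Counter, defaultdict

def train_co_view_model(sessions):
    """Count how often each pair of products appears in the same session."""
    co_views = defaultdict(Counter)
    for session in sessions:
        products = set(session)
        for a in products:
            for b in products:
                if a != b:
                    co_views[a][b] += 1
    return co_views

def recommend(co_views, product):
    """Return the product most often viewed together with `product`."""
    neighbours = co_views.get(product)
    if not neighbours:
        return None
    return neighbours.most_common(1)[0][0]

# Hypothetical organic traffic: shoppers who view A mostly also view C.
organic_sessions = [["A", "C"]] * 50 + [["A", "B"]] * 5

# The attacker repeats "view A, then B" in fake sessions before the next update.
poisoned_sessions = organic_sessions + [["A", "B"]] * 60

clean_model = train_co_view_model(organic_sessions)
poisoned_model = train_co_view_model(poisoned_sessions)

print(recommend(clean_model, "A"))     # recommendation learned from organic data
print(recommend(poisoned_model, "A"))  # recommendation after the fake sessions
```

In this sketch the attacker never touches the model itself. The fake sessions only need to outnumber the organic signal by the time the next model update runs.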
Engaging with content continually, for example by retweeting it over and over, could mean that the attacker controls many Twitter accounts. However, Twitter also allows a single user to retweet, undo, and then retweet again. Re-retweeting serves to “bump” the content over and over again. Some Twitter accounts even use this tactic on their own content to fish for more visibility.
Recommendation algorithms can be attacked in a variety of ways, depending on the motive of the attacker. Adversaries can use promotion attacks to trick a recommender system into promoting a product, piece of content, or user to as many people as possible. They can perform demotion attacks in order to cause a product, piece of content, or user to be promoted less than it should. Algorithmic manipulation can also be used for social engineering purposes. In theory, if an adversary has knowledge about how a specific user has interacted with a system, an attack can be crafted to target that user with a recommendation such as a YouTube video, malicious app, or imposter account to follow. As such, algorithmic manipulation can be used for a variety of purposes including disinformation, phishing, scams, altering of public opinion, promotion of unwanted content, and discrediting individuals or brands. You can even pay someone to manipulate Google’s search autocomplete functionality.
Numerous attacks are already being performed against recommenders, search engines, and other similar online services. In fact, an entire industry exists to support these attacks. With a simple web search, it is possible to find inexpensive purchasable services to manipulate app store ratings, post fake restaurant reviews, post comments on websites, inflate online polls, boost engagement of content or accounts on social networks, and much more. The prevalence and low cost of these services suggest that they are widely used.
It's not news that attackers try to influence and skew these recommendation systems by using fake accounts to upvote, downvote, share or promote certain products or content. Services that perform such manipulation can be bought on the underground market, and "troll farms" are used in disinformation campaigns to spread fake news.
What is data poisoning?
Data poisoning or model poisoning attacks involve polluting a machine learning model's training data. Data poisoning is considered an integrity attack because tampering with the training data impacts the model's ability to output correct predictions.
The difference between an attack that is meant to evade a model's prediction or classification and a poisoning attack is persistence: with poisoning, the attacker's goal is to get their inputs to be accepted as training data. The length of the attack also differs because it depends on the model's training cycle; it might take weeks for the attacker to achieve their poisoning goal.
Data poisoning can be achieved either in a blackbox scenario against classifiers that rely on user feedback to update their learning or in a whitebox scenario where the attacker gains access to the model and its private training data, possibly somewhere in the supply chain if the training data is collected from multiple sources.
Data poisoning examples
In a cybersecurity context, the target could be a system that uses machine learning to detect network anomalies that could indicate suspicious activity. If an attacker understands that such a model is in place, they can attempt to slowly introduce data points that decrease the accuracy of that model, so that eventually the things that they want to do won't be flagged as anomalous anymore. This is also known as model skewing.
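Model skewing can be sketched with a deliberately simple detector. The example below assumes a hypothetical anomaly detector that flags any value above the mean plus three standard deviations of its training data and is retrained on everything it accepted as normal. The baseline values, the target value and the attacker's batch size are all invented for illustration.

```python
import statistics

def fit_threshold(values, k=3.0):
    """Flag anything above mean + k standard deviations as anomalous."""
    return statistics.mean(values) + k * statistics.pstdev(values)

def is_anomalous(value, threshold):
    return value > threshold

# Hypothetical baseline metric (e.g. requests per minute), values 100..109.
training_data = [100.0 + (i % 10) for i in range(200)]
target_value = 120.0  # the activity the attacker wants to hide

initial_threshold = fit_threshold(training_data)
threshold = initial_threshold
flagged_before = is_anomalous(target_value, initial_threshold)

cycles = 0
while is_anomalous(target_value, threshold) and cycles < 50:
    # Each training cycle the attacker injects points just under the current
    # threshold, so they are accepted as "normal" and join the training data.
    probe = 0.99 * threshold
    training_data.extend([probe] * 20)
    threshold = fit_threshold(training_data)  # next scheduled retraining
    cycles += 1

print(f"threshold moved from {initial_threshold:.1f} to {threshold:.1f} "
      f"in {cycles} training cycles")
```

No single injected point looks anomalous at the time it is submitted, which is why this kind of attack plays out over several training cycles rather than in one step.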
A real-world example of this is attacks against the spam filters used by email providers. In practice, we regularly see some of the most advanced spammer groups trying to throw the Gmail filter off-track by reporting massive amounts of spam emails as not spam. Between the end of Nov 2017 and early 2018, there were at least four malicious large-scale attempts to skew the classifier.
Another example involves Google’s VirusTotal scanning service, which many antivirus vendors use to augment their own data. While attackers have been known to test their malware against VirusTotal before deploying it in the wild, thereby evading detection, they can also use it to engage in more persistent poisoning. In fact, in 2015 there were reports that intentional sample poisoning attacks through VirusTotal were performed to cause antivirus vendors to detect benign files as malicious.
No easy fix
The main problem with data poisoning is that it's not easy to fix. Models are retrained with newly collected data at certain intervals, depending on their intended use and their owner's preference. Since poisoning usually happens over time, and over some number of training cycles, it can be hard to tell when prediction accuracy starts to shift.
Reverting the poisoning effects would require a time-consuming historical analysis of inputs for the affected class to identify all the bad data samples and remove them. Then a version of the model from before the attack started would need to be retrained. When dealing with large quantities of data and a large number of attacks, however, retraining in such a way is simply not feasible and the models never get fixed. Practical solutions for machine unlearning are still years away, so for now the only remedy is to retrain with good data, which can be very difficult or expensive to accomplish.
Prevent and detect
Data scientists and developers need to focus on measures that could either block attack attempts or detect malicious inputs before the next training cycle happens, like input validity checking, rate limiting, regression testing, manual moderation and using various statistical techniques to detect anomalies.
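As one example of a statistical check that could run before a training cycle, the sketch below flags accounts whose volume of submitted feedback is a robust outlier, using a modified z-score based on the median absolute deviation. The function name, the event format and the cutoff of 3.5 are assumptions made for illustration; coordinated attackers who spread activity across many accounts would not be caught by this check alone.

```python
import statistics
from collections import Counter

def flag_heavy_contributors(events, cutoff=3.5):
    """Return user IDs whose event count is a robust outlier.

    `events` is a list of (user_id, item) tuples. The score is the modified
    z-score: 0.6745 * (count - median) / MAD.
    """
    counts = Counter(user_id for user_id, _ in events)
    values = list(counts.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    mad = mad or 1.0  # avoid division by zero when most counts are identical
    return {
        user_id
        for user_id, count in counts.items()
        if 0.6745 * (count - median) / mad > cutoff
    }

# Hypothetical feedback log: 20 ordinary users with 1-5 events each,
# plus one account submitting 200 events.
events = []
for i in range(20):
    events.extend([(f"user{i}", "not_spam")] * (i % 5 + 1))
events.extend([("bot", "not_spam")] * 200)

suspicious = flag_heavy_contributors(events)
print(suspicious)
```

Flagged accounts could then be held back for manual moderation instead of being fed straight into the next training set.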
For example, restrictions can be placed on how many inputs provided by a unique user are accepted into the training data, or on the weight they are given. Newly trained classifiers can be rolled out to only a small subset of users so that their outputs can be compared with those of the previous version. It is also recommended to build a golden dataset that any retrained model must predict accurately, which helps detect regressions.
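Two of these measures, a per-user cap on training inputs and a golden dataset check, can be sketched as follows. The function names, the cap of three inputs per user, the 95% accuracy bar and the toy spam rules are hypothetical choices for illustration, not recommended values.

```python
from collections import defaultdict

def cap_per_user(events, max_per_user=3):
    """Keep at most `max_per_user` events from each user, in arrival order."""
    kept = []
    seen = defaultdict(int)
    for user_id, label in events:
        if seen[user_id] < max_per_user:
            kept.append((user_id, label))
            seen[user_id] += 1
    return kept

def passes_golden_set(predict, golden_examples, min_accuracy=0.95):
    """Gate a retrained model on a curated set of known-good examples."""
    correct = sum(
        1 for features, expected in golden_examples if predict(features) == expected
    )
    return correct / len(golden_examples) >= min_accuracy

# One account floods the feedback channel; two ordinary users report twice each.
feedback = [("spammer", "not_spam")] * 100 + [("alice", "spam")] * 2 + [("bob", "spam")] * 2
capped = cap_per_user(feedback)

# Toy golden dataset and two stand-in "models" (plain functions).
golden = [
    ("win free money now", "spam"),
    ("meeting at 10am", "ham"),
    ("free prize claim", "spam"),
    ("lunch tomorrow?", "ham"),
]
healthy_model = lambda text: "spam" if "free" in text else "ham"
skewed_model = lambda text: "ham"  # a poisoned model that lets everything through

print(len(capped))
print(passes_golden_set(healthy_model, golden), passes_golden_set(skewed_model, golden))
```

The cap limits how much influence any single account has on the next update, while the golden dataset gate blocks deployment of a retrained model that has regressed on examples whose correct answer is already known.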
Data poisoning is a special case of a larger issue called data drift, which affects any system that learns from operational data. Everyone gets bad data for a variety of reasons, and there is a lot of research on how to deal with data drift, as well as tools to detect significant changes in operational data and model performance, including from large cloud computing providers. Azure Monitor and Amazon SageMaker are examples of services that include such capabilities.
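One simple statistic that can be used to quantify this kind of shift is the population stability index (PSI), which compares the distribution of a categorical input between a reference window and the current window. The sketch below is a plain-Python illustration and does not show how any particular cloud service computes drift. The sample data is invented, and the 0.25 alert level is a commonly quoted rule of thumb, treated here as an assumption.

```python
import math
from collections import Counter

def population_stability_index(reference, current, eps=1e-6):
    """PSI = sum over categories of (p_cur - p_ref) * ln(p_cur / p_ref)."""
    categories = set(reference) | set(current)
    ref_counts, cur_counts = Counter(reference), Counter(current)
    psi = 0.0
    for category in categories:
        # Floor the proportions at eps so an unseen category does not cause log(0).
        p_ref = max(ref_counts[category] / len(reference), eps)
        p_cur = max(cur_counts[category] / len(current), eps)
        psi += (p_cur - p_ref) * math.log(p_cur / p_ref)
    return psi

# Hypothetical user reports: last month versus this week.
reference_reports = ["spam"] * 900 + ["not_spam"] * 100
current_reports = ["spam"] * 500 + ["not_spam"] * 500

psi = population_stability_index(reference_reports, current_reports)
ALERT_LEVEL = 0.25  # assumed rule-of-thumb threshold
print(f"PSI = {psi:.3f}, alert = {psi > ALERT_LEVEL}")
```

A high PSI does not prove an attack, since drift has many benign causes, but it is a signal that the incoming data deserves a closer look before it is used for retraining.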
To perform data poisoning, attackers also need to gain information about how the model works, so it's important to leak as little information as possible and have strong access controls in place for both the model and the training data. A lot of security in AI and machine learning has to do with very basic read/write permissions for data or access to models or systems or servers.
Just as organizations run regular penetration tests against their networks and systems to discover weaknesses, they should expand this practice to the machine learning context and treat machine learning as part of the security of the larger system or application.
One thing developers should do when building a model is to attack it themselves. By understanding how it can be attacked, they can then attempt to build defenses against those attacks.
Why it’s so hard to fix a poisoned model
If the owner of an online shop notices that their site has started recommending product B alongside product A, and they’re suspicious that they’ve been the victim of an attack, the first thing they need to do is look through historical data to determine why the model started making this recommendation. To do this, they need to gather all instances of product B being viewed, liked, or purchased alongside product A. Then they need to determine whether the users that generated those interactions look like real users or fake users – something that is probably extremely difficult to do if the attacker knows how to make their fake accounts look and behave like real people.
Fixing a poisoned model, in most cases, involves retraining. You take an old version of the model, and train it against all accumulated data between that past date and the present day, but with the malicious entries removed. You then deploy the fixed model into production and resume business. If at some point in the future you discover a new attack, you’ll need to perform the same steps over again. Social networks and other large online sites are under attack on numerous fronts, on an almost constant basis.
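The retraining step can be sketched for a simple count-based model. The example assumes a hypothetical setup in which the model is a table of product co-view counts, a snapshot from before the attack is available, and the malicious accounts have already been identified, which is the hard part in practice. All names and figures are illustrative.

```python
from collections import Counter

def retrain_from_snapshot(snapshot_counts, events, snapshot_day, flagged_accounts):
    """Replay events recorded after the snapshot, skipping flagged accounts."""
    counts = Counter(snapshot_counts)  # copy, so the snapshot stays untouched
    for day, account, pair in events:
        if day > snapshot_day and account not in flagged_accounts:
            counts[pair] += 1
    return counts

def top_partner(counts, product):
    """Return the product most often paired with `product`."""
    candidates = {pair: n for pair, n in counts.items() if pair[0] == product}
    return max(candidates, key=candidates.get)[1]

# Snapshot taken on day 10, before the suspected attack began.
snapshot = Counter({("A", "C"): 50, ("A", "B"): 5})
snapshot_day = 10

# Interaction log since the snapshot: (day, account, (viewed, also_viewed)).
events = [(11 + i, f"user{i}", ("A", "C")) for i in range(10)]
events += [(12, f"bot{i % 2}", ("A", "B")) for i in range(60)]

poisoned = retrain_from_snapshot(snapshot, events, snapshot_day, flagged_accounts=set())
repaired = retrain_from_snapshot(snapshot, events, snapshot_day, flagged_accounts={"bot0", "bot1"})

print(top_partner(poisoned, "A"), top_partner(repaired, "A"))
```

Every newly discovered attack means replaying the log again with an updated list of flagged accounts, which is why this process becomes expensive on large, constantly attacked platforms.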
When considering social networks, detecting poisoning attacks is only part of the problem that needs to be solved. In order to detect that users of a system are intentionally creating bad training data, a way of identifying accounts that are fake or specifically coordinating to manipulate the platform is also required.
To conclude, threats arising from the manipulation of recommenders, especially those used by social networks, hold broad societal implications. It is widely understood that algorithmic manipulation has led to entirely false stories, conspiracy theories, and genuine news pieces with altered figures, statistics, or online polls being circulated as real news.