Adversarial machine learning[1] is the study of attacks on machine learning algorithms and of the defenses against such attacks. It examines how malicious actors can exploit and manipulate machine learning systems, typically to evade detection or induce misclassification, with attacks ranging from obfuscating spam messages to manipulating the perception systems of autonomous vehicles. The field is equally concerned with defense, including multi-step countermeasures, methods for detecting adversarial noise, and techniques for assessing the impact of attacks. Research in this area is central to the security[2] and reliability of deployed machine learning systems; a survey from May 2020 found that practitioners report a dire need for better protection of machine learning systems in industrial applications.
Most machine learning techniques are designed to work on specific problem sets under the assumption that the training and test data are generated from the same statistical distribution (the IID assumption). In practical, high-stakes applications, however, this assumption is often violated: users may intentionally supply fabricated data that breaks the statistical assumptions on which the model relies.
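The effect of such fabricated training data can be illustrated with a short, self-contained sketch (a toy example, not drawn from any cited source): a label-flipping poisoning attack against an ordinary scikit-learn classifier, in which an attacker who controls part of the training set flips labels so that the training data no longer matches the distribution seen at test time.

```python
# Illustrative sketch only: label-flipping "data poisoning" against a
# scikit-learn classifier, showing how adversarially supplied training data
# that breaks the IID assumption typically lowers test accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Clean model: training and test data come from the same distribution.
clean_acc = accuracy_score(
    y_test, LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)
)

# Poisoned model: an adversary flips the labels of 30% of the training set.
rng = np.random.default_rng(0)
flip = rng.choice(len(y_train), size=int(0.3 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip] = 1 - y_poisoned[flip]
poisoned_acc = accuracy_score(
    y_test, LogisticRegression(max_iter=1000).fit(X_train, y_poisoned).predict(X_test)
)

print(f"clean accuracy:    {clean_acc:.3f}")
print(f"poisoned accuracy: {poisoned_acc:.3f}")
```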
The most common attacks in adversarial machine learning include evasion attacks, data poisoning attacks, Byzantine attacks, and model extraction.
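As an illustration of the first category, the following sketch (an assumption-laden toy example, not a reference implementation) mounts an evasion attack in the style of the fast gradient sign method (FGSM) against a linear classifier: the input is perturbed in the direction that increases the model's loss, which often flips the prediction while changing the input only slightly.

```python
# Illustrative sketch only: an FGSM-style evasion attack on a linear classifier.
# The adversary perturbs a test input along the sign of the loss gradient.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
w, b = model.coef_[0], model.intercept_[0]

def fgsm(x, label, epsilon=0.5):
    """Craft an adversarial example with the fast gradient sign method."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability of class 1
    grad_x = (p - label) * w                 # gradient of cross-entropy loss w.r.t. x
    return x + epsilon * np.sign(grad_x)     # step in the loss-increasing direction

x, label = X[0], y[0]
x_adv = fgsm(x, label)
print("original prediction:   ", model.predict([x])[0], " true label:", label)
print("adversarial prediction:", model.predict([x_adv])[0])
```

With a sufficiently large perturbation budget (epsilon), the perturbed input is usually misclassified even though the clean input was handled correctly, which is the defining behavior of an evasion attack.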