Why Adversarial Examples Are Such a Dangerous Threat to Deep Learning

March 27, 2020

Many of today’s artificial intelligence (AI) applications are driven by deep learning: machine learning algorithms, built on neural networks, that get “smarter” with more data. The deepfake, a severe cybersecurity threat, wouldn’t be possible without deep learning.

Deepfakes aside, we need to be aware that several machine learning models, including state-of-the-art neural networks, are vulnerable to adversarial examples. The threat to the enterprise can be critical.

What is an adversarial example, and why should we care? These machine learning models, as intelligent and advanced as they are, misclassify examples (or inputs) that are only marginally different from ones they normally classify correctly. For instance, an attacker can defeat image recognition software by modifying an image ever so slightly, in some cases by altering just one pixel.

With more companies relying on deep learning to process data than ever before, we need to be more aware of these types of attacks. The underlying strategies behind adversarial attacks are fascinating. Colorful toasters are even involved.

Adversarial Attacks 101

Luba Gloukhova, founding chair of Deep Learning World and editor-in-chief of the Machine Learning Times, finds herself at the hub of the machine learning and deep learning industries. I met Gloukhova, an independent speaker and consultant, at a tech conference in February, where she told me that the more she understands about the capabilities of deep learning, the more apparent the potential security risks we are exposing ourselves to become.

“As I saw some of the potential shortcomings of this technology, it got me venturing down this path of adversarial attacks on adversarial examples, and it got me really interested in this industry,” Gloukhova said.

According to Gloukhova, an adversarial attack is one in which deliberately crafted inputs to a deep neural network result in unexpected outputs. The “example” in “adversarial example” is the input itself.

“The adversarial examples are generated by a slight modification to the input to a network which is trained to recognize it,” she explained. “The slight modification in one way or another creates completely unexpected results and unexpected outputs.”
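
To make the idea of a “slight modification” concrete, here is a minimal Python sketch of one well-known way to craft such an input, the fast gradient sign method (FGSM). It is not drawn from Gloukhova’s remarks. The toy logistic model, its weights, the 100-“pixel” input and the step size epsilon are all hypothetical values chosen purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Return the model's probability that input x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(g):
    return (g > 0) - (g < 0)

def fgsm_perturb(weights, bias, x, true_label, epsilon):
    """Nudge every feature by at most epsilon in the direction that raises the loss."""
    p = predict(weights, bias, x)
    # For a logistic model with cross-entropy loss, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model: 100 "pixels" with alternating positive and negative weights.
weights = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
bias = 0.0
# Hypothetical input with pixel values near mid-gray; the model assigns it to class 1.
x = [0.52 if i % 2 == 0 else 0.48 for i in range(100)]

x_adv = fgsm_perturb(weights, bias, x, true_label=1, epsilon=0.03)

print("clean probability of class 1:      ", round(predict(weights, bias, x), 3))
print("adversarial probability of class 1:", round(predict(weights, bias, x_adv), 3))
print("largest change to any pixel:       ", max(abs(a - b) for a, b in zip(x, x_adv)))
```

In this toy setup no pixel moves by more than 0.03 on a 0-to-1 scale, yet the predicted class flips, because many tiny changes add up across all the inputs. Real attacks on deep networks follow the same principle but compute the gradient through the full network.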

This unexpected output exposes vulnerabilities in the architectures of deep neural networks because a bad actor can craft the adversarial examples to disguise malicious content as legitimate to the human eye. One fundamental analogy she often uses is that of a hacker trying to spread malicious code, where the code (input) could fly under the radar of most recognition systems.

“This adversarial attack could contain malicious content that’s hidden because of slight modifications that make it look as if it’s not problematic,” she said.

Toaster? Banana? Other?

Back to that colorful toaster. In 2017, at the 31st Conference on Neural Information Processing Systems in Long Beach, a group of researchers presented evidence for their toaster theory. By placing a specially designed toaster sticker on or near an object (the input), they could get image recognition software (a neural network) to classify the object as a toaster. In their experiment, a banana was used as the input.

On its own, the banana gets classified as a banana with high confidence. But as soon as the toaster sticker is added, the network thinks it is looking at a toaster. “It misleads the classification model, and what’s interesting is that the authors of the paper have shown that it is successful in attacking models that have completely different architectures and were trained with completely different differences,” said Gloukhova.
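
Mechanically, a sticker attack changes only a small region of the input. The Python sketch below shows just that pasting step on a hypothetical 6x6 grayscale “photo” with a 2x2 “sticker”. It does not show how the researchers designed the sticker’s pattern, and the function name, image and values are assumptions made for illustration.

```python
def apply_patch(image, patch, top, left):
    """Return a copy of image (a list of pixel rows) with patch pasted at (top, left)."""
    patched = [row[:] for row in image]  # copy rows so the original stays untouched
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            patched[top + r][left + c] = value
    return patched

# Hypothetical 6x6 grayscale "photo" and 2x2 "sticker".
image = [[0.2] * 6 for _ in range(6)]
patch = [[0.9, 0.1], [0.1, 0.9]]

patched = apply_patch(image, patch, top=1, left=3)

changed = sum(
    1 for r in range(6) for c in range(6) if patched[r][c] != image[r][c]
)
print(f"{changed} of 36 pixels changed")
```

Unlike a faint, image-wide perturbation, a patch is plainly visible but confined to one spot, which is why it can be printed and placed next to a real object.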

What’s most concerning about these types of adversarial attacks is that bad actors don’t need any information about the model they are attacking. Fortunately, we’re not highly reliant on this technology at this point.

Threats Well Beyond Image Recognition

Still, Gloukhova warns that the computer security threat is significant, and she doesn’t want to paint a picture that only image recognition systems are vulnerable. In fact, a research paper from Cornell University demonstrates how essentially the same adversarial attacks could be applied to speech recognition systems based on deep neural networks. In the experiment, the researchers were able to take any audio waveform and produce another that is over 99.9 percent similar but is transcribed as any phrase of their choosing.

As our reliance on autonomous vehicles grows, deep neural networks will be a critical component of the transportation ecosystem. These networks, like other deep learning networks, are susceptible to adversarial examples. According to Gloukhova, research papers have outlined a serious threat: by simply placing some inconspicuous spray paint or a particular sticker on a stop sign, an attacker may cause the vehicle to misclassify it.

“The deep neural network that resides within that vehicle’s recognition system could make it misinterpret the actual stop sign as, say, a yield sign or a speed limit sign,” she warns. “So my goal is to communicate that these adversarial examples do exist. They are a vulnerability of these deep neural networks. We need to be aware of the limits of this technology, especially as our fascination with it and our reliance on it grows.”

We are likely a long way away from this scenario playing out at scale in the real world, but planning to counter these threats now can better position us for safety.

How the Enterprise Can Respond

So what does all this mean for the enterprise? On the surface, the issue can seem too complex to get our minds around, but some of the strategies Gloukhova promotes revolve more around mindset and culture than technology. Of course, getting into the developmental and architectural weeds will still be necessary.

“By leveraging these complex architectures and huge training data sets, we’ve chosen to opt for models and model architectures that make them open to adversarial attacks,” she said. “A big takeaway for this is the overall awareness of this vulnerability.”

She hopes that developers and executives will be more cautious about the kinds of activation functions and neural network architectures they choose, which may come down to using smaller training data sets or adopting less fancy architectures.

Gloukhova recommended developer tactics like non-linear activation functions, adversarial training, JPEG compression, defensive distillation and gradient masking. However, these technical solutions are not so simple, and they require a lot more computing power and some sacrifices.
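
Of the tactics listed, adversarial training is perhaps the easiest to sketch: during training, each sample is replaced by a perturbed copy built against the current model, so the model learns to classify the perturbed versions correctly. The Python sketch below is a minimal illustration on a toy logistic model, not a production recipe. The four-point data set, the perturbation size epsilon, the learning rate and the epoch count are all hypothetical, and the perturbation uses the fast gradient sign method as one assumed choice of attack.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Return the model's probability that input x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(g):
    return (g > 0) - (g < 0)

def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift each feature by epsilon in the direction that raises the loss."""
    p = predict(weights, bias, x)
    return [xi + epsilon * sign((p - label) * w) for xi, w in zip(x, weights)]

def adversarial_train(data, epsilon, learning_rate=0.5, epochs=200):
    """Fit a logistic model, taking each gradient step on a perturbed copy of the sample."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in data:
            # Craft the perturbed sample against the model as it currently stands.
            x_adv = fgsm_perturb(weights, bias, x, label, epsilon)
            error = label - predict(weights, bias, x_adv)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x_adv)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical toy data set: (features, label) pairs.
data = [([1.0, 0.8], 1), ([0.9, 1.1], 1), ([-1.0, -0.9], 0), ([-0.8, -1.1], 0)]
epsilon = 0.3
weights, bias = adversarial_train(data, epsilon)

def classify(x):
    return 1 if predict(weights, bias, x) > 0.5 else 0

clean_correct = sum(classify(x) == y for x, y in data)
adv_correct = sum(
    classify(fgsm_perturb(weights, bias, x, y, epsilon)) == y for x, y in data
)
print("correct on clean inputs:    ", clean_correct, "of", len(data))
print("correct on perturbed inputs:", adv_correct, "of", len(data))
```

Even in this sketch, every training step needs an extra pass through the model to build the perturbed sample, which hints at why the approach demands more computing power on real networks.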

One of the top priorities for any organization should be to hire the right people for deep learning. “It does take the kind of people that are aware of these threats and are aware of the countermeasures,” Gloukhova said. “I’m excited to spread the word, especially toward the C-Suite. They need to see the value in this kind of knowledge so it can trickle down through the organization to the developers, and maybe it will create some lasting change.”

Mark Stone

Mark Stone is a Hubspot-certified content marketing writer specializing in technology, business, and entertainment. He is a regular contributor to Forbes Bra...