March 15, 2018 By Brad Harris 4 min read

This is the first installment in a two-part series about generative adversarial networks (GANs). For the full story, be sure to also read part two.

GANs are among the recent ideas in artificial intelligence (AI) that have advanced the state of the art. But before we dive into this topic, let’s examine the meaning of the word “adversarial.” In its original application in AI, the word describes an example, that is, an input deliberately designed to fool an evaluating neural net or another machine-learning model. As machine learning sees more use in security applications, these adversarial examples have become very important.

Imagine documents whose format marks where the content ends, either with terminating tags, as in HTML, or with document lengths stored in headers, as in rich text format (.rtf) or .doc files. Because arbitrary bytes can be appended to the end of these files without changing what a reader sees, they offer spare file space that could be used to create adversarial examples.
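
The appended-bytes idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rendered_part` is a hypothetical stand-in for a reader that stops at the terminating tag, and the sample document and padding bytes are made up.

```python
CLOSING_TAG = b"</html>"

def rendered_part(document: bytes) -> bytes:
    """Return the bytes up to and including the first closing tag.

    Hypothetical stand-in for a reader that stops at the terminating tag.
    """
    end = document.find(CLOSING_TAG)
    if end == -1:
        return document  # no closing tag found: treat everything as content
    return document[: end + len(CLOSING_TAG)]

original = b"<html><body>Quarterly report</body></html>"
padding = bytes([0x41, 0x00, 0x7F]) * 4  # arbitrary bytes appended to the end
padded = original + padding

# The reader sees the same content, yet the raw bytes a model
# would inspect are different.
same_rendering = rendered_part(padded) == rendered_part(original)
extra_bytes = len(padded) - len(original)
print(same_rendering, extra_bytes)
```

The two files look identical to this reader, while a classifier that works on the raw bytes receives a different input.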

So far, research on adversarial examples has focused on images, but the technique might apply to other file formats as well. In theory, those formats may be even more vulnerable: an image can be changed only slightly if it is to remain recognizable to humans, whereas other formats appear identical even when there is extra content at the end. This has given rise to several different attacks on, and defenses against, these examples, which are described in more detail in a paper by researchers from the University of Virginia, “Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks.”

What Are Generative Adversarial Networks?

According to O’Reilly Media, generative adversarial networks are “neural networks that learn to create synthetic data similar to some known input data.” These networks use a slightly different definition of “adversarial” than the one described above. In this case, the term refers to two neural networks — a generator and a discriminator — competing against each other to succeed in a game. The object of the game is for the generator to fool the discriminator with examples that look similar to the training set. This idea was first proposed in a research paper, “Generative Adversarial Nets,” by Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville and Yoshua Bengio.

When the discriminator rejects an example produced by the generator, the generator learns a little more about what a good example looks like. Note that the generator must start with some sort of probability distribution from which it draws its random input. This is often just the normal distribution, which makes the GAN practical and easy to initialize. As the generator learns more about the real examples, it can settle on a better probability distribution. Typically, the discriminator acts as a binary classifier; that is, it says “yes” or “no” to an example. Having only two options to choose from simplifies the discriminator’s architecture and helps make GANs practical.
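
As a rough sketch of these two roles, the snippet below draws the generator's input from the standard normal distribution and lets a discriminator answer “yes” or “no.” The affine generator, the logistic discriminator and their parameter values (`shift`, `scale`, `w`, `b`) are hypothetical choices made for illustration; real GANs use neural networks for both.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_noise(n):
    """Draw n values from the standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def generator(z, shift=0.0, scale=1.0):
    """Hypothetical generator: an affine map of the noise value z."""
    return scale * z + shift

def discriminator(x, w=1.0, b=-2.0):
    """Hypothetical discriminator: estimated probability that x is real."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def verdict(x):
    """Binary decision: 'yes' (looks real) or 'no' (looks generated)."""
    return "yes" if discriminator(x) >= 0.5 else "no"

candidates = [generator(z) for z in sample_noise(5)]
verdicts = [verdict(x) for x in candidates]
print(verdicts)
```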

How does the generator get closer to the real examples? With each attempt, the discriminator sends a signal back to the generator to tell it how close it is to an actual example. Technically, this is the gradient of the difference, but you can think of it as an indicator of both proximity (or quality) and direction. In other words, the discriminator leaks information about just how close the generator was and how it should proceed to get closer. In an ideal situation, the generator will eventually produce examples good enough that the discriminator can no longer reliably distinguish them from the real ones.
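
To make the signal concrete, here is a minimal one-dimensional sketch of a single generator update. It assumes a generator of the form G(z) = z + theta, a logistic discriminator D(x) = sigmoid(w * x + b), and the commonly used generator loss -log D(G(z)). All of these, along with the parameter values, are assumptions chosen for illustration rather than anything the article prescribes.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def generator_gradient(z, theta, w, b):
    """Gradient of -log D(G(z)) with respect to theta.

    Assumes G(z) = z + theta and D(x) = sigmoid(w * x + b).
    """
    x = z + theta
    d = sigmoid(w * x + b)
    return -(1.0 - d) * w

w, b = 1.5, -3.0             # a discriminator that favours larger x
theta, z, lr = 0.0, 0.2, 0.1

score_before = sigmoid(w * (z + theta) + b)
grad = generator_gradient(z, theta, w, b)
theta = theta - lr * grad    # one gradient descent step on the generator
score_after = sigmoid(w * (z + theta) + b)
print(score_before, score_after)
```

The sign of `grad` gives the direction in which to move, and its magnitude shrinks as the discriminator's score approaches 1, which matches the idea of a combined proximity and direction indicator.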

Semisupervised Learning

The discriminator is given samples from both the training set and the generator. When training, it labels inputs from the training set as 1 (typically with a smoothing factor, so that the target is a value close to 1 rather than exactly 1) and labels the generator’s images as 0. This is how the discriminator initializes itself: it assumes that any image from the generator is fake, which is how it creates the binary training set.
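
The labeling scheme can be sketched as follows. The smoothed target of 0.9 is an assumed value for illustration, as are the helper names and sample numbers; binary cross-entropy is used here as a typical loss for a binary classifier.

```python
import math

REAL_LABEL = 0.9  # assumed smoothing: target slightly below 1
FAKE_LABEL = 0.0

def build_batch(real_samples, generated_samples):
    """Pair every sample with the target label the discriminator trains on."""
    batch = [(x, REAL_LABEL) for x in real_samples]
    batch += [(x, FAKE_LABEL) for x in generated_samples]
    return batch

def binary_cross_entropy(prediction, target, eps=1e-12):
    """Loss for one prediction in (0, 1) against a target in [0, 1]."""
    p = min(max(prediction, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

batch = build_batch(real_samples=[3.8, 4.1], generated_samples=[0.2, -0.5])
labels = [label for _, label in batch]
print(labels)
```

With a smoothed target, a prediction of 0.99 on a real sample costs more than a prediction of 0.9, which discourages the discriminator from becoming overconfident.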

In a practical sense, each half of the network trains at the same time, meaning that each half initializes with no knowledge at all. However, the discriminator has access to the knowledge buried in the training set while the generator can only adjust based on the initially flawed indicator returned by the discriminator. This works because, in the beginning, the generator creates what can be called noise — examples that are so fake that they don’t resemble the real examples at all. Therefore, the discriminator can safely say that any example it receives from the generator is fake.

This is technically called semisupervised learning. In semisupervised learning, the algorithm (discriminator) has one set of examples labeled as truth and one set that is not. In this case, the discriminator knows that the training set contains real examples, but it cannot know for sure that the initial examples sent by the generator are not very close to the real ones. It can only assume that the output is noise because the generator has very little knowledge of what the real examples should look like.

Given an extremely accurate probability distribution, it’s possible for the generator to quickly create convincingly realistic examples. However, this defeats the purpose of GANs because if one already knows the detailed probability distribution, there are much simpler and more direct methods available to derive realistic examples.

As time goes by, the discriminator learns from the training set and sends more and more meaningful signals back to the generator. As this occurs, the generator gets closer and closer to learning what the examples from the training set look like. Once again, the only inputs the generator has are an initial probability distribution (often the normal distribution) and the indicator it gets back from the discriminator. It never sees any real examples.
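
The whole loop can be sketched with a toy one-dimensional GAN in plain Python. Everything here is an assumption made for illustration: the real data is drawn from a normal distribution centered at 4, the generator is G(z) = z + theta, the discriminator is a logistic classifier, and the step count and learning rate are arbitrary. Note that the generator update reads only its own noise and the discriminator's parameters, never the real samples.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train(steps=50, batch=64, lr=0.05, real_mean=4.0, seed=0):
    rng = random.Random(seed)
    theta = 0.0      # generator parameter: G(z) = z + theta
    w, b = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + b)
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [z + theta for z in noise]

        # Discriminator step: real samples are labeled 1, generated ones 0.
        grad_w = grad_b = 0.0
        for x, label in [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]:
            err = sigmoid(w * x + b) - label  # cross-entropy gradient term
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / (2 * batch)
        b -= lr * grad_b / (2 * batch)

        # Generator step: uses only the generated samples and the
        # signal that comes back through the discriminator.
        grad_theta = 0.0
        for x in fake:
            grad_theta += -(1.0 - sigmoid(w * x + b)) * w
        theta -= lr * grad_theta / batch
    return theta, w, b

theta, w, b = train()
print(theta, w, b)
```

Over this short run the generator's shift moves away from 0 in the direction of the real data. Longer runs of such a simple model can oscillate around the target rather than settle, so the sketch shows the mechanism, not a recipe for stable training.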

Stay Tuned to Learn More

This process may seem impractical in the real world, but there are many scenarios in which GANs can help solve very practical problems. In the second part of this series, we will explore how this emerging development in AI can be applied to cybersecurity to perform fundamental processes, such as password cracking, and complex tasks, such as spotting information hidden in generated images.
