This is the second installment in a two-part series about generative adversarial networks. For the full story, be sure to also read part one.
Now that we’ve described the origin and general functionality of generative adversarial networks (GANs), let’s explore the role of this exciting new development in artificial intelligence (AI) as it pertains to cybersecurity.
PassGAN: Cracking Passwords With Generative Adversarial Networks
Perhaps the most famous application of this technology is described in a paper by researchers Briland Hitaj, Paolo Gasti, Giuseppe Ateniese and Fernando Perez-Cruz titled “PassGAN: A Deep Learning Approach for Password Guessing,” the code for which is available on GitHub.
In this project, the researchers first compared a GAN against the password-cracking tools John the Ripper and HashCat, and then used it to augment HashCat’s guessing rules. The GAN was trained on 9.9 million unique leaked passwords (23.7 million including duplicates), which represent real human choices, and it performed remarkably well. This is a rare example of a security application of GANs that does not involve images.
According to the paper, PassGAN performed twice as well as John the Ripper’s SpiderLab rule set and was competitive with the best64 and gen2 rule sets for HashCat. However, the authors noted that they achieved the best results when they used PassGAN to augment HashCat: the combination cracked 18 to 24 percent more passwords than HashCat alone. This is a striking result. If HashCat alone cracked 1 million passwords from a data breach, the augmentation would add another 180,000 to 240,000 passwords to the cracked set. A million cracked passwords is not an unrealistic figure given the massive size of many data breaches we’ve seen in the past.
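The augmentation arithmetic can be made concrete with a small sketch that treats each tool’s output as a set of candidate guesses and counts how many target passwords each approach recovers. The helper names `cracked` and `relative_gain` are hypothetical, the passwords are invented toy data rather than figures from the paper, and real crackers compare hashes rather than plaintext.

```python
def cracked(targets: set[str], guesses: set[str]) -> set[str]:
    """Return the target passwords that appear among the guesses."""
    return targets & guesses


def relative_gain(baseline: int, combined: int) -> float:
    """Percentage of extra passwords cracked, relative to the baseline count."""
    return 100.0 * (combined - baseline) / baseline


# Toy data, purely illustrative (not from the PassGAN paper).
targets = {"password1", "iloveyou", "dragon99", "monkey12", "sunshine", "qwerty7"}
hashcat_guesses = {"password1", "iloveyou", "sunshine", "letmein"}  # rule-based output
passgan_guesses = {"iloveyou", "dragon99", "qwerty12"}  # samples from a generator

baseline = len(cracked(targets, hashcat_guesses))  # 3
combined = len(cracked(targets, hashcat_guesses | passgan_guesses))  # 4
print(baseline, combined, relative_gain(baseline, combined))

# Scaling the reported 18 to 24 percent gain to 1 million passwords cracked by HashCat alone.
for gain in (0.18, 0.24):
    print(round(1_000_000 * gain))
```

Augmentation here is just a set union: the combined attack cracks everything either tool would, so it can never do worse than HashCat on its own.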
What’s more, the authors claimed that their technique can guess passwords that no rule covers. Rather than applying hand-written rules, PassGAN’s generator learned the password distribution of the training set: it picked up human patterns and generated passwords that closely resemble them. As a result, PassGAN can produce guesses that a typical rule-based password cracker would never try.
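To see why a rule-based cracker has a ceiling, consider a toy rule engine: it can only ever try candidates that some rule derives from a wordlist entry. The rules below are hypothetical Python functions loosely modeled on common mangling rules (capitalize, append a digit, reverse, leetspeak); they are not HashCat or John the Ripper rule syntax.

```python
from typing import Callable

# Hypothetical mangling rules, loosely in the spirit of cracker rule sets.
# They are plain Python functions, not either tool's actual rule syntax.
RULES: list[Callable[[str], str]] = [
    lambda w: w,                                         # password -> password
    lambda w: w.capitalize(),                            # password -> Password
    lambda w: w + "1",                                   # password -> password1
    lambda w: w.capitalize() + "1",                      # password -> Password1
    lambda w: w[::-1],                                   # password -> drowssap
    lambda w: w.translate(str.maketrans("aeo", "430")),  # password -> p4ssw0rd
]


def rule_guesses(wordlist: list[str]) -> set[str]:
    """Every candidate this engine can try: each rule applied to each word."""
    return {rule(word) for word in wordlist for rule in RULES}


wordlist = ["password", "monkey", "dragon"]
guesses = rule_guesses(wordlist)
print("Password1" in guesses)   # True: derivable by a rule
print("dr4g0n" in guesses)      # True: leetspeak rule applied to "dragon"
print("monkey2024" in guesses)  # False: no rule appends "2024"
```

Anything outside `rule_guesses(wordlist)` is unreachable for this engine, no matter how long it runs. A generator that has learned the distribution of real passwords is not confined to such a closed set, which is the property the PassGAN authors highlight.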
It’s important to note that the authors capped password length at 10 characters for both training and guessing. I would like to see the same experiments run with longer passwords: At the time of this writing, 13 characters is widely considered the minimum length for a strong password.
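A fixed length cap is natural for a text GAN, because the generator emits a tensor of fixed shape. The sketch below shows one plausible character-level representation. It assumes passwords of at most `MAX_LEN` = 10 characters are padded with a hypothetical padding symbol and one-hot encoded, and that the generator’s per-position character scores are decoded by picking the highest-scoring character at each position. The alphabet, padding symbol and function names are assumptions for illustration, not PassGAN’s actual code.

```python
import string

MAX_LEN = 10  # length cap used in the PassGAN experiments
PAD = "\0"    # hypothetical padding symbol, assumed never to appear in passwords
ALPHABET = [PAD] + list(string.ascii_letters + string.digits + string.punctuation)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(password: str) -> list[list[int]]:
    """One-hot encode a password as a MAX_LEN x len(ALPHABET) matrix."""
    if len(password) > MAX_LEN:
        raise ValueError("password longer than MAX_LEN")
    matrix = []
    for ch in password.ljust(MAX_LEN, PAD):  # pad short passwords to MAX_LEN
        row = [0] * len(ALPHABET)
        row[INDEX[ch]] = 1  # raises KeyError for characters outside ALPHABET
        matrix.append(row)
    return matrix


def decode(scores: list[list[float]]) -> str:
    """Take the highest-scoring character at each position, then drop padding."""
    chars = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)
        chars.append(ALPHABET[best])
    return "".join(ch for ch in chars if ch != PAD)


print(decode(encode("dragon99")))
```

Under this representation, running the experiments with 13-character passwords would mean raising `MAX_LEN`, which enlarges both the generator’s output and the space of passwords it has to learn.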
This project is also interesting because it generates text as output. Many security applications of GANs center on image recognition and manipulation, as we will see in the next paper, which describes the use of GANs to generate secure steganography.
SSGAN: Applying GANs to Steganography
Steganography is the process of hiding information in otherwise normal-looking files. For example, changing the least significant bit of each RGB value in an image’s pixels would allow information to be hidden, or leaked, without visibly altering the image. Statistically, however, such images are easy to detect.
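The least-significant-bit scheme described above fits in a few lines. This minimal sketch assumes the image has already been decoded into a flat list of 8-bit channel values (R, G, B, R, G, B, ...); it hides one message bit in each value and reads the bits back. The function names are hypothetical, and a real tool would also need to read and write an image file format.

```python
def embed(channels: list[int], message: bytes) -> list[int]:
    """Hide message bits, most significant bit first, in each value's LSB."""
    bits = [(byte >> (7 - i)) & 1 for byte in message for i in range(8)]
    if len(bits) > len(channels):
        raise ValueError("cover image too small for message")
    stego = list(channels)  # leave the original cover untouched
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract(channels: list[int], length: int) -> bytes:
    """Read back `length` bytes from the LSBs, most significant bit first."""
    out = bytearray()
    for b in range(length):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (channels[b * 8 + i] & 1)
        out.append(byte)
    return bytes(out)


cover = [200, 13, 77, 54, 255, 0, 128, 91] * 4  # 32 channel values = room for 4 bytes
stego = embed(cover, b"hi")
print(extract(stego, 2))  # b'hi'
```

Each channel value changes by at most 1, which the eye cannot see. The catch is statistical: the low bits of natural images are not random, so overwriting them with message bits shifts measurable properties (for example, the relative frequencies of each value pair 2k and 2k+1), and that is the kind of trace steganalysis tools look for.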
A paper from the Chinese Academy of Sciences titled “SSGAN: Secure Steganography Based on Generative Adversarial Networks” described researchers’ attempts to use GANs to create steganographic schemes. The SSGAN method improved on earlier work in the field, which used a different, less effective strategy.
This experiment used one generator and, unlike the PassGAN project, two discriminators. Here, the generator’s job is to create images that are well-suited to hiding information, meaning images that are both visually consistent and resistant to steganalysis methods. These are called secure cover images.
The two discriminators play different roles. The first is a GAN-based steganalysis network, which the authors claimed is more sophisticated than those used in previous research; it attempts to determine the images’ suitability for steganography. The second competes against the generator by assessing the visual quality of each proposed image. This way, the generator does not keep producing noisy images; instead, it receives feedback telling it which images are more visually suitable.
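One common way to train a generator against two discriminators is to minimize a weighted sum of the two adversarial losses. The sketch below assumes that form: the weight `alpha`, the use of binary cross-entropy and the function names are illustrative assumptions rather than the exact SSGAN objective, and plain probabilities stand in for the two networks’ outputs.

```python
import math


def bce(p: float, target: float) -> float:
    """Binary cross-entropy for a single predicted probability p."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def generator_loss(quality_score: float, stego_score: float, alpha: float = 0.5) -> float:
    """Weighted adversarial loss for a generator facing two discriminators.

    quality_score: visual-quality discriminator's probability that the generated
        image is a real photo (the generator wants this near 1).
    stego_score: steganalyzer's probability that the image, after a message is
        embedded, contains hidden data (the generator wants this near 0).
    alpha: hypothetical weight balancing visual quality against security.
    """
    quality_term = bce(quality_score, 1.0)  # penalize images that look fake
    security_term = bce(stego_score, 0.0)   # penalize images the steganalyzer flags
    return alpha * quality_term + (1 - alpha) * security_term


print(generator_loss(0.9, 0.1))  # realistic and undetected: low loss
print(generator_loss(0.1, 0.9))  # fake-looking and detected: high loss
```

Raising `alpha` favors visual quality; lowering it favors resistance to steganalysis. In a real training loop, the two scores would come from the discriminator networks and the loss would be backpropagated through the generator.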
Experimental results showed that, with the SSGAN architecture, the classification error of the steganalysis network increased, meaning that the generated images were better at hiding information. The dual-discriminator architecture caused the generator to produce images that were not only more resistant to steganalysis but also of higher visual quality. This is a significant result for steganography because the approach outperformed existing heuristic-based algorithms.
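Classification error here is simply the fraction of cover and stego images the steganalyzer labels incorrectly. On a balanced test set, an error near 0.5 means the detector does no better than a coin flip, which is the ideal outcome for a steganographic scheme. A rising error therefore signals better hiding. The labels and predictions below are invented to illustrate the metric and are not SSGAN’s results.

```python
def classification_error(predictions: list[int], labels: list[int]) -> float:
    """Fraction of images the steganalyzer labels wrongly (1 = stego, 0 = cover)."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


labels = [0, 0, 0, 0, 1, 1, 1, 1]  # balanced: four covers, four stego images
strong = [0, 0, 0, 1, 1, 1, 1, 1]  # detector facing weak hiding: one mistake
weak = [0, 1, 0, 1, 1, 0, 1, 0]    # detector facing good hiding: four mistakes

print(classification_error(strong, labels))  # 0.125
print(classification_error(weak, labels))    # 0.5, i.e. chance level
```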
The Tip of the Iceberg
Overall, these two projects show that GANs of various architectures hold promise in the field of cybersecurity. PassGAN demonstrated that GANs can be applied to fundamental security-related tasks, such as cracking passwords, and can advance the state of the art. SSGAN showed that GANs can handle extremely complex tasks, such as generating high-quality cover images that hide information while resisting steganalysis.
These projects are only the tip of the iceberg. As GANs are applied to more cybersecurity-related tasks, they are likely to prove highly effective in helping security analysts keep pace with ever-evolving threats.
Security Researcher, IBM X-Force