December 19, 2016 By Brad Harris 4 min read

This is the third and final installment in a series covering AI2 and machine learning. Be sure to read Part 1 for an introduction to AI2 and Part 2 for background on the algorithms used in the system.

Machine Learning, Human Teaching

The data set researchers Kalyan Veeramachaneni and Ignacio Arnaldo used to produce their paper, “AI2: Training a Big Data Machine to Defend,” is quite impressive. Experiments are too often based on data that is either unrepresentative of the real world or too brief to offer a realistic perspective. According to the paper, the system was evaluated on three months’ worth of enterprise platform logs comprising 3.6 billion log lines, which works out to roughly 40 million lines per day. This is far more representative of what we would see in real life.

It does raise one question, however: What environment did the data come from? Many of IBM’s customers see millions of attack incidents a day. In the paper, the authors reported that malicious activity accounted for less than 0.1 percent of the data, which puts the number of malicious attacks they detected in the thousands.

Class Imbalance

The authors detailed a dearth of malicious activity and the so-called “class imbalance” problem that arises when normal events vastly outnumber malicious ones. While the imbalance holds even for large customers, large enterprises still see plenty of malicious activity, especially at the network border.
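
The paper does not include code, but the standard way to cope with such an imbalance is to weight the rare class more heavily during training. The sketch below illustrates the idea with scikit-learn on entirely synthetic data; it is not the authors’ pipeline.

```python
# Minimal sketch (not from the AI2 paper): compensating for class imbalance
# by weighting the rare malicious class more heavily during training.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: 10,000 normal events for every 10 malicious ones.
X_normal = rng.normal(loc=0.0, scale=1.0, size=(10_000, 5))
X_malicious = rng.normal(loc=3.0, scale=1.0, size=(10, 5))
X = np.vstack([X_normal, X_malicious])
y = np.concatenate([np.zeros(10_000), np.ones(10)])

# class_weight="balanced" scales each class inversely to its frequency,
# so the handful of malicious examples is not drowned out by normal traffic.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, y)

print(clf.predict_proba(X_malicious)[:, 1])  # scores for the rare class
```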

The paper analyzed the ratio of normal to malicious users. It is somewhat unusual that the researchers chose users as the unique entity in their analysis — network attacks are typically measured by IP address. The approach is more reminiscent of the DARPA intrusion detection challenge, but on a much grander scale and with far less preprocessing. As with most anomaly detectors, there is noise in the normal data.

The Experiment

In section 8.1 of the paper, the authors outlined the types of attacks they looked for. Note that, once again, their unique entities were users, which by necessity shaped the attack types they could target. This is more complicated because user-level attacks usually involve several actions in sequence, and the system must spot these multistep behaviors. They also used IP addresses as a feature here, observing trends in the number of IP addresses linked to each user entity.
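
As an illustration of what per-user features might look like, here is a minimal sketch that aggregates raw log lines by user, including a count of distinct IP addresses per user. The column names and actions are hypothetical, not the paper’s actual feature set.

```python
# Illustrative sketch (hypothetical column names, not the AI2 feature set):
# aggregating raw log lines into per-user features, including the number
# of distinct IP addresses associated with each user.
import pandas as pd

logs = pd.DataFrame({
    "user":   ["alice", "alice", "bob", "bob", "bob", "mallory"],
    "src_ip": ["1.2.3.4", "1.2.3.4", "5.6.7.8", "5.6.7.9", "5.6.7.8", "9.9.9.9"],
    "action": ["login_ok", "checkout", "login_fail", "login_fail", "login_ok", "create_account"],
})

features = logs.groupby("user").agg(
    events=("action", "size"),            # total activity volume per user
    distinct_ips=("src_ip", "nunique"),   # IPs linked to the user entity
    failed_logins=("action", lambda a: (a == "login_fail").sum()),
)
print(features)
```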

First, they tried to identify account takeover attacks, which generally involve an attacker guessing a user’s credentials to access the system. More impressively, the researchers also searched for fraudulent account creation using stolen credit cards, which is extremely difficult to catch.

Lastly, the authors identified terms of service violations. This is a bit more straightforward in a signature-based system, but it presents challenges for an anomaly detector. In a signature-based system, one can program a set of rules that encodes the terms of service. In an anomaly detector, the system must look for behavior that deviates from that of a normal user and might therefore represent a violation.
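
To make the contrast concrete, the toy sketch below puts a signature-style rule next to an anomaly-style check. The message limit and the z-score threshold are invented for illustration; neither comes from the paper.

```python
# Toy contrast (not from the AI2 paper) between a signature-style rule
# and an anomaly-style check for a terms-of-service violation.
from statistics import mean, stdev

# Signature-based: the policy is encoded explicitly as a rule.
MAX_MESSAGES_PER_HOUR = 100  # hypothetical term of service

def violates_tos(messages_sent_last_hour: int) -> bool:
    return messages_sent_last_hour > MAX_MESSAGES_PER_HOUR

# Anomaly-based: no explicit rule; flag users who deviate sharply
# from the behavior observed across the normal population.
def is_anomalous(value: float, baseline: list[float], z_threshold: float = 3.0) -> bool:
    mu, sigma = mean(baseline), stdev(baseline)
    return sigma > 0 and abs(value - mu) / sigma > z_threshold

baseline_rates = [3, 5, 2, 8, 4, 6, 5, 3, 7, 4]   # typical users
print(violates_tos(250), is_anomalous(250, baseline_rates))
```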

Hiding in the Noise

Many anomaly detectors are based purely on unsupervised algorithms, which have no access to labels that clearly identify attacks versus normal activity. The algorithm never knows whether it is “right” — it simply treats whatever it sees most of the time as normal.
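
A purely unsupervised detector can be sketched in a few lines. The example below uses scikit-learn’s IsolationForest as a stand-in; the paper’s own unsupervised stage is an ensemble of several methods, so treat this only as an illustration of label-free scoring.

```python
# Illustration only: a purely unsupervised detector never sees labels;
# it simply scores how far each point sits from the bulk of the data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(0, 1, size=(5000, 4))   # the overwhelming majority
odd = rng.normal(4, 1, size=(5, 4))         # a handful of outliers
X = np.vstack([normal, odd])

detector = IsolationForest(contamination=0.001, random_state=0).fit(X)
scores = detector.score_samples(X)          # lower score = more anomalous

# The detector ranks points by strangeness, but it has no notion of
# "attack" vs. "normal" -- that judgment is left to a human analyst.
print(np.argsort(scores)[:5])
```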

There are severe problems with this approach. By carefully introducing malicious traffic in a low and slow manner, an attacker can force a recalibration of what is considered normal. Once that happens, the fraudster can attack with impunity. It is also possible for attackers to hide in the noise. For example, a command-and-control (C&C) protocol that rides on typical Transport Layer Security (TLS) traffic may never be flagged as abnormal unless the infected computer communicates too often or too quickly.
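
The “low and slow” recalibration risk is easy to demonstrate with a toy baseline that keeps refitting itself on recent traffic. The numbers below are invented for illustration and have nothing to do with the paper’s data.

```python
# Toy demonstration (not from the paper) of how slowly injected malicious
# volume can drag a continuously refit "normal" baseline upward.
from statistics import mean, stdev

window = [10.0] * 50          # recent daily request counts for one user
threshold_history = []

for day in range(60):
    injected = 10.0 + 0.5 * day            # attacker ramps up very gradually
    window = window[1:] + [injected]       # detector refits on recent data
    threshold = mean(window) + 3 * stdev(window)
    threshold_history.append(threshold)

print(round(threshold_history[0], 1), round(threshold_history[-1], 1))
# The alert threshold drifts quietly upward, so a burst that would once
# have been flagged now looks "normal."
```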

The authors also tested the use of labeled data from the past. This is a reasonable test, since an enterprise may have logs that were already filtered by its security operations center (SOC) analysts. This can happen, for example, when an enterprise keeps logs for trend analysis. In that case, however, the enterprise may retain only attack data, leading to the flip side of the class imbalance problem described above: There are more maliciously labeled examples than normal ones. The labels may also be noisy, meaning some examples could be misidentified.

The Results

As for the results, Figure 11 in the paper gives a graphical view of just how well the system did. Having historically labeled data definitely helps bootstrap the system: With no historical data, the system detected 143 of 318 total attacks; with historical data, it found 211. As the active model is continuously trained, it improves.

This demonstrates the importance of domain expertise in the system’s feedback loop. Unlike many unsupervised anomaly detectors, the system gets better over time as long as there are experts to help teach it. It is not meant to solve the problem by itself, but rather to learn from the labeled examples provided by the SOC analysts. In fact, the authors claimed that by the end of the 12 weeks, performance with and without historical data was the same.
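
The feedback loop the authors describe can be sketched roughly as follows: an unsupervised model ranks each day’s events, analysts label the most suspicious ones, and a supervised model is retrained on the accumulated labels. The model choices, the 200-event daily budget and the oracle standing in for the analyst are all placeholders rather than the paper’s exact configuration.

```python
# Rough sketch of the human-in-the-loop cycle described in the paper:
# an unsupervised detector ranks events, analysts label the top of the
# list, and a supervised model is retrained on the growing label set.
# Model choices and budget are placeholders, not the paper's configuration.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(1)
labeled_X, labeled_y = [], []          # grows as analysts provide feedback
DAILY_BUDGET = 200                     # events an analyst can review per day

def analyst_labels(events):
    """Stand-in for the SOC analyst: here, a hypothetical oracle."""
    return (events.mean(axis=1) > 2.5).astype(int)

supervised = None
for day in range(5):
    todays_events = np.vstack([rng.normal(0, 1, (20_000, 6)),
                               rng.normal(3, 1, (20, 6))])

    # Rank events: unsupervised at first, supervised once labels exist.
    if supervised is None:
        iso = IsolationForest(random_state=0).fit(todays_events)
        scores = -iso.score_samples(todays_events)   # higher = more anomalous
    else:
        scores = supervised.predict_proba(todays_events)[:, 1]

    top = np.argsort(scores)[::-1][:DAILY_BUDGET]    # most suspicious first
    labeled_X.append(todays_events[top])
    labeled_y.append(analyst_labels(todays_events[top]))

    supervised = RandomForestClassifier(random_state=0).fit(
        np.vstack(labeled_X), np.concatenate(labeled_y))
```

In a setup like this, the analyst budget caps how many alerts surface each day, which is why the false positive reduction reported below matters in practice.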

Finally, the authors reported that the system with no historical data performed 3.41 times better than the unsupervised detector and reduced false positives fivefold. This means analysts can focus on, say, 200 events per day instead of a thousand, which is quite an improvement in efficiency.

The technique shows very real promise and emphasizes the usefulness of domain knowledge in machine learning analysis. Machine learning can’t be the only tool in the arsenal — it needs human oversight to succeed.
