December 19, 2016 By Brad Harris 4 min read

This is the third and final installment in a series covering AI2 and machine learning. Be sure to read Part 1 for an introduction to AI2 and Part 2 for background on the algorithms used in the system.

Machine Learning, Human Teaching

The data set researchers Kalyan Veeramachaneni and Ignacio Arnaldo used to produce their paper, “AI2: Training a Big Data Machine to Defend,” is quite impressive. Experiments are too often based on data that is either unrepresentative of the real world or too brief to offer a realistic perspective. The authors claimed they evaluated their system on three months’ worth of enterprise platform logs with 3.6 billion log lines, working out to roughly 40 million lines a day. This is far more representative of what we would see in real life.

It does raise one question, however: What environment did the data come from? Many of IBM’s customers see millions of attack incidents a day. In the paper, the authors reported that malicious activity accounted for less than 0.1 percent of the data, which puts the number of attacks they detected in the thousands.

Class Imbalance

The authors detailed a dearth of malicious activity and the so-called “class imbalance” problem that arises when there are far more normal events than malicious events. While the imbalance holds true even for large customers, there are still plenty of examples of malicious activity in large enterprises, especially at the network border.
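To make the imbalance concrete, here is a minimal sketch of how per-class weights might be computed so that the rare malicious class is not drowned out during supervised training. The counts below are hypothetical, not figures from the paper.

```python
from collections import Counter

# Hypothetical label counts illustrating an imbalance of roughly 1,000:1;
# the ratio reported in the paper is even more extreme (below 0.1 percent malicious).
labels = ["normal"] * 1_000_000 + ["malicious"] * 1_000

counts = Counter(labels)
total = sum(counts.values())

# Inverse-frequency weights: the rare class gets a proportionally larger weight,
# so a supervised learner cannot score well by simply predicting "normal" for everything.
weights = {cls: total / (len(counts) * n) for cls, n in counts.items()}

print(counts)   # Counter({'normal': 1000000, 'malicious': 1000})
print(weights)  # {'normal': ~0.5, 'malicious': ~500.5}
```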

The paper explained the analysis of the ratio of normal to malicious users. It is somewhat unusual that the researchers chose to designate users as the unique entity in their analysis; network attacks are more typically measured by IP address. This approach is reminiscent of the DARPA intrusion detection challenge, but on a much grander scale and with far less preprocessing. As with most anomaly detectors, there is noise in the normal data.

The Experiment

In section 8.1 of the paper, the authors outlined the types of attacks they looked for. Note that, once again, their unique entities were users, which narrowed their attack types by necessity. This is more complicated, since user-level attacks usually involve several actions, one after the other, and the system must spot the whole multistep behavior. IP addresses were still used as a feature here: the researchers observed trends in the number of IP addresses linked with each user entity.
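As a rough illustration of that per-user aggregation, the sketch below rolls a few hypothetical log records up into simple features per user, including the number of distinct IP addresses seen. The field names and features are assumptions for illustration, not the paper’s actual feature set.

```python
from collections import defaultdict

# Hypothetical log records; the real system aggregates billions of log lines
# into per-entity features over a time window.
logs = [
    {"user": "alice", "ip": "203.0.113.5",  "action": "login_failure"},
    {"user": "alice", "ip": "203.0.113.7",  "action": "login_failure"},
    {"user": "alice", "ip": "203.0.113.9",  "action": "login_success"},
    {"user": "bob",   "ip": "198.51.100.2", "action": "login_success"},
]

def per_user_features(records):
    """Aggregate raw log lines into simple per-user features."""
    ips = defaultdict(set)
    failures = defaultdict(int)
    events = defaultdict(int)
    for r in records:
        user = r["user"]
        events[user] += 1
        ips[user].add(r["ip"])
        if r["action"] == "login_failure":
            failures[user] += 1
    return {
        user: {
            "distinct_ips": len(ips[user]),   # many IPs for one user can signal a takeover attempt
            "failed_logins": failures[user],
            "total_events": events[user],
        }
        for user in events
    }

print(per_user_features(logs))
# {'alice': {'distinct_ips': 3, 'failed_logins': 2, 'total_events': 3},
#  'bob': {'distinct_ips': 1, 'failed_logins': 0, 'total_events': 1}}
```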

First, they tried to identify account takeover attacks. This generally involves an attacker guessing the credentials of a user to access the system. Even more impressive, the researchers also searched for fraudulent account creation using a stolen credit card, which is extremely difficult to catch.

Lastly, the authors identified terms of service violations. This is a bit more straightforward for a signature-based system, but it presents challenges for an anomaly detector. In a signature-based system, one can program a set of rules that define the terms of service. In an anomaly detector, the system must instead look for behavior that deviates from that of a normal user and might therefore represent a violation.
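The difference can be sketched in a few lines: a signature system encodes the terms of service as explicit rules, while an anomaly detector only measures how far a user’s behavior drifts from a learned baseline. The threshold and feature below are hypothetical, chosen purely for illustration.

```python
# Signature-style check: the terms of service are encoded as an explicit rule.
MAX_DOWNLOADS_PER_DAY = 500  # hypothetical limit written into the ToS

def violates_signature(downloads_today: int) -> bool:
    return downloads_today > MAX_DOWNLOADS_PER_DAY

# Anomaly-style check: no rule is known; we only flag behavior that sits far
# from the baseline learned over the normal user population.
def anomaly_score(value: float, baseline_mean: float, baseline_std: float) -> float:
    """Simple z-score: how many standard deviations away from normal behavior."""
    return abs(value - baseline_mean) / baseline_std if baseline_std else 0.0

downloads = 900
print(violates_signature(downloads))               # True: the rule is explicit
print(anomaly_score(downloads, 40.0, 25.0) > 3.0)  # True: far outside the learned baseline
```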

Hiding in the Noise

Many anomaly detectors are based purely on unsupervised algorithms, which have no access to labels that clearly separate attacks from normal activity. The algorithm never knows if it is “right”; it strictly evaluates based on what it sees most of the time, which it calls normal.
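For context, the sketch below shows what a purely unsupervised setup looks like. It uses scikit-learn’s IsolationForest as a stand-in; this is not the ensemble of outlier detection methods the paper actually combines, just an illustration of learning “normal” with no labels at all.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical per-user feature vectors, e.g. [distinct_ips, failed_logins].
# The bulk of the population behaves normally; a handful of rows do not.
normal = rng.normal(loc=[2.0, 1.0], scale=[1.0, 1.0], size=(1000, 2))
outliers = rng.normal(loc=[40.0, 60.0], scale=[5.0, 5.0], size=(5, 2))
X = np.vstack([normal, outliers])

# No labels are ever supplied: the model learns what "most of the data" looks
# like and scores everything far from that as anomalous.
model = IsolationForest(contamination=0.005, random_state=0).fit(X)
scores = -model.score_samples(X)  # higher score = more anomalous

top_five = np.argsort(scores)[-5:]
print(top_five)  # the injected outliers (rows 1000-1004) should dominate this list
```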

There are severe problems with this approach. By carefully introducing malicious traffic in a low and slow manner, an attacker can force a recalibration of what is considered normal. If this happens, the fraudster can then perform attacks with impunity. It is also possible for attackers to hide in the noise. For example, a command-and-control (C&C) protocol that uses typical Transport Layer Security (TLS) traffic may not be flagged as abnormal unless the infected computer beacons too often or too quickly.

The authors tested the use of labeled data from the past. This is a reasonable test, since the enterprise may have had logs that had already been filtered by its security operations center (SOC) analysts. This can happen if an enterprise wants to store the logs for trend analysis, for example. In this case, however, the enterprise may only keep attack data, leading to the flip side of the aforementioned class imbalance problem: There are more maliciously labeled examples than normal ones. The labeled data may also be noisy, meaning some examples could be mislabeled.

The Results

As for the results, Figure 11 in the paper showed a graphical view of just how well the system did. Having historically labeled data definitely helps bootstrap the system. With no historical data, the system detected 143 of 318 total attacks; with historical data, it found 211. Because the active model is continuously trained, it improves over time.

This demonstrates the importance of domain expertise in the system’s feedback loop. Unlike many unsupervised anomaly detectors, the system gets better with time as long as there are experts to help teach it. The system is not meant to solve the problem by itself, but rather to learn from the labeled examples provided by the SOC analysts. In fact, the authors claimed that by the end of the 12 weeks, the performance with and without historical data was the same.
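A highly simplified sketch of that analyst-in-the-loop cycle appears below. The components (IsolationForest, RandomForestClassifier) and the function itself are stand-ins of my own, not the authors’ actual models or API.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

def daily_feedback_loop(X_today, labeled_X, labeled_y, budget=20):
    """One day of a simplified analyst-in-the-loop cycle.

    X_today   : feature matrix for today's entities
    labeled_X : features of entities analysts have already labeled
    labeled_y : their labels (1 = malicious, 0 = normal)
    budget    : how many events the analysts can review today
    """
    # 1. Unsupervised outlier scores rank today's entities.
    outlier_model = IsolationForest(random_state=0).fit(X_today)
    scores = -outlier_model.score_samples(X_today)  # higher = more anomalous

    # 2. Once analyst labels exist, a supervised model adds its own ranking.
    if len(labeled_y) > 0 and len(set(labeled_y)) > 1:
        clf = RandomForestClassifier(class_weight="balanced", random_state=0)
        clf.fit(labeled_X, labeled_y)
        scores = scores + clf.predict_proba(X_today)[:, 1]

    # 3. The top-scoring entities go to the SOC analysts for review; their
    #    verdicts become additional training labels for the next iteration.
    return np.argsort(scores)[::-1][:budget]
```

Each pass through a loop like this grows the labeled set, which is why the detection numbers above improve as the weeks go by.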

Finally, the authors reported that the system with no historical data performed 3.41 times better than the unsupervised detector and reduced false positives fivefold. This means that analysts can focus on, say, 200 events per day instead of a thousand. This is quite an improvement in efficiency.

The technique shows very real promise and emphasizes the usefulness of domain knowledge in machine learning analysis. Machine learning can’t be the only tool in the arsenal — it needs human oversight to succeed.
