Machine learning has grown to be one of the most popular and powerful tools in the quest to secure systems. Some approaches to machine learning have yielded overly aggressive models that demonstrate remarkable predictive accuracy, yet give way to false positives. False positives create negative user experiences that prevent new protection from deploying. IT personnel also find these false alarms disruptive when they are working to detect and eliminate malware.

The Ponemon Institute recently reported that over 20 percent of endpoint security investigation spending was wasted on these false alarms. IBM’s Ronan Murphy and Martin Borrett also noted that one of Watson’s critical goals is to present security issues to researchers without “drowning them in false alarms.”

Why Are Some Machine Learning Approaches So Prone to False Positives?

Machine learning works to draw relationships between different elements of data. To provide endpoint security solutions, most models search for features that can provide the most context about malware threats. In other words, the models are trained to recognize good software and bad software in order to block the bad.

Many newer solutions on the market aim to identify a wider variety of malicious code than existing products to highlight the need for more protection. However, when models are trained with a bias toward identifying malware, they are more likely to lump good software in with the bad, and thus create false positives.

This imbalance becomes more pronounced due to how challenging it is to capture a representative sample of good software, particularly custom software. New tools have made it simpler and faster for organizations to create or combine more of their own applications, and many business applications are developed for a specific use at a specific firm. So, while gathering tens of thousands of malware samples is straightforward and represents threats common to all organizations, gathering a similar quantity of good software means acquiring information about well-known and packaged applications. This causes training models to recognize the differences between malware and common packaged software, yet ignore the profile of custom or lesser-known applications that may also be present.

Assessing the Business Impact

We’ve already talked about alert fatigue and the wasted investment of tracking down false positive results. These impacts, though, are mainly felt by the IT or security group. The real damage is caused by the effect on the individual users: When a preventative solution thinks it sees malicious code, it stops it from running. If there is a false positive, this means that users cannot run an application that they need for their job.

According to a Barkly survey of IT administrators, 42 percent of companies believe that their users lost productivity as a result of false positive results. This creates a choke point for IT and security administrators in the business life cycle. To manage false positives, companies should create new processes to minimize their duration and recurrence.

In some cases, the process of recognizing, repairing and avoiding false positives can take on a life of its own. In even a midsized organization, the volume of different software packages can run into the hundreds. If each package is only updated once a year, then every day could hold multiple new executables that could result in potential false positives. Companies will then have to allocate budgets for whitelisting or exception creation.

Designing a Better Approach

A critical component of modern machine learning is its ability to quickly gather insight from new data and adapt. Considering how biases lead to false positives, it is clear that models will need to be sensitive to the particular software profile of each organization.

In the same way that machine learning can be a groundbreaking technology to recognize new malware, it can also be used to train against a company’s newest software. The best body of good software to train with resides within the organization. By training against both the broadest samples of malware and the most relevant samples of good software, the models can deliver the best protection with the highest accuracy — and lowest false positive rate.

Achieving Balance

The Barkly survey revealed that IT professionals began to doubt the urgency of alerts once they saw more false positives than valid alerts. To provide maximum value while reducing the pressure on overworked staff, security based on machine learning must balance blocking malicious software with avoiding impact on the regular use of business applications. This requires a robust understanding of an organization’s good software, in addition to identifying and training on malicious software. In the end, the result is the true security value that thoughtful machine learning can bring.

Read the case study: Sogeti Realizes 50% Faster Analysis Times With Watson for Cyber Security


More from Intelligence & Analytics

The 13 Costliest Cyberattacks of 2022: Looking Back

2022 has shaped up to be a pricey year for victims of cyberattacks. Cyberattacks continue to target critical infrastructures such as health systems, small government agencies and educational institutions. Ransomware remains a popular attack method for large and small targets alike. While organizations may choose not to disclose the costs associated with a cyberattack, the loss of consumer trust will always be a risk after any significant attack. Let’s look at the 13 costliest cyberattacks of the past year and…

What Can We Learn From Recent Cyber History?

The Center for Strategic and International Studies compiled a list of significant cyber incidents dating back to 2003. Compiling attacks on government agencies, defense and high-tech companies or economic crimes with losses of more than a million dollars, this list reveals broader trends in cybersecurity for the past two decades. And, of course, there are the headline breaches and supply chain attacks to consider. Over recent years, what lessons can we learn from our recent history — and what projections…

When Logs Are Out, Enhanced Analytics Stay In

I was talking to an analyst firm the other day. They told me that a lot of organizations purchase a security information and event management (SIEM) solution and then “place it on the shelf.” “Why would they do that?” I asked. I spent the majority of my career in hardware — enterprise hardware, cloud hardware, and just recently made the jump to security software, hence my question. “Because SIEMs are hard to use. A SIEM purchase is just a checked…

4 Most Common Cyberattack Patterns from 2022

As 2022 comes to an end, cybersecurity teams globally are taking the opportunity to reflect on the past 12 months and draw whatever conclusions and insights they can about the threat landscape. It has been a challenging year for security teams. A major conflict in Europe, a persistently remote workforce and a series of large-scale cyberattacks have all but guaranteed that 2022 was far from uneventful. In this article, we’ll round up some of the most common cyberattack patterns we…