Machine learning has grown to be one of the most popular and powerful tools in the quest to secure systems. Some approaches to machine learning have yielded overly aggressive models that demonstrate remarkable predictive accuracy, yet give way to false positives. False positives create negative user experiences that prevent new protection from deploying. IT personnel also find these false alarms disruptive when they are working to detect and eliminate malware.

The Ponemon Institute recently reported that over 20 percent of endpoint security investigation spending was wasted on these false alarms. IBM’s Ronan Murphy and Martin Borrett also noted that one of Watson’s critical goals is to present security issues to researchers without “drowning them in false alarms.”

Why Are Some Machine Learning Approaches So Prone to False Positives?

Machine learning works to draw relationships between different elements of data. To provide endpoint security solutions, most models search for features that can provide the most context about malware threats. In other words, the models are trained to recognize good software and bad software in order to block the bad.

Many newer solutions on the market aim to identify a wider variety of malicious code than existing products to highlight the need for more protection. However, when models are trained with a bias toward identifying malware, they are more likely to lump good software in with the bad, and thus create false positives.

This imbalance becomes more pronounced due to how challenging it is to capture a representative sample of good software, particularly custom software. New tools have made it simpler and faster for organizations to create or combine more of their own applications, and many business applications are developed for a specific use at a specific firm. So, while gathering tens of thousands of malware samples is straightforward and represents threats common to all organizations, gathering a similar quantity of good software means acquiring information about well-known and packaged applications. This causes training models to recognize the differences between malware and common packaged software, yet ignore the profile of custom or lesser-known applications that may also be present.

Assessing the Business Impact

We’ve already talked about alert fatigue and the wasted investment of tracking down false positive results. These impacts, though, are mainly felt by the IT or security group. The real damage is caused by the effect on the individual users: When a preventative solution thinks it sees malicious code, it stops it from running. If there is a false positive, this means that users cannot run an application that they need for their job.

According to a Barkly survey of IT administrators, 42 percent of companies believe that their users lost productivity as a result of false positive results. This creates a choke point for IT and security administrators in the business life cycle. To manage false positives, companies should create new processes to minimize their duration and recurrence.

In some cases, the process of recognizing, repairing and avoiding false positives can take on a life of its own. In even a midsized organization, the volume of different software packages can run into the hundreds. If each package is only updated once a year, then every day could hold multiple new executables that could result in potential false positives. Companies will then have to allocate budgets for whitelisting or exception creation.

Designing a Better Approach

A critical component of modern machine learning is its ability to quickly gather insight from new data and adapt. Considering how biases lead to false positives, it is clear that models will need to be sensitive to the particular software profile of each organization.

In the same way that machine learning can be a groundbreaking technology to recognize new malware, it can also be used to train against a company’s newest software. The best body of good software to train with resides within the organization. By training against both the broadest samples of malware and the most relevant samples of good software, the models can deliver the best protection with the highest accuracy — and lowest false positive rate.

Achieving Balance

The Barkly survey revealed that IT professionals began to doubt the urgency of alerts once they saw more false positives than valid alerts. To provide maximum value while reducing the pressure on overworked staff, security based on machine learning must balance blocking malicious software with avoiding impact on the regular use of business applications. This requires a robust understanding of an organization’s good software, in addition to identifying and training on malicious software. In the end, the result is the true security value that thoughtful machine learning can bring.

Read the case study: Sogeti Realizes 50% Faster Analysis Times With Watson for Cyber Security


More from Artificial Intelligence

Generative AI security requires a solid framework

4 min read - How many companies intentionally refuse to use AI to get their work done faster and more efficiently? Probably none: the advantages of AI are too great to deny.The benefits AI models offer to organizations are undeniable, especially for optimizing critical operations and outputs. However, generative AI also comes with risk. According to the IBM Institute for Business Value, 96% of executives say adopting generative AI makes a security breach likely in their organization within the next three years.CISA Director Jen…

Self-replicating Morris II worm targets AI email assistants

4 min read - The proliferation of generative artificial intelligence (gen AI) email assistants such as OpenAI’s GPT-3 and Google’s Smart Compose has revolutionized communication workflows. Unfortunately, it has also introduced novel attack vectors for cyber criminals. Leveraging recent advancements in AI and natural language processing, malicious actors can exploit vulnerabilities in gen AI systems to orchestrate sophisticated cyberattacks with far-reaching consequences. Recent studies have uncovered the insidious capabilities of self-replicating malware, exemplified by the “Morris II” strain created by researchers. How the Morris…

Open source, open risks: The growing dangers of unregulated generative AI

3 min read - While mainstream generative AI models have built-in safety barriers, open-source alternatives have no such restrictions. Here’s what that means for cyber crime.There’s little doubt that open-source is the future of software. According to the 2024 State of Open Source Report, over two-thirds of businesses increased their use of open-source software in the last year.Generative AI is no exception. The number of developers contributing to open-source projects on GitHub and other platforms is soaring. Organizations are investing billions in generative AI…

Topic updates

Get email updates and stay ahead of the latest threats to the security landscape, thought leadership and research.
Subscribe today