Machine learning has grown into one of the most popular and powerful tools in the quest to secure systems. Some approaches to machine learning, however, have yielded overly aggressive models that demonstrate remarkable predictive accuracy yet generate a steady stream of false positives. Those false positives create negative user experiences that keep new protections from being deployed, and IT personnel find the false alarms disruptive when they are working to detect and eliminate real malware.

The Ponemon Institute recently reported that over 20 percent of endpoint security investigation spending was wasted on these false alarms. IBM’s Ronan Murphy and Martin Borrett also noted that one of Watson’s critical goals is to present security issues to researchers without “drowning them in false alarms.”

Why Are Some Machine Learning Approaches So Prone to False Positives?

Machine learning works by drawing relationships between different elements of data. For endpoint security, most models search for the features that provide the most context about malware threats. In other words, the models are trained to recognize both good software and bad software so that the bad can be blocked.
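To make that idea concrete, the sketch below shows the general shape of such a good-versus-bad classifier. The feature names (file size, byte entropy, imported API count, code signing), the sample values and the model choice are all hypothetical and purely illustrative; production models learn from far larger and richer feature sets.

```python
# A minimal, hypothetical sketch of a "good vs. bad software" classifier.
# Features and data are illustrative only, not any vendor's actual approach.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row describes one executable: [file_size_kb, byte_entropy, num_imported_apis, is_signed]
X = np.array([
    [120.0, 6.1, 45, 1],   # benign, signed packaged application
    [340.0, 5.8, 120, 1],  # benign, signed packaged application
    [80.0,  7.9, 12, 0],   # malware: small, packed, unsigned
    [200.0, 7.6, 8,  0],   # malware: packed, few imports, unsigned
])
y = np.array([0, 0, 1, 1])  # 0 = good software, 1 = malware

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# An unsigned, higher-entropy custom app looks a lot like the malware rows above.
print(model.predict([[150.0, 7.0, 30, 0]]))
```

Even this toy example hints at the problem: anything that resembles the malware side of the training data gets blocked, whether or not it is actually malicious.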

Many newer solutions on the market aim to identify a wider variety of malicious code than existing products in order to demonstrate the need for additional protection. However, when models are trained with a bias toward identifying malware, they are more likely to lump good software in with the bad and thus create false positives.

This imbalance becomes more pronounced because it is difficult to capture a representative sample of good software, particularly custom software. New tools have made it simpler and faster for organizations to create or combine their own applications, and many business applications are developed for a specific use at a specific firm. So while gathering tens of thousands of malware samples is straightforward, and those samples represent threats common to all organizations, gathering a similar quantity of good software usually means collecting well-known, packaged applications. The resulting models learn to distinguish malware from common packaged software, yet ignore the profile of custom or lesser-known applications that may also be present.
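The effect of this sampling bias can be illustrated with synthetic data. In the sketch below, the benign training samples come only from a narrow "packaged software" distribution, while the organization's custom applications sit somewhere in between; the single feature and all of the numbers are invented purely to demonstrate the mechanism.

```python
# Synthetic illustration of the sampling bias described above.
# One made-up feature (think "byte entropy") stands in for a real feature set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training set: plenty of malware, but benign samples drawn only from
# well-known packaged software (a narrow slice of the benign population).
malware  = rng.normal(loc=7.5, scale=0.5, size=(10_000, 1))
packaged = rng.normal(loc=5.0, scale=0.3, size=(2_000, 1))
X_train = np.vstack([malware, packaged])
y_train = np.array([1] * len(malware) + [0] * len(packaged))

model = LogisticRegression().fit(X_train, y_train)

# Custom, in-house applications fall between the two training clusters,
# so a large share of them land on the "malware" side of the boundary.
custom_apps = rng.normal(loc=6.5, scale=0.4, size=(1_000, 1))
print(f"False positive rate on custom software: {model.predict(custom_apps).mean():.1%}")
```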

Assessing the Business Impact

We’ve already talked about alert fatigue and the investment wasted on tracking down false positives. Those impacts, though, are felt mainly by the IT or security group. The real damage comes from the effect on individual users: when a preventative solution thinks it sees malicious code, it stops that code from running. When the detection is a false positive, users cannot run an application they need to do their jobs.

According to a Barkly survey of IT administrators, 42 percent of companies believe their users lost productivity as a result of false positives. This turns IT and security administrators into a choke point in the business life cycle. To manage false positives, companies must create new processes that minimize how long each one lasts and how often they recur.

In some cases, the process of recognizing, repairing and avoiding false positives can take on a life of its own. Even a midsized organization can run hundreds of different software packages. If each package is updated only once a year, that still works out to roughly one update per working day on average, and each update can ship several new executables, any of which could trigger a false positive. Companies then have to allocate budget for whitelisting or exception creation.

Designing a Better Approach

A critical component of modern machine learning is its ability to quickly gather insight from new data and adapt. Considering how biases lead to false positives, it is clear that models will need to be sensitive to the particular software profile of each organization.

In the same way that machine learning can be a groundbreaking technology to recognize new malware, it can also be used to train against a company’s newest software. The best body of good software to train with resides within the organization. By training against both the broadest samples of malware and the most relevant samples of good software, the models can deliver the best protection with the highest accuracy — and lowest false positive rate.
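Continuing the synthetic example from earlier, the sketch below contrasts a model trained only on packaged benign software with one that also sees the organization's own applications. Again, the data and numbers are invented to illustrate the idea rather than to describe any particular product's training pipeline.

```python
# Synthetic illustration: adding an organization's own good software to the training set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

malware       = rng.normal(loc=7.5, scale=0.5, size=(10_000, 1))
packaged      = rng.normal(loc=5.0, scale=0.3, size=(2_000, 1))
in_house_apps = rng.normal(loc=6.5, scale=0.4, size=(2_000, 1))  # the org's own software

# Generic model: broad malware samples plus only packaged benign software.
generic_model = LogisticRegression().fit(
    np.vstack([malware, packaged]),
    np.array([1] * len(malware) + [0] * len(packaged)),
)

# Tuned model: the same malware samples plus the organization's own benign samples.
tuned_model = LogisticRegression().fit(
    np.vstack([malware, packaged, in_house_apps]),
    np.array([1] * len(malware) + [0] * (len(packaged) + len(in_house_apps))),
)

# A new internal release resembles the organization's existing software.
new_release = rng.normal(loc=6.5, scale=0.4, size=(1_000, 1))
print("Generic model false positive rate:", generic_model.predict(new_release).mean())
print("Tuned model false positive rate:  ", tuned_model.predict(new_release).mean())
```

The malware coverage is unchanged; only the benign side of the training data has been broadened, which is exactly where the false positives come from.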

Achieving Balance

The Barkly survey revealed that IT professionals began to doubt the urgency of alerts once they saw more false positives than valid alerts. To provide maximum value while reducing the pressure on overworked staff, security based on machine learning must balance blocking malicious software against disrupting the regular use of business applications. This requires a robust understanding of an organization’s good software, in addition to identifying and training on malicious software. That balance is the true security value that thoughtful machine learning can bring.
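One simple way to make that balance measurable is to track the detection rate and the false positive rate side by side as the model's alerting threshold changes. The sketch below does this on synthetic data; the feature, the threshold values and the resulting numbers are illustrative only.

```python
# Synthetic illustration of the detection / false positive trade-off.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(7.5, 0.5, (5_000, 1)),  # malware
    rng.normal(6.0, 0.6, (5_000, 1)),  # good software, including custom apps
])
y = np.array([1] * 5_000 + [0] * 5_000)

scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

for threshold in (0.5, 0.7, 0.9):
    flagged = scores >= threshold
    detection_rate = flagged[y == 1].mean()       # share of malware blocked
    false_positive_rate = flagged[y == 0].mean()  # share of good software blocked
    print(f"threshold={threshold}: detection={detection_rate:.1%}, "
          f"false positives={false_positive_rate:.1%}")
```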

Read the case study: Sogeti Realizes 50% Faster Analysis Times With Watson for Cyber Security

