Machine learning has grown into one of the most popular and powerful tools in the quest to secure systems. Some approaches to machine learning have yielded overly aggressive models that demonstrate remarkable predictive accuracy yet generate false positives. False positives create negative user experiences that keep new protections from being deployed, and IT personnel find these false alarms disruptive when they are working to detect and eliminate malware.

The Ponemon Institute recently reported that over 20 percent of endpoint security investigation spending was wasted on these false alarms. IBM’s Ronan Murphy and Martin Borrett also noted that one of Watson’s critical goals is to present security issues to researchers without “drowning them in false alarms.”

Why Are Some Machine Learning Approaches So Prone to False Positives?

Machine learning works by drawing relationships between different elements of data. To provide endpoint security, most models search for the features that offer the most context about malware threats. In other words, the models are trained to recognize good software and bad software so they can block the bad.
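To make that idea concrete, here is a minimal sketch of good-versus-bad classification. It assumes static features such as file size, section entropy and imported API counts have already been extracted; the feature names, values and model choice are illustrative assumptions, not a description of any particular vendor's product.

```python
# A minimal sketch of the good-vs.-bad classification idea, assuming
# static features (file size, section entropy, imported API count) have
# already been extracted. All values and model choices are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training set: each row is an executable, with columns
# [file_size_kb, section_entropy, num_imported_apis].
goodware = rng.normal(loc=[400, 5.0, 120], scale=[150, 0.5, 40], size=(1000, 3))
malware = rng.normal(loc=[250, 7.2, 30], scale=[100, 0.4, 15], size=(1000, 3))

X = np.vstack([goodware, malware])
y = np.array([0] * 1000 + [1] * 1000)  # 0 = good software, 1 = malware

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score a new, unseen executable; an endpoint agent would block it once
# the malware probability crosses some decision threshold.
sample = np.array([[380, 5.1, 110]])
print("malware probability:", model.predict_proba(sample)[0, 1])
```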

Many newer solutions on the market aim to identify a wider variety of malicious code than existing products do, emphasizing the need for broader protection. However, when models are trained with a bias toward identifying malware, they are more likely to lump good software in with the bad and thus create false positives.

This imbalance becomes more pronounced because it is challenging to capture a representative sample of good software, particularly custom software. New tools have made it simpler and faster for organizations to create or combine their own applications, and many business applications are developed for a specific use at a specific firm. So while gathering tens of thousands of malware samples is straightforward and represents threats common to all organizations, gathering a similar quantity of good software usually means collecting well-known, packaged applications. Models trained this way learn to distinguish malware from common packaged software, yet ignore the profile of custom or lesser-known applications that may also be present.
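A small illustration of this sampling bias, using made-up feature distributions: malware samples are abundant, goodware comes only from well-known packaged applications, and the organization's custom in-house software never appears in training at all. The resulting model flags a large share of that custom software.

```python
# An illustration of the sampling bias described above, with made-up
# feature distributions: malware is abundant, goodware comes only from
# well-known packaged applications, and custom in-house software never
# appears in training at all.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Columns: [section_entropy, num_imported_apis] (hypothetical features).
malware = rng.normal([7.2, 30], [0.4, 15], size=(10_000, 2))   # easy to gather
packaged = rng.normal([5.0, 120], [0.3, 20], size=(500, 2))    # scarce, narrow
custom = rng.normal([6.3, 60], [0.4, 20], size=(200, 2))       # absent from training

X = np.vstack([malware, packaged])
y = np.array([1] * len(malware) + [0] * len(packaged))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(f"custom apps flagged as malware: {clf.predict(custom).mean():.0%}")

# Rebalancing the classes helps somewhat, but it cannot conjure up the
# missing profile of the organization's own software.
clf_bal = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)
print(f"after class weighting: {clf_bal.predict(custom).mean():.0%}")
```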

Assessing the Business Impact

We’ve already talked about alert fatigue and the wasted investment of tracking down false positive results. Those impacts, though, are felt mainly by the IT or security group. The real damage comes from the effect on individual users: When a preventative solution thinks it sees malicious code, it stops that code from running. In the case of a false positive, this means users cannot run an application they need to do their jobs.

According to a Barkly survey of IT administrators, 42 percent of companies believe that their users lost productivity as a result of false positive results. This makes IT and security administrators a choke point in day-to-day business operations. To manage false positives, companies must create new processes that minimize both their duration and their recurrence.

In some cases, the process of recognizing, repairing and avoiding false positives can take on a life of its own. In even a midsized organization, the number of distinct software packages can run into the hundreds. If each package is updated only once a year, every business day can bring multiple new executables, each a potential false positive. Companies then have to allocate budget for whitelisting and exception creation.
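Some back-of-the-envelope math shows how quickly this adds up. The figures here are assumptions chosen for illustration: 500 distinct packages, one update per package per year and a handful of new executables shipped with each update.

```python
# Back-of-the-envelope math for the claim above, using assumed figures:
# 500 distinct packages, each updated once a year, with a handful of new
# executables (installers, DLLs, helpers) shipped per update.
packages = 500
updates_per_package_per_year = 1
executables_per_update = 5  # assumption
business_days_per_year = 250

daily = packages * updates_per_package_per_year * executables_per_update / business_days_per_year
print(f"~{daily:.0f} never-before-seen executables per business day")  # ~10
```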

Designing a Better Approach

A critical component of modern machine learning is its ability to quickly gather insight from new data and adapt. Considering how biases lead to false positives, it is clear that models will need to be sensitive to the particular software profile of each organization.

In the same way that machine learning can be a groundbreaking technology to recognize new malware, it can also be used to train against a company’s newest software. The best body of good software to train with resides within the organization. By training against both the broadest samples of malware and the most relevant samples of good software, the models can deliver the best protection with the highest accuracy — and lowest false positive rate.
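As a rough sketch of what that could look like, a model trained on the broad shared corpus can be updated incrementally with locally gathered goodware. The data shapes, features and labels below are assumptions for illustration; scikit-learn's SGDClassifier is used here simply because it supports incremental updates via partial_fit.

```python
# A sketch of per-organization adaptation: start from a model trained on
# the broad shared corpus, then fold in locally gathered goodware with
# incremental updates. Shapes, features and labels are assumptions.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)

# Stage 1: broad corpus of global malware and packaged goodware.
X_global = rng.normal(size=(20_000, 3))
y_global = rng.integers(0, 2, size=20_000)  # 0 = good, 1 = malware
model = SGDClassifier(loss="log_loss", random_state=0)
model.partial_fit(X_global, y_global, classes=[0, 1])

# Stage 2: the organization's own applications, labeled benign.
X_local = rng.normal(loc=0.5, size=(300, 3))
y_local = np.zeros(300, dtype=int)
for _ in range(5):  # a few passes over the small local sample
    model.partial_fit(X_local, y_local)

# In practice, the local goodware would be interleaved with malware
# samples so the model does not drift toward predicting everything benign.
```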

Achieving Balance

The Barkly survey revealed that IT professionals began to doubt the urgency of alerts once they saw more false positives than valid alerts. To provide maximum value while reducing the pressure on overworked staff, security based on machine learning must balance blocking malicious software against disrupting the regular use of business applications. That requires a robust understanding of an organization’s good software, in addition to identifying and training on malicious software. The result is the true security value that thoughtful machine learning can bring.
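One common way to strike that balance in practice is to tune the model's decision threshold against a precision target on validation data, trading missed malware against blocked business applications. The scores below are synthetic; a real deployment would use validation data drawn from the organization's own environment.

```python
# A sketch of the balancing act in practice: tune the decision threshold
# against a precision target on validation data. Scores are synthetic.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(3)
y_true = np.concatenate([np.zeros(900), np.ones(100)])  # mostly benign
scores = np.concatenate([rng.beta(2, 8, 900),           # benign scores skew low
                         rng.beta(8, 2, 100)])          # malware scores skew high

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Find the first threshold at which precision reaches 95 percent, i.e.,
# at most one false alarm per 20 blocks, keeping recall as high as possible.
ok = precision[:-1] >= 0.95
if ok.any():
    i = int(np.argmax(ok))
    print(f"threshold={thresholds[i]:.2f}, recall at that point={recall[i]:.0%}")
```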

Read the case study: Sogeti Realizes 50% Faster Analysis Times With Watson for Cyber Security
