Threat Intelligence Machine Learning Adoption: Time to Ditch the Black Box Security Analytics

Eighty-one percent of enterprises are integrating cyber threat intelligence (CTI) in 2018, according to a February 2018 SANS Institute survey. As organizations invest in real-time event data for detection and response, adopting sophisticated machine learning for threat intelligence has the potential to improve visibility into unknown risks and strengthen the stance of the cognitive security operations center (SOC). Achieving data science maturity in cybersecurity means empowering the right person with the right intelligence to take action while minimizing false positives.

The majority of security professionals are already using machine learning-based tools for security operations. A December 2017 Webroot study found 88 percent of cybersecurity programs have some AI-based solutions (most commonly for malware detection, malicious IP detection and website classification). But 69 percent of respondents felt these solutions aren’t fully reliable, and 91 percent expressed intent to increase their investment in AI-based security solutions over the next three years.

“Think like the adversary [and] reserve capacity,” said David Hogue, senior technical director at the National Security Agency, in his RSA Conference 2018 session. “Use data science and machine learning to reduce SOC alert fatigue.” Machine learning threat intelligence adoption is on the rise, and with good reason, but not without raising some concerns.

CISOs Worry About Weaponized Machine Learning

Weaponized AI was positioned among the top cybersecurity risks “to really worry about” at the cusp of 2018 by tech futurist Martin Giles in MIT Technology Review. Similarly, 62 percent of chief information security officers (CISOs) surveyed by Cylance at Black Hat 2017 said AI is a “double-edged sword” and predicted that hackers would recognize this opportunity within the year.

It takes just three minutes to “brute force” any password with seven characters or fewer. One 2016 Columbia University study demonstrated 98 percent accuracy in “breaking” the latest iteration of the Google reCaptcha service.
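To put the three-minute figure in perspective, the sketch below counts the candidate passwords an exhaustive search would have to try. It assumes an alphabet of the 95 printable ASCII characters, which the figure above does not specify; the function names are illustrative. Under that assumption there are roughly 7 × 10^13 candidates of seven characters or fewer, so finishing in three minutes implies a guess rate on the order of 10^11 per second.

```python
def keyspace_size(alphabet_size: int, max_length: int) -> int:
    """Count every candidate password of length 1..max_length."""
    return sum(alphabet_size ** length for length in range(1, max_length + 1))


def required_guess_rate(alphabet_size: int, max_length: int, seconds: float) -> float:
    """Guesses per second needed to exhaust the keyspace within `seconds`."""
    return keyspace_size(alphabet_size, max_length) / seconds


if __name__ == "__main__":
    # Assumption: 95 printable ASCII characters; the article states no alphabet.
    total = keyspace_size(95, 7)
    rate = required_guess_rate(95, 7, seconds=3 * 60)
    print(f"candidates: {total:,}")
    print(f"guesses per second to finish in three minutes: {rate:.3e}")
```

Each additional character multiplies the search space by the alphabet size, which is why length matters more than almost any other password property against this kind of attack.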

Other known nefarious applications of machine learning and analytics include:

  • Automated voice fabrication
  • AI malware creation
  • Neural networks for gathering open-source intelligence
  • Automated botnet creation
  • Swarming threat intelligence systems with false positives
  • Corrupting enterprise machine learning models

Just as importantly, based on last year’s ransomware-as-a-service (RaaS) epidemic, it’s probably only a matter of time before machine learning tools for cybercrime are sold as a service on the darknet. Machine learning has value within the context of mature enterprise CTI, but it also may be crucial for playing defense against highly efficient algorithm-armed attackers.

How to Strengthen Machine Learning Threat Intelligence

The majority of organizations with a formal approach to CTI recognize room for improvement with their data science initiatives. According to the SANS Institute, 55.91 percent are “not satisfied” with the maturity of their machine learning. The same SANS Institute survey found general barriers to CTI maturity that likely prevent mature machine learning adoption, including a “lack of trained and experienced staff, budget, lack of time and lack of technical ability to integrate CTI.”

In addition to these barriers, the requirements for a strengthened machine learning stance can include the need for real-time integration of third-party event intelligence, the need for humans to work closely with models and a need for transparency.

1. Data Intelligence, Identification and Processing

Machine learning can enable the SOC to move beyond blacklisting and automation to threat hunting by recognizing threats dynamically from rich, contextual factors such as maliciousness ratio. While an experienced human analyst can recognize dynamically evolving threats based on gut feeling, machine learning models win the contest of scale in a world where a unique strain of malware is identified every few seconds.
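As one illustration of such a contextual factor, the sketch below computes a maliciousness ratio per indicator. The article does not define the term, so the definition used here (malicious sightings divided by total sightings) and the observation format are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each observation is (indicator, was_malicious). Both the tuple layout and the
# definition of the ratio are assumptions made for illustration.
Observation = Tuple[str, bool]


def maliciousness_ratios(observations: Iterable[Observation]) -> Dict[str, float]:
    """Return malicious sightings divided by total sightings for each indicator."""
    malicious = defaultdict(int)
    total = defaultdict(int)
    for indicator, was_malicious in observations:
        total[indicator] += 1
        if was_malicious:
            malicious[indicator] += 1
    return {indicator: malicious[indicator] / total[indicator] for indicator in total}


if __name__ == "__main__":
    # Documentation-range IP addresses, used here as placeholder indicators.
    sightings = [
        ("203.0.113.7", True),
        ("203.0.113.7", True),
        ("203.0.113.7", False),
        ("198.51.100.20", False),
    ]
    for indicator, ratio in maliciousness_ratios(sightings).items():
        print(indicator, round(ratio, 2))
```

Unlike a static blacklist entry, a ratio like this shifts as new sightings arrive, so it can serve as one feature among many for a model rather than as a verdict on its own.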

Training the machine learning models with enough data to recognize dynamically evolving threat events in real time carries several requirements, including the use of both internal network data and third-party threat exchange data for total visibility. The SOC’s analytics engine also needs the ability to identify, integrate and adapt to the changing events landscape in real time to provide analysts with actionable recommendations.

2. Human Supervision and Oversight

While AI is, in general, incredibly effective at identifying patterns from unstructured data sets (e.g., an intelligence source event), machine learning models are not infallible. Models must be trained to provide value. Results must be analyzed and researched.

The security analyst’s feedback on whether a model’s positives result in action, further investigation or no response will be necessary to refine machine learning models. Expert feedback on a model’s analysis is also needed to provide collaborative benefit to the threat exchange. Make room for the resulting period of transition.
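A minimal sketch of that feedback loop follows, assuming analysts record one of three dispositions per alert. The disposition names, the choice to count both "action" and "investigate" as confirming an alert, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Disposition names mirror the three outcomes named in the text; treating
# "action" and "investigate" as confirming the alert is an assumption.
CONFIRMING = {"action", "investigate"}
DISMISSING = {"no_response"}


def to_training_labels(feedback: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Turn (alert_id, disposition) pairs into (alert_id, label); 1 = useful alert."""
    labels = []
    for alert_id, disposition in feedback:
        if disposition in CONFIRMING:
            labels.append((alert_id, 1))
        elif disposition in DISMISSING:
            labels.append((alert_id, 0))
        else:
            raise ValueError(f"unknown disposition: {disposition!r}")
    return labels


def alert_precision(labels: Iterable[Tuple[str, int]]) -> float:
    """Share of reviewed alerts that analysts considered worth a response."""
    counts = Counter(label for _, label in labels)
    reviewed = counts[0] + counts[1]
    if reviewed == 0:
        raise ValueError("no reviewed alerts")
    return counts[1] / reviewed


if __name__ == "__main__":
    feedback = [("a1", "action"), ("a2", "no_response"), ("a3", "investigate")]
    labels = to_training_labels(feedback)
    print(labels, alert_precision(labels))
```

The labels can feed the next training run, while the precision figure gives the SOC a simple way to track whether a model's positives are becoming more or less actionable over time.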

3. Transparency and Trust in Machine Learning Models

For a machine learning model to provide value to a security program, it can’t operate like a black box. The “black box” is a decades-old computing metaphor for a system whose inputs and outputs you can see but whose inner workings you can’t actually understand.

“AI and humans are best when they work together and can trust each other,” said Rob High, CTO of IBM Watson, in Forbes. This statement applies to threat intelligence machine learning models, regardless of whether they’re developed in-house. Threat analysts need to understand how a model arrived at a decision in order to trust its recommendations and respond appropriately.

“Importantly, [machine learning] models are not all created equally, and each may require a different technique to describe decision-making,” wrote data science researcher Hyrum Anderson. He noted, for instance, that a “nearest-neighbor classifier naturally justifies its decision using case-based reasoning.”
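To make the nearest-neighbor point concrete, here is a small sketch of a k-nearest-neighbor classifier that returns the cases behind its decision along with the prediction. It is an illustration under assumed inputs, not a method taken from Anderson's work: the two numeric features, the case identifiers and the function name are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Case = Tuple[str, Sequence[float], str]  # (case_id, feature vector, label)


def explain_knn(query: Sequence[float], cases: List[Case], k: int = 3):
    """Classify `query` by majority vote of its k nearest cases and return them."""
    if not cases or k < 1:
        raise ValueError("need at least one case and k >= 1")
    # Rank known cases by Euclidean distance to the query.
    ranked = sorted(cases, key=lambda case: math.dist(query, case[1]))
    neighbors = ranked[:k]
    votes = Counter(label for _, _, label in neighbors)
    prediction = votes.most_common(1)[0][0]
    # The evidence is the explanation: which past cases drove the vote, and how close.
    evidence = [
        (case_id, label, math.dist(query, features))
        for case_id, features, label in neighbors
    ]
    return prediction, evidence


if __name__ == "__main__":
    # Hypothetical, pre-scaled features for previously triaged events.
    history = [
        ("case-1", [0.0, 0.0], "benign"),
        ("case-2", [0.1, 0.0], "benign"),
        ("case-3", [1.0, 1.0], "malicious"),
        ("case-4", [0.9, 1.0], "malicious"),
        ("case-5", [1.0, 0.9], "malicious"),
    ]
    prediction, evidence = explain_knn([0.95, 0.95], history, k=3)
    print(prediction)
    for case_id, label, distance in evidence:
        print(f"  similar to {case_id} ({label}), distance {distance:.3f}")
```

An analyst reading this output sees not just a label but the specific past cases the event resembles, which is the kind of justification a black-box score cannot offer.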

Creating a Culture of Innovative Machine Learning Security

“Cyber activity continues to become more sophisticated,” said Hogue. He referred to the “frequency of aggressive [and] escalatory cyber behavior.”

Hogue called for RSA Conference attendees to develop new, innovative approaches to safeguarding the enterprise SOC:

  • Maintaining stronger, centralized policies
  • Adopting artificial intelligence and machine learning
  • Deepening collaboration and community expertise
  • Engaging in diverse recruitment practices

In 2018, the adoption of machine learning tools and weaponized AI by the hacking community is one of the greatest information security threats facing the enterprise. In the months to come, AI-powered threats could become as pervasive as ransomware.

As organizations strengthen their CTI programs, machine learning adoption should play a significant role in threat identification, threat hunting and response programs. In the cognitive security operations center of the future, smarter algorithms can strengthen defenses against AI-powered threats.


Jasmine Henry (formerly Jasmine W. Gordon) is a Seattle-based emerging commentator and freelance journalist specializing in analytics, information security, and other emerging tech trends. Her work has appeared in Forbes, Time, Reuters, HR Professionals Magazine, ADP Spark, HP Tektonika, Mimecast, and dozens of other publications. With a background in IT project management and data analysis, Henry has direct experience with the application of emergent technology trends in both startup and enterprise settings. She specializes in translating complex trends and massive data sets into high-value, shareable news and blog posts. Henry holds an MS in Informatics & Analytics from Lipscomb University in Nashville, where she also completed a graduate certificate in Health Care Informatics.