It’s difficult to talk about security analytics without considering machine learning. Machine learning is used to detect malicious websites, flow anomalies, infected files, infected endpoints and user behavior anomalies. It’s applied to big data repositories to glean information and insights that would otherwise go unnoticed.

Multiple industries are using machine learning to better automate security screening, border entry, college applicant selection, loan analytics and health care. Behind the scenes, almost every industry that affects our daily lives involves some type of machine learning.

Training the System

Machine learning is based upon statistical analysis of existing data, with the resulting model applied to new data sets. In the case of college applicants, admission analysts train the system by feeding it transcripts, financial information, demographic information, high school information, SAT scores and any other data that seems relevant to accepting an applicant. In the case of network security, security analysts train the system on web browsing tendencies, entry/exit data, email tendencies, login authentication data and any other available user behavior analytics.
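As a minimal sketch of what that training step can look like in practice (the feature names, data and labels below are hypothetical, chosen only to illustrate the idea), analyst-labeled behavior records are fed to an off-the-shelf classifier:

```python
# Minimal sketch of "training the system" on analyst-labeled behavior records.
# Feature names, data and labels are hypothetical, for illustration only.
from sklearn.ensemble import RandomForestClassifier

# Each row: [logins_per_day, failed_logins, off_hours_ratio, unique_domains_visited]
X_train = [
    [12, 0, 0.05, 40],    # typical user
    [15, 1, 0.10, 55],    # typical user
    [3, 9, 0.80, 400],    # flagged as anomalous by an analyst
    [5, 7, 0.75, 320],    # flagged as anomalous by an analyst
]
y_train = [0, 0, 1, 1]    # 0 = benign, 1 = anomalous (analyst-supplied labels)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# The trained model is then applied to new, unseen behavior.
print(model.predict([[4, 8, 0.9, 350]]))  # e.g. [1]
```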

The goal is to identify and classify anomalous situations, which in turn become further training data for the system. This sounds great, doesn’t it? Not so fast: in its infancy, machine learning can produce errant results.

Machine Learning Flunks Its First Tests

For example, my son was recently denied a home mortgage loan. He has good credit, a steady job and met the minimum standard for a down payment. His application was simply denied, with little explanation other than that the computer had determined he was a high risk. After weeks of significant pressure on the bank, we discovered that his job was classified as a high-risk occupation for long-term employment.

One of my colleagues was also recently identified as a high-risk user based upon his browsing habits and voice-over-IP (VoIP) usage. A full disclosure review identified his geolocated communications as the culprit, coupled with his web browsing habits. The web browsing habits could not be pinned down or fully disclosed since they were simply “anomalous.” I suspect it was because he often communicated with family in his homeland, and the geolocation pointed to a country associated with cybercrime.

Accuracy and Classification

Machine learning makes complex statistical decisions about data based solely on the accuracy of classification. It recursively quantifies and correlates millions of potential decision trees until it arrives at the most accurate classification. In human terms, it does not understand why these decisions make sense, only which decisions are most accurate given the classifications. This is a real problem.

The above diagram is a very simple decision tree derived using machine learning classifiers; the ellipses represent the different data sets used by the classifiers. If this were a classifier for loan applications, would it make sense to a human? Machine learning makes decisions based on best-guess algorithms, but more importantly, it makes decisions that have no apparent human explanation.
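You can see the same effect by printing the rules a tree actually learns. The sketch below uses hypothetical loan features and toy labels; the point is that the learned splits optimize separation of the training data, not human explainability:

```python
# Sketch of inspecting the rules a decision tree actually learned.
# The loan features, data and labels are hypothetical toy values.
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["credit_score", "income_k", "years_in_job", "down_payment_pct"]
X = [
    [720, 85, 1, 10],
    [680, 60, 8, 20],
    [750, 95, 2, 5],
    [600, 40, 10, 25],
    [710, 70, 1, 15],
    [640, 55, 12, 30],
]
y = [1, 0, 1, 0, 1, 0]  # 1 = approve, 0 = deny (toy labels)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=features))
# The printed thresholds are whatever happened to separate this data set best;
# they are not necessarily rules a loan officer could justify to an applicant.
```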

In fact, the main benefit of machine learning, its ability to make decisions that are not humanly evident, is also its potential danger. Imagine machine learning inaccurately flagging a legitimate website as malicious and then blocking access to it. The owner wants an explanation and remediation, but the classifier cannot provide one.

Human Touch

Machine learning is gradually touching every part of our lives and making decisions of which we may not be fully aware. There is a significant need to disclose both the underlying data and the classification schemes of these processes. IBM is currently working with machine learning analytics to determine domain or website maliciousness. With this comes the ethical responsibility to disclose the information and decision analytics that determine benign or malicious intent. If we deny access to a website, we must then provide the human explanation and core data that drove this action.
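What might that disclosure look like in code? Below is a minimal sketch, not IBM’s actual DNS analytics: a hypothetical domain classifier that logs, for each verdict, which features drove the score, so a site owner can be shown the evidence behind a denial:

```python
# Sketch of per-decision disclosure for a domain-maliciousness classifier.
# Feature names, data and the scored domain record are all hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

features = ["domain_age_days", "dga_like_score", "bad_neighborhood_ips", "valid_tls_cert"]
X = np.array([
    [3000, 0.1, 0, 1],
    [2500, 0.2, 1, 1],
    [5,    0.9, 8, 0],
    [2,    0.8, 6, 0],
], dtype=float)
y = np.array([0, 0, 1, 1])  # 1 = malicious, 0 = benign

clf = LogisticRegression(max_iter=1000).fit(X, y)

domain = np.array([[10.0, 0.7, 5.0, 0.0]])
verdict = clf.predict(domain)[0]

# A simple form of disclosure: each feature's contribution to the decision
# score, logged with the verdict so it can be shown to the site owner.
contributions = clf.coef_[0] * domain[0]
for name, value in sorted(zip(features, contributions), key=lambda p: -abs(p[1])):
    print(f"{name}: {value:+.3f}")
print("verdict:", "malicious" if verdict == 1 else "benign")
```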

In security, this level of responsibility also has a legal dimension: what is the damage to the website owner if access is inappropriately denied? IBM is building both traceability and disclosure into our Domain Name System (DNS) security analytics and believes it will be a significant differentiator. It also carries interesting side effects that involve human interaction to reclassify incorrect data: explain the decision in human terms, then allow someone to educate the classifier with new data. Maybe we add a “like” or “don’t like” button for misclassified data, as sketched below.
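As a rough illustration of that feedback loop (again with hypothetical features and data, not an actual IBM implementation), a disputed verdict could be reviewed by a human, relabeled and folded back into the training set:

```python
# Sketch of the "don't like" feedback loop: a disputed verdict is reviewed by
# a human, relabeled and folded back into training. Data is hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Existing labeled data: [domain_age_days, dga_like_score]; 1 = malicious.
X = np.array([[3000, 0.1], [2500, 0.2], [5, 0.9], [2, 0.8]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = LogisticRegression(max_iter=1000).fit(X, y)

disputed = np.array([[40.0, 0.6]])
print("initial verdict:", clf.predict(disputed))  # e.g. [1] (malicious)

# The owner disputes the verdict; an analyst reviews the disclosed evidence
# and relabels the domain as benign. The correction becomes training data.
X = np.vstack([X, disputed])
y = np.append(y, 0)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("verdict after retraining:", clf.predict(disputed))
```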

IBM prides itself on business ethics as one of its core foundations to help build trust with consumers. Therefore, we’re building transparency into our machine learning analytics and striving to be right more often than we’re wrong. I would encourage the decision-makers behind machine learning products to challenge the transparency of the offering and demand human-interpretable audits of outcomes.

Learn more about cognitive security
