The chief information security officer (CISO) faces threats such as compromised users, negligent employees and malicious insiders. For this reason, one of the most important tools in the CISO’s arsenal is user behavior analytics (UBA), a solution that scans data from a security information and event management (SIEM) system, correlates it by user and builds a serialized timeline.
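As a rough illustration of the "correlate by user and build a serialized timeline" step, the sketch below groups events by user and orders each user's events by time. It assumes a hypothetical, already normalized `Event` record. Real SIEM exports use vendor-specific schemas, and a UBA product does far more than this.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    # Hypothetical normalized SIEM event; real SIEM schemas differ by vendor.
    timestamp: datetime
    user: str
    source: str   # e.g. "vpn", "ad", "proxy"
    action: str


def build_timelines(events):
    """Group events by user, then sort each user's events chronologically."""
    timelines = defaultdict(list)
    for event in events:
        timelines[event.user].append(event)
    for user_events in timelines.values():
        user_events.sort(key=lambda e: e.timestamp)
    return dict(timelines)


if __name__ == "__main__":
    raw = [
        Event(datetime(2024, 5, 1, 9, 5), "alice", "proxy", "web_request"),
        Event(datetime(2024, 5, 1, 8, 58), "alice", "vpn", "login"),
        Event(datetime(2024, 5, 1, 9, 1), "bob", "ad", "password_change"),
    ]
    for user, timeline in build_timelines(raw).items():
        print(user, [e.action for e in timeline])
    # alice ['login', 'web_request']
    # bob ['password_change']
```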
How UBA Works
Machine learning models build baselines of normal behavior for each user by looking at historical activity and comparing it to peer groups. Any abnormal events detected are aggregated through a scoring mechanism that generates a combined risk score for each user. Alerts from other security tools can be used in this process as well.
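The sketch below shows the baseline-plus-scoring idea in a deliberately simplified form. It uses plain z-scores against the user's own history and against a peer group as a stand-in for the learned models described above. As the article points out later, standard-deviation thresholds on their own are not machine learning. The single feature (megabytes uploaded per day), the weights `W_SELF`, `W_PEER` and `W_ALERT`, and the way alerts from other tools are counted are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical weights; a real product would learn or tune these.
W_SELF, W_PEER, W_ALERT = 0.6, 0.4, 5.0


def deviation(value, history):
    """Standard deviations above the mean of `history` (0 if at or below it)."""
    sigma = pstdev(history) or 1.0  # avoid dividing by zero on a flat history
    return max(0.0, (value - mean(history)) / sigma)


def risk_score(today, own_history, peer_history, external_alerts=0):
    """Combine self-baseline, peer-baseline and other tools' alerts into one score."""
    return (W_SELF * deviation(today, own_history)
            + W_PEER * deviation(today, peer_history)
            + W_ALERT * external_alerts)


if __name__ == "__main__":
    peers = [30, 45, 25, 40, 35, 50]  # MB uploaded per day by peers (hypothetical)
    alice = [20, 22, 19, 21, 18]      # Alice's own recent history (hypothetical)
    print("normal day:", round(risk_score(21, alice, peers), 2))
    print("upload spike:", round(risk_score(400, alice, peers), 2))
    print("spike + DLP alert:", round(risk_score(400, alice, peers, external_alerts=1), 2))
```

Only deviations above the baseline add risk here, because a user who suddenly uploads less data is rarely the concern.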
Users at high risk are flagged with information such as job title, department, manager and group membership to enable analysts to quickly investigate that particular user’s behavior in the context of his or her role within the organization. By combining all of a user’s data from disparate systems and utilizing artificial intelligence (AI) to gain insights, UBA empowers analysts with new threat hunting capabilities.
This technology is not new, but its application to security is. Many endpoint products offered today are cloud-based so that they can seamlessly protect mobile devices outside the organization. Given the evolving attack landscape and the new challenges security teams face, adoption is growing rapidly, and the technology is quickly becoming a best practice for enterprise security teams.
Machine learning is a branch of AI in which models learn from data to make judgments without being explicitly programmed for every scenario. Because it learns from data, it differs from static, signature-based tools such as SIEM correlation rules. The technology produces a probabilistic conclusion, which can then be converted into a binary signal, for example by applying a threshold. The likelihood that a decision is accurate can be interpreted as a measure of confidence in that conclusion. Security analysts can also validate these conclusions and investigate others that fall into gray areas.
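One simple way to turn a probabilistic conclusion into a binary signal while keeping a gray area for analysts is to use two thresholds. The sketch below assumes the model outputs a probability that activity is malicious. The cutoffs `alert_at=0.9` and `review_at=0.6` are hypothetical and would in practice be tuned against an organization's tolerance for false positives.

```python
def triage(probability, alert_at=0.9, review_at=0.6):
    """Map a model's probability of malicious activity to an action.

    Thresholds are hypothetical defaults, not values from any product.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= alert_at:
        return "alert"    # binary signal: treat as malicious
    if probability >= review_at:
        return "review"   # gray area: route to an analyst
    return "ignore"


if __name__ == "__main__":
    for p in (0.95, 0.7, 0.2):
        print(p, triage(p))
    # 0.95 alert
    # 0.7 review
    # 0.2 ignore
```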
The mathematical algorithms are complex and computationally intensive. Since no single model applies to every attack technique, the choice of model and data is crucial. This is one reason why these new, evolving endpoint products are based in the cloud and can conceivably draw upon data from every industry worldwide.
Establishing a Behavioral Baseline
Among the advantages of this technology is the ability to quickly and easily distinguish anomalous events from malicious ones. Employees change jobs, locations and work habits all the time, so much of what looks unusual is not malicious. By learning the behavioral baseline, or "DNA," of each user, machine learning reduces the overwhelming volume of false positives.
Machine learning also enables analysts to interpret subtle signals. Behavioral analytics can flag most attacks that pace themselves and act in small steps, which matters because attackers know that analysts have tools to find telltale attack signatures. For instance, SIEM correlation rules that look for a fixed attack signature can easily be bypassed by deviating slightly from that signature. A correlation rule may treat five failed logins in one minute as an indicator of an abnormal access attempt. An attacker could bypass the rule simply by delaying the fifth attempt until one second after the minute has elapsed.
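The evasion described above can be made concrete with a small model of the "five failed logins in one minute" correlation rule. The function name and the representation of failures as seconds since the first attempt are illustrative assumptions. The logic is simply a sliding-window count.

```python
from bisect import bisect_left


def rule_fires(failed_login_times, threshold=5, window_seconds=60):
    """Return True if `threshold` failures fall within any `window_seconds` window.

    Models the static correlation rule from the text: five failed logins in one minute.
    Times are seconds since an arbitrary reference point.
    """
    times = sorted(failed_login_times)
    for i, start in enumerate(times):
        # Index of the first failure at or after start + window_seconds,
        # so end - i counts failures in [start, start + window_seconds).
        end = bisect_left(times, start + window_seconds)
        if end - i >= threshold:
            return True
    return False


if __name__ == "__main__":
    print(rule_fires([0, 10, 20, 30, 40]))  # True: five failures within 40 seconds
    print(rule_fires([0, 15, 30, 45, 61]))  # False: fifth failure lands at 61 seconds
```

Both sequences amount to the same brute-force behavior, but only the first trips the rule. A behavioral baseline asks a different question, namely whether this volume of failures is normal for this user, so a one-second shift does not hide the activity.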
Finally, analysts can use machine learning to gain insights beyond individual events. Cyberattacks that have already infiltrated the network might slowly follow the kill chain of reconnaissance, infiltration, spread and detonation. AI pieces together the whole picture to make decisions and aid in incident response.
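To illustrate looking beyond individual events, the sketch below checks how far a user's events progress through the kill-chain stages named above, in order, even when the steps are days or weeks apart. It is a simple rule-based sketch of the correlation idea, not a machine learning model. It also assumes that events have already been labeled with a stage upstream. Those labels are hypothetical.

```python
from datetime import datetime

# Stages named in the text, in order. Mapping raw events to these
# labels is assumed to happen upstream (hypothetical).
KILL_CHAIN = ["reconnaissance", "infiltration", "spread", "detonation"]


def stages_reached(events):
    """Count how many kill-chain stages a user's events pass through in order.

    `events` is a list of (timestamp, stage) tuples; stages may be far apart in time.
    """
    next_stage = 0
    for _, stage in sorted(events):  # chronological order
        if next_stage < len(KILL_CHAIN) and stage == KILL_CHAIN[next_stage]:
            next_stage += 1
    return next_stage


if __name__ == "__main__":
    user_events = [
        (datetime(2024, 3, 20), "spread"),
        (datetime(2024, 3, 1), "reconnaissance"),
        (datetime(2024, 3, 5), "infiltration"),
    ]
    print(stages_reached(user_events))  # 3
```

Each event on its own might look minor. Seen together in sequence, they show an attack three stages into the chain.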
Evaluating Machine Learning Solutions
There is a lot of marketing noise associated with machine learning technology. Below are some useful approaches to evaluating AI-enabled security solutions.
- Use case definitions: Determine what you want from the solution and focus on specific use cases such as spear phishing attacks, privileged users or malware. This will help you build a short list of candidate solutions.
- Pick organizational subsets: Scaling is often a consideration, but for a proof of concept (PoC), consider establishing a small group to evaluate two or three vendors.
- Get source access: These solutions need access to certain infrastructure, such as Active Directory log files, to operate. Ensure that the solution has all the access privileges it needs to function.
- Understand the results: Machine learning solutions deliver probabilistic results, typically expressed as a percentage. When the solution flags an event, it must provide supporting evidence so that analysts can act on it.
- Ensure classification accuracy: Evaluate the number of correct predictions as a ratio of all predictions made. This is the most common metric for classification problems, and also the most misused. Because malicious events are rare, a model that never raises an alert can still score very high accuracy.
- Evaluate logarithmic loss: Logarithmic loss (log loss) is a performance metric for evaluating predicted probabilities of membership in a given class. Unlike accuracy, it takes the algorithm's confidence in each prediction into account. Confident correct predictions yield a low loss, while confident incorrect predictions are penalized heavily.
- Determine who will own it: Common considerations include whether the tool will be a standalone solution or integrated with a SIEM. It might also be used within a security operations center (SOC) by red and blue teams, or serve as another layer in the architecture where resources are tight.
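The two metrics from the list above can be computed directly when comparing vendors' outputs during a PoC. The sketch below implements accuracy and binary log loss, −(1/N) Σ [y log p + (1 − y) log(1 − p)]. It also implements recall, which is not in the list but is included to show why accuracy alone misleads. The labels and probabilities are made-up illustrations, not benchmark data. Libraries such as scikit-learn provide production versions of these metrics.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual malicious events (label 1) that the model flagged."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        return 0.0
    return sum(preds_on_positives) / len(preds_on_positives)


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    y_true = [0] * 98 + [1] * 2   # 2 malicious events in 100 (made-up)
    never_alert = [0] * 100       # a "model" that never flags anything
    print(accuracy(y_true, never_alert))  # 0.98
    print(recall(y_true, never_alert))    # 0.0
    # Same true label, increasingly confident wrong prediction:
    for p in (0.99, 0.6, 0.01):
        print(p, round(log_loss([1], [p]), 3))
```

The "never alert" model scores 98 percent accuracy while catching none of the malicious events. Log loss, in turn, grows sharply as a wrong prediction becomes more confident.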
Augmenting Human Intelligence
Always remember that these technologies are not silver bullets. Buyers of enterprise security products need to educate themselves on the basics of these technologies to avoid succumbing to the hype. Two standard deviations from the mean do not constitute machine learning, and five failed logins in one minute do not constitute artificial intelligence. In the absence of other information, there is no predictive value in seeing, for example, that an employee visited a website based in Russia.
These solutions provide a probability that a certain conclusion is accurate, based on the underlying algorithm and model. The real outcome usually lies somewhere between a definite yes and a definite no. Despite the hype surrounding artificial intelligence, all it provides is mathematical suspicion, not confirmation. To maximize the effectiveness of artificial intelligence for cybersecurity, machine learning must be paired with savvy security analysts.