What’s it like to measure and optimize global security operations centers (SOCs)? Measuring a SOC is a bit like measuring the operating performance of a machine or a factory. It’s incredibly important to monitor and measure the performance of every component and how they all work together. We do this to ensure an end-to-end, streamlined and seamless security workflow. The right tools and supporting technologies enable the security workflows to operate efficiently and provide the quality and speed that the service requires. Ultimately, we need these to deliver the outcomes that the client expects.

There are also a handful of measurement variables that have to be closely monitored in a SOC: availability, speed, accuracy, depth of investigation and quality of investigation. Within the environment, you will also have a number of measures around capacity and performance so that you can ensure that you’re operating the service through the lens of quality and speed.

All of these need to be considered when you’re measuring an intense global 24×7 security service that’s delivering the highest value to the clients in a narrow operating window. Drawing on my experiences with IBM Managed Security Services, here are crucial metrics for measuring effectiveness across globally connected SOCs.

Important Metrics in the SOC

It’s important to measure everything and to measure it effectively. You have to measure your capacity to make sure that you’re staffed appropriately to handle extremely large volumes of incidents. That also dictates how much time an analyst can spend investigating a given incident. Guidelines dictate how much time should be spent on an incident depending on the skill set of the analyst.

Managing a global SOC requires precision queue management. We measure throughput, the demand coming in and the capacity we have to apply against the demand. This is aimed at finding the optimal performance for the SOC. Security analysts need to have the right coverage and the right amount of time to conduct a proper investigation. We have a fully curated methodology for how we investigate each type of threat. We measure the performance against those threat types as well. It gets pretty granular.
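
As a rough illustration of weighing demand against capacity, here is a minimal Python sketch. The function names and the sample figures (10 analysts, 40 incidents per hour, a 12-minute investigation target) are assumptions invented for the example, not values from any real SOC.

```python
def minutes_per_incident(analysts: int, incidents_per_hour: float) -> float:
    """Average analyst minutes available per incident at a given demand."""
    if analysts <= 0 or incidents_per_hour <= 0:
        raise ValueError("analysts and incidents_per_hour must be positive")
    return analysts * 60 / incidents_per_hour


def utilization(analysts: int, incidents_per_hour: float,
                target_minutes: float) -> float:
    """Share of analyst capacity consumed if every incident gets the
    target investigation time. Values above 1.0 mean the queue will grow."""
    if analysts <= 0:
        raise ValueError("analysts must be positive")
    return incidents_per_hour * target_minutes / (analysts * 60)


# Hypothetical shift: 10 analysts facing 40 incidents per hour.
print(minutes_per_incident(10, 40))   # 15.0 minutes available per incident
print(utilization(10, 40, 12))        # 0.8 of capacity at a 12-minute target
```

A utilization near or above 1.0 in this simple model suggests that either staffing or the per-incident time guideline would need to change.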

From a broader perspective, we’re managing throughput and cycle times across the life of an incident. First is the time a threat waits in the queue after its telemetry comes into the system and before an analyst identifies it for investigation and review (called dwell time). Next is what we call work in progress time (WIP time), when the analyst is actively investigating the incident. Last is the notification and communication out to the client with the appropriate recommendations and actions.
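
The stages above can be sketched as a small Python data structure. This is only an illustration: the class name, the four timestamp fields and the sample times are hypothetical, and it assumes dwell time runs from telemetry arrival until an analyst picks the incident up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    # Hypothetical field names that mirror the stages described in the text.
    telemetry_received: datetime   # telemetry enters the system
    analyst_pickup: datetime       # analyst takes the threat from the queue
    investigation_done: datetime   # analyst finishes the investigation
    client_notified: datetime      # recommendation goes out to the client

    @property
    def dwell_time(self) -> timedelta:
        return self.analyst_pickup - self.telemetry_received

    @property
    def wip_time(self) -> timedelta:
        return self.investigation_done - self.analyst_pickup

    @property
    def notification_time(self) -> timedelta:
        return self.client_notified - self.investigation_done

    @property
    def cycle_time(self) -> timedelta:
        # End-to-end time; equals the sum of the three stages above.
        return self.client_notified - self.telemetry_received


start = datetime(2022, 1, 1, 12, 0)
incident = IncidentTimeline(
    telemetry_received=start,
    analyst_pickup=start + timedelta(minutes=12),
    investigation_done=start + timedelta(minutes=40),
    client_notified=start + timedelta(minutes=47),
)
print(incident.dwell_time, incident.wip_time, incident.cycle_time)
```

Keeping the stage durations separate makes it easier to see whether a slow cycle time comes from queueing, investigation or client communication.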

For me, measuring the life cycle of an incident is probably the most important overarching measure for the SOC. If you’re hitting your targets in terms of quality and speed, that means you have the right amount of capacity. It also means the systems are operating efficiently, the analyst has the skills and tools to do an effective investigation and we are able to give the client an actionable output in a timely fashion. This enables the client to take specific and targeted mitigation steps.

Overall, cycle time is one of the key elements, but there are many other measures that support the ability to measure cycle times. Think of them as building blocks.

Metrics for Threat Monitoring and Investigations

In the security industry, we are moving to what’s known as the “Golden Hour,” which is the full path from telemetry coming in at detection all the way through mitigation, executed in an hour. For IBM Managed Security Services, with the improvements we’ve made over the last year in use-case and rule optimization, AI and machine learning, our cycle time averages are now in the 60-minute window.
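
One simple way to track progress against a 60-minute target is to report both the average cycle time and the share of incidents that finish within the hour. The sketch below is a minimal example under that assumption; the function name and the sample cycle times are made up for illustration.

```python
from datetime import timedelta
from statistics import mean

GOLDEN_HOUR = timedelta(minutes=60)


def golden_hour_summary(cycle_times: list[timedelta]) -> dict[str, float]:
    """Average cycle time in minutes and the share of incidents
    completed within the 60-minute window."""
    if not cycle_times:
        raise ValueError("cycle_times must not be empty")
    minutes = [ct.total_seconds() / 60 for ct in cycle_times]
    within = sum(1 for ct in cycle_times if ct <= GOLDEN_HOUR)
    return {
        "mean_minutes": mean(minutes),
        "share_within_hour": within / len(cycle_times),
    }


# Hypothetical cycle times for four incidents, in minutes.
sample = [timedelta(minutes=m) for m in (35, 50, 60, 95)]
print(golden_hour_summary(sample))
# {'mean_minutes': 60.0, 'share_within_hour': 0.75}
```

Reporting both numbers matters because an average at 60 minutes can still hide a long tail of incidents that miss the window, as the 95-minute incident does here.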

Quality is typically at the other end of the spectrum from cycle times. We have an automated system to keep track of quality. That system automatically evaluates the analyst’s investigation and their disposition decisions on an ongoing basis. Those results are then populated over to an audit team that reviews those dispositions. Lastly, a full life cycle program is used to address deficiencies with analyst re-education and constant upskilling supported by weekly team quality metrics reviews.

Quality and speed are two sides of the same coin. Consistent attention to achieving both, whether you are a single-client shop or the leading global provider, is the ultimate objective.

Moving Toward Improving Full Security Posture Using Other Metrics 

There are a number of advances that we’re making within our service to help improve what we call “upstream quality” or the value of what we deliver directly to the client. That has a lot to do with measurements around the use cases and rules that a client has applied in their SIEM tool.

In this instance, we’re measuring the level of coverage against the industry security frameworks NIST and MITRE ATT&CK. We are measuring the quality of the rules and whether or not the client has redundancy or obsolescence in those rules. We do this because the better tuned that environment is, the better the kill chain coverage and the quality of the alerts generated for the analysts. We measure all of those factors in the service while rapidly moving toward using machine learning to optimize capabilities.
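
To make the idea of framework coverage and rule redundancy concrete, here is a minimal Python sketch. It is not how Advanced Rule Analytics works. It assumes each SIEM rule has already been mapped to a set of MITRE ATT&CK technique IDs, and it treats rules that map to the identical set of techniques as candidates for a redundancy review. The rule names and the four-technique "framework" are a toy sample.

```python
from collections import defaultdict


def technique_coverage(rules: dict[str, set[str]], framework: set[str]) -> float:
    """Fraction of framework techniques detected by at least one rule."""
    if not framework:
        raise ValueError("framework must not be empty")
    covered = set().union(*rules.values()) & framework
    return len(covered) / len(framework)


def redundancy_candidates(rules: dict[str, set[str]]) -> list[list[str]]:
    """Groups of rules that map to exactly the same set of techniques."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for name, techniques in rules.items():
        groups[frozenset(techniques)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Toy sample: four ATT&CK technique IDs and three hypothetical rules.
framework = {"T1059", "T1078", "T1486", "T1566"}
rules = {
    "rule_a": {"T1059"},
    "rule_b": {"T1059"},
    "rule_c": {"T1078", "T1566"},
}
print(technique_coverage(rules, framework))   # 0.75
print(redundancy_candidates(rules))           # [['rule_a', 'rule_b']]
```

In this sample, T1486 is the coverage gap, and rule_a and rule_b are flagged only as candidates: two rules can target the same technique with different detection logic, so a human review would still be needed.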

The upstream quality, the use cases and the frameworks are very much in line with improving the overall maturity for the client. The goal is to continually evaluate and review with each client the maturity state of their security program, their security posture and the health and capability of their security telemetry and platform.

To do this, we use Advanced Rule Analytics (ARA), a machine learning capability that we developed to evaluate the use cases and rules. We are also moving to use ML to evaluate the raw telemetry that we’re receiving to ensure that the telemetry is of optimal quality.

With our managed security services operations, we bring together multiple security functions and a large, widely distributed global team. Our ecosystem operates on a broad and precise set of capabilities and tools, integrated into a single platform model. Although we’re recognized as a managed security services provider, we are more accurately a Security as a Service provider with a proprietary AI-based threat management and monitoring platform coupled with a high-skill security team.

Unlike other MSSPs, we offer a robust, integrated platform ecosystem that is market-leading. Learn more about why IBM was selected as a Global and European Leader in Managed Security Services.
