Information security, data science and cloud computing are among the most sought-after skills in the marketplace today. Security operations center (SOC) resources, typically analysts and threat hunters, are increasingly needed to combat adversaries who launch aggressive campaigns using the latest techniques and technologies.

The World of the Security Data Scientist

While there are several products to identify, detect and contain known threats and indicators of compromise (IOCs), there is very little protection against unknown threats, zero-day exploits and newly identified vulnerabilities. With the explosion of enriched security log data from thousands of servers, devices, databases and applications, managing this highly complex pool of structured and unstructured data is an enormous task.

Enter the security data scientist.

What Is a Security Data Scientist?

Security data scientists are practitioners with solid domain knowledge of network security, identity and access management (IAM) and vulnerability management. However, their core expertise lies in a deep conceptual understanding of advanced mathematics and statistics, including linear algebra, differential equations, probability distributions, quantitative methods and inferential statistics.

Security data scientists have the skills to understand complex algorithms and build advanced models, applying these concepts to real security data sets in single-node or clustered environments. They are experts in programming languages such as Python, R, Scala or MATLAB.

They are also deft at using big data technologies, such as Hadoop Distributed File System (HDFS), Elasticsearch, MapReduce and Apache Spark, to architect enterprise-level security data lake solutions. They also have the business knowledge to present complex data visualizations that describe data relationships, along with key performance indicators (KPIs), metrics and scorecards, to senior business executives.

Analytics Services

Security organizations need data scientists to organize, aggregate, enrich and transform huge volumes of security data into meaningful schemas and models. They need to understand underlying data relationships using descriptive analytics, such as correlation heat maps, cause-and-effect diagrams, time series and frequency charts. Once the data is transformed, cleaned and persisted in a structured format, the data scientist can train models on labeled historical data sets and predict outcomes using supervised machine learning. They can also detect patterns and classes in unlabeled data using unsupervised techniques, such as clustering, dimensionality reduction and anomaly detection.
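As one illustration of the anomaly detection mentioned above, the following sketch flags hours whose failed-login count sits far from the mean, using a simple z-score test built with the Python standard library. The function name `zscore_anomalies`, the 3.0 threshold and the sample counts are hypothetical choices made for illustration; this is a statistical baseline, not a prescribed method.

```python
from statistics import mean, stdev


def zscore_anomalies(counts, threshold=3.0):
    """Return the indices of counts whose absolute z-score exceeds threshold.

    Hypothetical helper: counts is a list of per-interval event counts
    (at least two values), such as failed logins per hour.
    """
    mu = mean(counts)
    sigma = stdev(counts)  # sample standard deviation
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical failed-login counts for 24 consecutive hours; the last hour spikes.
hourly_failed_logins = [4, 6, 5, 5, 7, 4, 6, 5, 5, 6, 4, 5,
                        7, 5, 6, 4, 5, 5, 6, 4, 5, 6, 5, 120]

print(zscore_anomalies(hourly_failed_logins))
```

A large outlier inflates both the mean and the standard deviation it is measured against, so on real log data a robust variant (for example, one based on the median and median absolute deviation) or a model that accounts for daily and weekly seasonality is usually a better starting point.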

False positive classification, pattern analytics, model scoring, topic modeling and rule analytics are other use cases where machine learning and predictive analytics can provide significant benefits to companies. Such projects can help simplify workflows, automate repetitive manual tasks and uncover new insights and data patterns.
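False positive classification maps naturally onto supervised learning: alerts that analysts have already triaged become labeled training data. The sketch below is a multinomial naive Bayes classifier with Laplace (add-one) smoothing, written with only the Python standard library. The helper names `train_nb` and `predict_nb`, the alert feature tokens and the labels are all hypothetical; a production pipeline would more likely use an established library and richer features.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Fit a multinomial naive Bayes model from (tokens, label) pairs."""
    label_counts = Counter()               # number of alerts per label
    token_counts = defaultdict(Counter)    # token frequencies per label
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior for the given tokens."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        # Laplace smoothing: add one to every token count for this label.
        denom = sum(token_counts[label].values()) + len(vocab)
        for t in tokens:
            score += math.log((token_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical triaged alerts, each reduced to a bag of feature tokens.
history = [
    (["vuln_scanner", "internal_ip", "port_scan"], "false_positive"),
    (["backup_job", "internal_ip", "high_volume"], "false_positive"),
    (["vuln_scanner", "internal_ip", "high_volume"], "false_positive"),
    (["external_ip", "powershell", "encoded_command"], "true_positive"),
    (["external_ip", "port_scan", "new_user_agent"], "true_positive"),
    (["powershell", "encoded_command", "lateral_movement"], "true_positive"),
]

model = train_nb(history)
print(predict_nb(model, ["vuln_scanner", "internal_ip", "port_scan"]))
print(predict_nb(model, ["external_ip", "powershell", "encoded_command"]))
```

Working in log space avoids numeric underflow when alerts carry many tokens, and smoothing keeps a single unseen token from driving a class probability to zero.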

A few organizations today also employ junior data scientists and data analysts to build security dashboards and simulation models for analysis, monitoring and reporting with business intelligence tools. As security organizations integrate with the mainstream business, security data science will evolve to provide analytics services to other groups, such as fraud analytics, risk analytics, behavior analytics and disaster recovery.

Security analysts today are heads-down on real-time streaming events, IOCs and intelligence feeds. They have little bandwidth to research unknown threats or identify historical data anomalies.

A security data scientist has the skills and training to perform these advanced analytics tasks on data at rest and in motion, supporting analysts and providing deep insights to the chief information security officer (CISO) and the business. If you have taken the time to bake the cake, make sure to add the icing.
