Information security, data science and cloud computing skills are among the most sought-after in the marketplace today. Security operations center (SOC) staff, typically analysts and threat hunters, are increasingly needed to counter adversaries who launch aggressive campaigns using the latest techniques and technologies.
The World of the Security Data Scientist
While several products can identify, detect and contain known threats and indicators of compromise (IOCs), there is very little protection against unknown threats, zero-day exploits and newly discovered vulnerabilities. With the explosion of enriched security log data from thousands of servers, devices, databases and applications, managing this highly complex pool of structured and unstructured data is an enormous task.
Enter the security data scientist.
What Is a Security Data Scientist?
Security data scientists are practitioners with solid domain knowledge of network security, identity and access management (IAM) and vulnerability management. Their core expertise, however, lies in a deep understanding of advanced mathematical and statistical concepts, including linear algebra, differential equations, probability distributions, quantitative methods and inferential statistics.
Security data scientists have the skills to understand complex algorithms and build advanced models, applying these concepts to real security data sets in single-node or clustered environments. They are proficient in programming languages such as Python, R, Scala and MATLAB.
They are also adept at using big data technologies, such as the Hadoop Distributed File System (HDFS), Elasticsearch, MapReduce and Apache Spark, to architect enterprise-level security data lake solutions. They also have the business knowledge to present senior business executives with complex data visualizations that describe data relationships, along with key performance indicators (KPIs), metrics and scorecards.
Analytics Services
Security organizations need data scientists to organize, aggregate, enrich and transform huge volumes of security data into meaningful schemas and models. They need to understand the underlying data relationships using descriptive analytics, such as correlation heat maps, cause-and-effect diagrams, time series and frequency charts. Once the data is transformed, cleaned and persisted in a structured format, the data scientist can train models on labeled historical data sets to predict outcomes (supervised machine learning). They can also detect patterns and classes in unlabeled data using unsupervised techniques, such as clustering, dimensionality reduction and anomaly detection.
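As a rough illustration of the unsupervised anomaly detection mentioned above, the sketch below flags hours whose event count sits far from the mean of the series. It assumes hourly failed-login counts have already been aggregated from log data; the function name `zscore_anomalies`, the sample data and the three-standard-deviation threshold are hypothetical choices for illustration, not a prescribed method.

```python
from statistics import mean, stdev


def zscore_anomalies(counts, threshold=3.0):
    """Return the indices of counts lying more than `threshold` sample
    standard deviations from the mean (a simple z-score test)."""
    if len(counts) < 2:
        return []  # stdev() needs at least two data points
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        return []  # a perfectly flat series has no outliers
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical hourly failed-login counts for one day (24 values).
failed_logins = [12, 10, 11, 13, 12, 11, 10, 12, 14, 11, 12, 13,
                 250, 12, 11, 10, 13, 12, 11, 12, 13, 11, 12, 10]

print(zscore_anomalies(failed_logins))  # prints [12]
```

A single large spike inflates the very standard deviation it is measured against, so in practice a robust variant based on the median and the median absolute deviation is often preferred; this sketch keeps the plain z-score for readability.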
False-positive classification, pattern analytics, model scoring, topic modeling and rule analytics are other use cases where machine learning and predictive analytics can provide huge benefits to companies. Such projects can help simplify workflows, automate repetitive manual tasks and uncover new insights and data patterns.
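To make false-positive classification more concrete, here is a minimal sketch of one possible supervised approach: a categorical naive Bayes classifier with add-one (Laplace) smoothing, trained on alerts that analysts have already labeled as false positives ("fp") or true positives ("tp"). The feature names (`rule`, `src_zone`, `asset`), the labels and the sample triage history are all hypothetical; a real deployment would need far more data and features, and would likely use a library implementation.

```python
import math
from collections import Counter, defaultdict


def train(alerts, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
    vocab = defaultdict(set)             # feature -> distinct values seen
    for alert, label in zip(alerts, labels):
        for feature, value in alert.items():
            value_counts[(label, feature)][value] += 1
            vocab[feature].add(value)
    return class_counts, value_counts, vocab


def predict(model, alert):
    """Return the label with the highest log-posterior for one alert.
    Assumes every training alert carries the same set of features."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior P(label)
        for feature, value in alert.items():
            k = len(vocab.get(feature, set())) + 1  # +1 reserves mass for unseen values
            count = value_counts.get((label, feature), Counter())[value]
            score += math.log((count + 1) / (n + k))  # smoothed log P(value | label)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical triage history: alert features plus the analyst's verdict.
history = [
    {"rule": "port_scan", "src_zone": "internal", "asset": "low"},
    {"rule": "port_scan", "src_zone": "internal", "asset": "low"},
    {"rule": "port_scan", "src_zone": "internal", "asset": "high"},
    {"rule": "brute_force", "src_zone": "external", "asset": "high"},
    {"rule": "brute_force", "src_zone": "external", "asset": "low"},
    {"rule": "port_scan", "src_zone": "external", "asset": "high"},
]
verdicts = ["fp", "fp", "fp", "tp", "tp", "tp"]

model = train(history, verdicts)
print(predict(model, {"rule": "port_scan", "src_zone": "internal", "asset": "low"}))  # prints fp
```

Summing log-probabilities avoids numeric underflow when many features are combined, and the add-one smoothing keeps a rule or zone that never appeared with a given label from forcing that label's probability to zero.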
Some organizations today also employ junior data scientists and data analysts to build security dashboards and simulation models for analysis, monitoring and reporting with business intelligence tools. As security organizations integrate with the mainstream business, security data science will evolve to provide analytics services to other groups, such as fraud analytics, risk analytics, behavior analytics and disaster recovery.
Security analysts today are heads-down on real-time streaming events, IOCs and intelligence feeds. They have little bandwidth to research unknown threats or identify anomalies in historical data.
A security data scientist has the skills and training to perform these advanced analytics tasks on data at rest and in motion, supporting analysts and providing deep insights to the chief information security officer (CISO) and the business. If you have taken the time to bake the cake, make sure to add the icing.
Security Data Scientist, Senior Managing Consultant, IBM