Apache Metron
A Case Study of a Modern Streaming Architecture on Hadoop
Casey Stella
@casey_stella
2017
Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
Introduction
Hi, I’m Casey Stella!
Apache Metron: A Cybersecurity Analytics Platform
• Metron provides a scalable, advanced security analytics framework to offer a
centralized tool for security monitoring and analysis.
• Metron was initiated at Cisco in 2014 as OpenSOC.
• Metron was submitted to the Apache Incubator in December 2015
• Metron graduated to a top level project in April 2017
Characteristics of Metron
• Metron is built atop the Apache Hadoop ecosystem to handle capturing, ingesting,
enriching and storing streaming data at scale
◦ Kafka provides a unified data bus
◦ Storm provides a distributed streaming framework
◦ HBase provides a low latency key/value lookup store for enrichments and profiles
◦ Zookeeper provides a distributed configuration store
• Ingested network telemetry can be enriched pluggably
◦ New enrichments can be done live on running topologies without restart
◦ New enrichment capabilities can be added via user defined functions
◦ Enrichments can be composed through a domain specific language called Stellar
• Data stored in HBase can be the source of enrichments
Characteristics of Metron
• Enriched telemetry can be indexed into a Security data lake
◦ Indexes supported are pluggable and include HDFS, Solr and Elasticsearch
• Advanced analytics can be done on streaming data
◦ Probabilistic data structures (e.g. sketches) can sketch streaming data across time and
enable approximate distribution, set existence and distinct count queries
◦ Models can be deployed using Yarn, autodiscovered via Zookeeper and interrogated via
Stellar functions
Stellar
Metron needed the ability to allow users to pluggably and consistently enrich and
transform streaming data. Out of this need, we created Stellar:
• Interact with the various enabling Hadoop components in a unified manner
• Compose a rich set of built-in functions with user defined functions
• Provide simple primitives around the functions: boolean operations, conditionals,
numerical computation.
Think of Stellar as Excel functions that we can run on streaming data.
Data Ingest: Parsers
• Telemetry data comes in a variety of formats and velocities.
• Each telemetry source is ingested into Kafka
• A Storm parser topology is used to convert the raw telemetry format to a
normalized JSON Map
◦ Common network-related raw telemetry formats
◦ Generic formats such as CSV and JSON
◦ Specifying the parser via Grok
◦ Creating your own parser in a JVM-based language
• Transformations and normalizations can be done post-parse via Stellar statements
• The normalized data across all telemetries is written to an enrichment Kafka topic
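The parse-then-normalize flow above can be sketched in a few lines. This is only an illustration in Python, not Metron's parser API: the column layout, the helper names `parse_csv_line` and `post_parse_transform`, and the sample log line are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical column layout for an authentication log line (an assumption).
COLUMNS = ["timestamp", "user", "ip_src_addr", "ip_dst_addr", "status"]


def parse_csv_line(raw_line, source_type):
    """Convert one raw CSV line into a flat, normalized dictionary."""
    values = next(csv.reader(io.StringIO(raw_line)))
    message = dict(zip(COLUMNS, values))
    # Tag the message with its telemetry source and keep the raw text.
    message["source.type"] = source_type
    message["original_string"] = raw_line
    return message


def post_parse_transform(message):
    """Stand-in for a post-parse normalization step."""
    transformed = dict(message)
    transformed["timestamp"] = int(transformed["timestamp"])
    transformed["user"] = transformed["user"].strip().lower()
    return transformed


raw = "1497000000, ALICE ,10.0.0.5,10.0.0.9,Success"
normalized = post_parse_transform(parse_csv_line(raw, "auth"))
# The normalized map is what would be handed to the next stage.
print(json.dumps(normalized, sort_keys=True))
```

Keeping the transformation separate from the parser means the same normalization can be reused across telemetry sources with different raw formats.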
Enrichment
• The enrichment topology takes the various normalized telemetry sources and allows
users to enrich the messages with broader context
◦ Enriching with reference data ingested into HBase
◦ Enriching via arbitrary Stellar expressions
◦ Enriching with Geolocation data
• Enrichment is split into two phases
◦ Preparatory Enrichment
◦ Threat Intelligence Enrichment with risk triage
• If messages are marked with an is_alert field, they can have a triage score
computed via Stellar which defines their priority as threats
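A rough Python sketch of the two phases described above, under stated assumptions: the in-memory dictionaries stand in for reference data and threat intelligence held in a key/value store, and the field names (`enrichments.*`, `triage_score`) and the scoring weights are hypothetical, chosen only to show the shape of the flow.

```python
# Hypothetical reference data, standing in for a key/value lookup store.
REFERENCE_DATA = {
    "10.0.0.9": {"asset_owner": "finance", "asset_criticality": "high"},
}

# Hypothetical threat intelligence: a set of known-bad source addresses.
THREAT_INTEL = {"203.0.113.7"}


def enrich(message):
    """Phase 1: add broader context from reference data."""
    enriched = dict(message)
    context = REFERENCE_DATA.get(message.get("ip_dst_addr"), {})
    for key, value in context.items():
        enriched["enrichments." + key] = value
    return enriched


def threat_intel_and_triage(message):
    """Phase 2: mark alerts, then score only messages marked as alerts."""
    result = dict(message)
    if message.get("ip_src_addr") in THREAT_INTEL:
        result["is_alert"] = "true"
    if result.get("is_alert") == "true":
        score = 10  # assumed base score for any alert
        if result.get("enrichments.asset_criticality") == "high":
            score += 40  # assumed weight for critical assets
        result["triage_score"] = score
    return result


event = {"ip_src_addr": "203.0.113.7", "ip_dst_addr": "10.0.0.9"}
scored = threat_intel_and_triage(enrich(event))
print(scored["triage_score"])  # 50
```

Running triage after the preparatory phase lets the score depend on context that the first phase attached to the message.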
Enrichment: Stellar
Stellar is the primary method for enrichment in Metron.
• User defined enrichment functions can be enabled through adding a jar
implementing the function to HDFS.
• Stellar enrichments can be executed asynchronously across Storm workers and their
results joined together
Stellar functions are provided for, among others,
• Interrogating ML models
• Reading reference data from HBase
• Reading historical profiles
Indexing
Profiler: Motivation
• Enrichments and parsers operate within the context of a single message.
• This is insufficient for a number of scenarios
◦ Correlating between different sources
◦ Making judgments based on past events
◦ Both at the same time
• Operating across multiple sources has scalability implications
• Waiting on the data you want from the other stream isn’t plausible
• Sometimes you want to change your mind about how much history you want to refer
to
• In cyber security, advanced actors wait for months, so you need data months back!
Profiler: Solution
• Compromise: Operate on windows in time rather than individual records
• Window specifications should be flexible enough to avoid seasonal aberrations.
• The Profiler is a Storm topology that consumes the enriched data
◦ Captures aggregations of data in a window to HBase
◦ Uses Stellar to define how to aggregate data
◦ Uses Stellar to define a filter on which messages to consider
• These aggregations can be read anywhere Stellar is used
• This enables things like
◦ Temporal outlier detection
◦ Historical context from other sources when building threat triage rules
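The windowed approach above can be illustrated with a small Python sketch. It is not the Profiler's implementation: the `Profile` class, the 15-minute period length, and the in-memory dictionary standing in for the key/value store are assumptions. The four callables mirror the roles of a message filter, an entity key, an initial state and an update step.

```python
PERIOD_SECONDS = 15 * 60  # assumed period length


def period_of(timestamp):
    """Map an epoch timestamp (seconds) to its period index."""
    return timestamp // PERIOD_SECONDS


class Profile:
    """Aggregates messages per (entity, period) instead of per record."""

    def __init__(self, only_if, for_each, init, update):
        self.only_if = only_if    # filter: which messages to consider
        self.for_each = for_each  # entity key for a message
        self.init = init          # initial aggregation state
        self.update = update      # fold one message into the state
        self.store = {}           # (entity, period) -> state

    def apply(self, message):
        if not self.only_if(message):
            return
        key = (self.for_each(message), period_of(message["timestamp"]))
        state = self.store[key] if key in self.store else self.init()
        self.store[key] = self.update(state, message)

    def get(self, entity, first_period, last_period):
        """Read back the states for an entity over a range of periods."""
        return [self.store[(entity, p)]
                for p in range(first_period, last_period + 1)
                if (entity, p) in self.store]


auth_count = Profile(
    only_if=lambda m: m["source.type"] == "auth",
    for_each=lambda m: m["user"],
    init=lambda: 0,
    update=lambda count, m: count + 1,
)

messages = [
    {"source.type": "auth", "user": "alice", "timestamp": 0},
    {"source.type": "auth", "user": "alice", "timestamp": 100},
    {"source.type": "auth", "user": "bob", "timestamp": 50},
    {"source.type": "dns", "user": "alice", "timestamp": 120},
    {"source.type": "auth", "user": "alice", "timestamp": 1000},
]
for m in messages:
    auth_count.apply(m)

print(auth_count.get("alice", 0, 1))  # [2, 1]
```

Because the reader asks for a range of periods at query time, the amount of history consulted can change without rewriting what was stored.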
Profiler: Aggregations
• Example aggregations:
◦ Distributional statistics: median, mean, percentile
◦ Set operations: Contains, Cardinality
◦ Simple counts
• Aggregation challenges
◦ May be big objects if naively done (e.g. set operations)
◦ May not be able to be merged across time (e.g. distributions)
◦ Profile reading should be decoupled from writing
• Approach: Use approximate data structures
◦ Set operations: Bloom Filters, HyperLogLog approximations
◦ Distributional Statistics: t-digests
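To make the approach above concrete, here is a minimal Bloom filter in Python. It is a teaching sketch rather than the structure Metron ships: the bit count, the number of hashes and the SHA-256 based hashing are arbitrary assumptions. It shows the two properties the slide relies on: the state stays a fixed size however many items are added, and two filters built over different time windows merge with a bitwise OR.

```python
import hashlib


class BloomFilter:
    """Fixed-size approximate set: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            data = "{}:{}".format(seed, item).encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        """Union of two filters with identical parameters."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share parameters to be merged")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# One filter per time window, merged at read time.
window_1 = BloomFilter()
window_1.add("host-a")
window_2 = BloomFilter()
window_2.add("host-b")

both = window_1.merge(window_2)
print(both.might_contain("host-a"), both.might_contain("host-b"))  # True True
```

Merging at read time is what decouples profile reading from writing: each window is written once, and the reader decides which windows to combine.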
Demo
• Los Alamos National Lab released an open data set representing 58 consecutive days
of de-identified event data collected from five sources within Los Alamos National
Laboratory’s corporate, internal computer network.
• Among other telemetry sources, authentication logs and a set of well-defined red
teaming events that present bad behavior within the 58 days are provided.
• We will look at the authentication logs around a breach and show how we can use
Metron to pick out the offending user’s activity leading up to the event
◦ Look for users whose number of distinct hosts they attempt to authenticate to is more
than 5 standard deviations from the median across all users.
{
  "profile": "attempts_by_user",
  "foreach": "user",
  "onlyif": "source.type == 'auth'",
  "init": {
    "total": "HLLP_INIT(5,6)"
  },
  "update": {
    "total": "HLLP_ADD(total, ip_dst_addr)"
  },
  "result": {
    "profile": "total",
    "triage": {
      "total_count": "HLLP_CARDINALITY(total)"
    }
  }
}
{
  "profile": "auth_distribution",
  "foreach": "'global'",
  "onlyif": "source.type == 'profiler' && profile == 'attempts_by_user'",
  "init": {
    "s": "STATS_INIT()"
  },
  "update": {
    "s": "STATS_ADD(s, total_count)"
  },
  "result": "s"
}
window := PROFILE_WINDOW('...')
profile := PROFILE_GET('attempts_by_user', user, window)
distinct_auth_attempts := HLLP_CARDINALITY(GET_LAST(profile))
distribution_profile := PROFILE_GET('auth_distribution', 'global', window)
stats := STATS_MERGE(distribution_profile)
distinct_auth_attempts_median := STATS_PERCENTILE(stats, 0.5)
distinct_auth_attempts_stddev := STATS_SD(stats)
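The values computed above feed the demo's test: is a user's distinct-host count more than 5 standard deviations from the median? A hedged Python sketch of that comparison follows. The function name `is_outlier`, the one-sided comparison, the use of the sample standard deviation and the sample counts are assumptions made for illustration; the slides do not show the final rule.

```python
import statistics


def is_outlier(user_count, all_counts, num_stddevs=5.0):
    """True if user_count exceeds the median by more than num_stddevs deviations."""
    median = statistics.median(all_counts)
    # Assumption: sample standard deviation; the slides do not specify which.
    stddev = statistics.stdev(all_counts)
    return user_count - median > num_stddevs * stddev


# Hypothetical distinct-host counts for 100 ordinary users.
typical_counts = [3, 4, 5, 4, 3, 5, 4, 4, 3, 5] * 10

print(is_outlier(200, typical_counts))  # True
print(is_outlier(5, typical_counts))    # False
```

Using the median rather than the mean as the center keeps a handful of extreme users from shifting the baseline they are compared against.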
Questions
Thanks for your attention! Don’t forget to come to the cybersecurity Birds of a Feather
session Thursday.
• Find me at http://caseystella.com
• Twitter handle: @casey_stella
• Email address: cstella@hortonworks.com
Casey Stella@casey_stella (Hortonworks) Apache Metron 2017

Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop

  • 1. Apache Metron A Case Study of a Modern Streaming Architecture on Hadoop Casey Stella @casey_stella 2017 Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 2. Introduction Hi, I’m Casey Stella! Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 3. Apache Metron: A Cybersecurity Analytics Platform • Metron provides a scalable, advanced security analytics framework to offer a centralized tool for security monitoring and analysis. Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 4. Apache Metron: A Cybersecurity Analytics Platform • Metron provides a scalable, advanced security analytics framework to offer a centralized tool for security monitoring and analysis. • Metron was initiated at Cisco in 2014 as OpenSOC. Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 5. Apache Metron: A Cybersecurity Analytics Platform • Metron provides a scalable, advanced security analytics framework to offer a centralized tool for security monitoring and analysis. • Metron was initiated at Cisco in 2014 as OpenSOC. • Metron was submitted to the Apache Incubator in December 2015 Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 6. Apache Metron: A Cybersecurity Analytics Platform • Metron provides a scalable, advanced security analytics framework to offer a centralized tool for security monitoring and analysis. • Metron was initiated at Cisco in 2014 as OpenSOC. • Metron was submitted to the Apache Incubator in December 2015 • Metron graduated to a top level project in April 2017 Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 7. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 8. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale ◦ Kafka provides a unified data bus Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 9. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale ◦ Kafka provides a unified data bus ◦ Storm providing a distributed streaming framework Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 10. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale ◦ Kafka provides a unified data bus ◦ Storm providing a distributed streaming framework ◦ HBase provides a low latency key/value lookup store for enrichments and profiles Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 11. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale ◦ Kafka provides a unified data bus ◦ Storm providing a distributed streaming framework ◦ HBase provides a low latency key/value lookup store for enrichments and profiles ◦ Zookeeper provides a distributed configuration store Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 12. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale ◦ Kafka provides a unified data bus ◦ Storm providing a distributed streaming framework ◦ HBase provides a low latency key/value lookup store for enrichments and profiles ◦ Zookeeper provides a distributed configuration store • Ingested network telemetry can be enriched pluggably Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 13. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale ◦ Kafka provides a unified data bus ◦ Storm providing a distributed streaming framework ◦ HBase provides a low latency key/value lookup store for enrichments and profiles ◦ Zookeeper provides a distributed configuration store • Ingested network telemetry can be enriched pluggably ◦ New enrichments can be done live on running topologies without restart Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 14. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale ◦ Kafka provides a unified data bus ◦ Storm providing a distributed streaming framework ◦ HBase provides a low latency key/value lookup store for enrichments and profiles ◦ Zookeeper provides a distributed configuration store • Ingested network telemetry can be enriched pluggably ◦ New enrichments can be done live on running topologies without restart ◦ New enrichment capabilities can be added via user defined functions Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 15. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale ◦ Kafka provides a unified data bus ◦ Storm providing a distributed streaming framework ◦ HBase provides a low latency key/value lookup store for enrichments and profiles ◦ Zookeeper provides a distributed configuration store • Ingested network telemetry can be enriched pluggably ◦ New enrichments can be done live on running topologies without restart ◦ New enrichment capabilities can be added via user defined functions ◦ Enrichments can be composed through a domain specific language called Stellar Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 16. Characteristics of Metron • Metron is built atop the Apache Hadoop ecosystem handle capturing, ingesting, enriching and storing streaming data at scale ◦ Kafka provides a unified data bus ◦ Storm providing a distributed streaming framework ◦ HBase provides a low latency key/value lookup store for enrichments and profiles ◦ Zookeeper provides a distributed configuration store • Ingested network telemetry can be enriched pluggably ◦ New enrichments can be done live on running topologies without restart ◦ New enrichment capabilities can be added via user defined functions ◦ Enrichments can be composed through a domain specific language called Stellar • Data stored in HBase can be the source of enrichments Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 17. Characteristics of Metron • Enriched telemetry can be indexed into a Security data lake ◦ Indexes supported are pluggable and include HDFS, Solr and Elasticsearch • Advanced analytics can be done on streaming data Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 18. Characteristics of Metron • Enriched telemetry can be indexed into a Security data lake ◦ Indexes supported are pluggable and include HDFS, Solr and Elasticsearch • Advanced analytics can be done on streaming data ◦ Probabalistic data structures (e.g. sketches) can sketch streaming data across time and enable approximate distribution, set existence and distinct count queries Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 19. Characteristics of Metron • Enriched telemetry can be indexed into a Security data lake ◦ Indexes supported are pluggable and include HDFS, Solr and Elasticsearch • Advanced analytics can be done on streaming data ◦ Probabalistic data structures (e.g. sketches) can sketch streaming data across time and enable approximate distribution, set existence and distinct count queries ◦ Models can be deployed using Yarn, autodiscovered via Zookeeper and interrogated via Stellar functions Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 20. Stellar Metron needed the ability to allow users to pluggably and consistently enrich and transform streaming data. Out of this need, we created Stellar: Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 21. Stellar Metron needed the ability to allow users to pluggably and consistently enrich and transform streaming data. Out of this need, we created Stellar: • Interact with the various enabling Hadoop components in a unified manner Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 22. Stellar Metron needed the ability to allow users to pluggably and consistently enrich and transform streaming data. Out of this need, we created Stellar: • Interact with the various enabling Hadoop components in a unified manner • Compose a rich set of built-in functions with user defined functions Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 23. Stellar Metron needed the ability to allow users to pluggably and consistently enrich and transform streaming data. Out of this need, we created Stellar: • Interact with the various enabling Hadoop components in a unified manner • Compose a rich set of built-in functions with user defined functions • Provide simple primitives around the functions: boolean operations, conditionals, numerical computation. Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 24. Stellar Metron needed the ability to allow users to pluggably and consistently enrich and transform streaming data. Out of this need, we created Stellar: • Interact with the various enabling Hadoop components in a unified manner • Compose a rich set of built-in functions with user defined functions • Provide simple primitives around the functions: boolean operations, conditionals, numerical computation. Think of Stellar as Excel functions that we can run on streaming data. Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 25. Data Ingest: Parsers • Telemetry data comes in a variety of formats and velocities. Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 26. Data Ingest: Parsers • Telemetry data comes in a variety of formats and velocities. • Each telemetry source is ingested into kafka Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 27. Data Ingest: Parsers • Telemetry data comes in a variety of formats and velocities. • Each telemetry source is ingested into kafka • A Storm parser topology is used to convert the raw telemetry format to a normalized JSON Map Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 28. Data Ingest: Parsers • Telemetry data comes in a variety of formats and velocities. • Each telemetry source is ingested into kafka • A Storm parser topology is used to convert the raw telemetry format to a normalized JSON Map ◦ Common network-related raw telemetry formats Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 29. Data Ingest: Parsers • Telemetry data comes in a variety of formats and velocities. • Each telemetry source is ingested into kafka • A Storm parser topology is used to convert the raw telemetry format to a normalized JSON Map ◦ Common network-related raw telemetry formats ◦ Generic formats such as CSV and JSON Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 30. Data Ingest: Parsers • Telemetry data comes in a variety of formats and velocities. • Each telemetry source is ingested into kafka • A Storm parser topology is used to convert the raw telemetry format to a normalized JSON Map ◦ Common network-related raw telemetry formats ◦ Generic formats such as CSV and JSON ◦ Specifying the parser via Grok Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 31. Data Ingest: Parsers • Telemetry data comes in a variety of formats and velocities. • Each telemetry source is ingested into kafka • A Storm parser topology is used to convert the raw telemetry format to a normalized JSON Map ◦ Common network-related raw telemetry formats ◦ Generic formats such as CSV and JSON ◦ Specifying the parser via Grok ◦ Creating your own parser in a JVM-based language Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 32. Data Ingest: Parsers • Telemetry data comes in a variety of formats and velocities. • Each telemetry source is ingested into kafka • A Storm parser topology is used to convert the raw telemetry format to a normalized JSON Map ◦ Common network-related raw telemetry formats ◦ Generic formats such as CSV and JSON ◦ Specifying the parser via Grok ◦ Creating your own parser in a JVM-based language • Transformations and normalizations can be done post-parse via Stellar statements Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 33. Data Ingest: Parsers • Telemetry data comes in a variety of formats and velocities. • Each telemetry source is ingested into kafka • A Storm parser topology is used to convert the raw telemetry format to a normalized JSON Map ◦ Common network-related raw telemetry formats ◦ Generic formats such as CSV and JSON ◦ Specifying the parser via Grok ◦ Creating your own parser in a JVM-based language • Transformations and normalizations can be done post-parse via Stellar statements • The normalized data across all telemetries is written to an enrichment kafka topic Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 34. Data Ingest: Parsers Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 35. Enrichment • The enrichment topology takes the various normalized telemetry sources and allows users to enrich the messages with broader context Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 36. Enrichment • The enrichment topology takes the various normalized telemetry sources and allows users to enrich the messages with broader context ◦ Enriching with reference data ingested into HBase Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 37. Enrichment • The enrichment topology takes the various normalized telemetry sources and allows users to enrich the messages with broader context ◦ Enriching with reference data ingested into HBase ◦ Enriching via arbitrary Stellar expressions Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 38. Enrichment • The enrichment topology takes the various normalized telemetry sources and allows users to enrich the messages with broader context ◦ Enriching with reference data ingested into HBase ◦ Enriching via arbitrary Stellar expressions ◦ Enriching with Geolocation data Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 39. Enrichment • The enrichment topology takes the various normalized telemetry sources and allows users to enrich the messages with broader context ◦ Enriching with reference data ingested into HBase ◦ Enriching via arbitrary Stellar expressions ◦ Enriching with Geolocation data • Enrichment is split into two phases Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 40. Enrichment • The enrichment topology takes the various normalized telemetry sources and allows users to enrich the messages with broader context ◦ Enriching with reference data ingested into HBase ◦ Enriching via arbitrary Stellar expressions ◦ Enriching with Geolocation data • Enrichment is split into two phases ◦ Preparatory Enrichment Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 41. Enrichment • The enrichment topology takes the various normalized telemetry sources and allows users to enrich the messages with broader context ◦ Enriching with reference data ingested into HBase ◦ Enriching via arbitrary Stellar expressions ◦ Enriching with Geolocation data • Enrichment is split into two phases ◦ Preparatory Enrichment ◦ Threat Intelligence Enrichment with risk triage Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 42. Enrichment • The enrichment topology takes the various normalized telemetry sources and allows users to enrich the messages with broader context ◦ Enriching with reference data ingested into HBase ◦ Enriching via arbitrary Stellar expressions ◦ Enriching with Geolocation data • Enrichment is split into two phases ◦ Preparatory Enrichment ◦ Threat Intelligence Enrichment with risk triage • If messages are marked with an is_alert field, they can have a triage score computed via Stellar which defines their priority as threats Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 43. Enrichment: Stellar Stellar is the primary method for enrichment in Metron. • User defined enrichment functions can be enabled through adding a jar implementing the function to HDFS. Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 44. Enrichment: Stellar Stellar is the primary method for enrichment in Metron. • User defined enrichment functions can be enabled through adding a jar implementing the function to HDFS. • Stellar enrichments can be executed asynchronously across storm workers and their results joined together Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 45. Enrichment: Stellar Stellar is the primary method for enrichment in Metron. • User defined enrichment functions can be enabled through adding a jar implementing the function to HDFS. • Stellar enrichments can be executed asynchronously across storm workers and their results joined together Stellar functions are provided for, among others, • Interrogating ML models Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 46. Enrichment: Stellar Stellar is the primary method for enrichment in Metron. • User defined enrichment functions can be enabled through adding a jar implementing the function to HDFS. • Stellar enrichments can be executed asynchronously across storm workers and their results joined together Stellar functions are provided for, among others, • Interrogating ML models • Reading reference data from HBase Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 47. Enrichment: Stellar Stellar is the primary method for enrichment in Metron. • User defined enrichment functions can be enabled through adding a jar implementing the function to HDFS. • Stellar enrichments can be executed asynchronously across storm workers and their results joined together Stellar functions are provided for, among others, • Interrogating ML models • Reading reference data from HBase • Reading historical profiles Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
  • 58. Profiler: Motivation
    • Enrichments and parsers operate within the context of a single message.
    • This is insufficient for a number of scenarios:
      ◦ Correlating between different sources
      ◦ Making judgments based on past events
      ◦ Both at the same time
    • Operating across multiple sources has scalability implications.
    • Waiting for the data you want to arrive on the other stream isn’t practical.
    • Sometimes you want to change your mind about how much history to refer to.
    • In cybersecurity, advanced actors wait for months, so you need data going months back!
  • 68. Profiler: Solution
    • Compromise: operate on windows in time rather than on individual records.
    • Windows should be specifiable very flexibly to avoid seasonal aberrations.
    • The Profiler is a Storm topology that takes the enriched data and
      ◦ Captures aggregations of data in a window to HBase
      ◦ Uses Stellar to define how to aggregate data
      ◦ Uses Stellar to define a filter on which messages to consider
    • These aggregations can be read anywhere Stellar is used.
    • This enables things like
      ◦ Temporal outlier detection
      ◦ Historical context from other sources when building threat triage rules
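The windowed-aggregation idea can be sketched in a few lines of Python. This is a rough illustration, not the Profiler's implementation: an in-memory dict stands in for HBase, plain Python callables stand in for the Stellar filter and aggregation expressions, and the 900-second window and the `timestamp` field are assumptions made for the example.

```python
WINDOW_SECONDS = 900  # assumed window length (15 minutes)

def build_profiles(messages, only_if, entity_of, init, update,
                   window_seconds=WINDOW_SECONDS):
    """Aggregate messages per (entity, window) instead of keeping raw records."""
    profiles = {}  # stand-in for the HBase table of profile values
    for msg in messages:
        if not only_if(msg):  # filter on which messages to consider
            continue
        key = (entity_of(msg), msg["timestamp"] // window_seconds)
        state = profiles[key] if key in profiles else init()
        profiles[key] = update(state, msg)  # how to aggregate
    return profiles

messages = [
    {"timestamp": 10, "source.type": "auth", "user": "U1", "ip_dst_addr": "C1"},
    {"timestamp": 20, "source.type": "auth", "user": "U1", "ip_dst_addr": "C2"},
    {"timestamp": 30, "source.type": "dns", "user": "U1", "ip_dst_addr": "C3"},
    {"timestamp": 1000, "source.type": "auth", "user": "U1", "ip_dst_addr": "C1"},
]

# Distinct destination hosts per user per window; an exact set is used
# here purely for readability.
profiles = build_profiles(
    messages,
    only_if=lambda m: m["source.type"] == "auth",
    entity_of=lambda m: m["user"],
    init=set,
    update=lambda state, m: state | {m["ip_dst_addr"]},
)
```

Because only one aggregate per entity and window is stored, a reader can later decide how many windows of history to pull without reprocessing the raw stream.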
  • 78. Profiler: Aggregations
    • Example aggregations:
      ◦ Distributional statistics: median, mean, percentile
      ◦ Set operations: contains, cardinality
      ◦ Simple counts
    • Aggregation challenges:
      ◦ May be big objects if done naively (e.g. set operations)
      ◦ May not be mergeable across time (e.g. distributions)
      ◦ Profile reading should be decoupled from writing
    • Approach: use approximate data structures
      ◦ Set operations: Bloom filters, HyperLogLog approximations
      ◦ Distributional statistics: t-digests
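To illustrate why approximate structures address both the size and the merge problem, here is a minimal Bloom filter sketch in Python. It is a teaching example only, not Metron's implementation; the bit count, hash count, and SHA-256-based hashing are arbitrary choices made for the sketch.

```python
import hashlib

class BloomFilter:
    """Fixed-size approximate set: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        """Union of two filters built with the same parameters (bitwise OR)."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share parameters to be merged")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged

# One filter per time window, merged at read time.
window_1 = BloomFilter()
window_1.add("C1")
window_2 = BloomFilter()
window_2.add("C2")
combined = window_1.merge(window_2)
```

The filter's size is fixed no matter how many items are added, and merging two windows gives the same bits as a single filter that saw both windows' data, which is what lets a reader combine per-window profiles after the fact.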
  • 79. Demo
    • Los Alamos National Laboratory released an open data set representing 58 consecutive days of de-identified event data collected from five sources within its corporate, internal computer network.
    • Among other telemetry sources, the data set provides authentication logs and a set of well-defined red teaming events that present bad behavior within the 58 days.
    • We will look at the authentication logs around a breach and show how we can use Metron to pick out the offending user’s activity leading up to the event.
      ◦ Look for users who attempt to authenticate to a number of distinct hosts that is more than 5 standard deviations from the median across all users.
  • 80.
    {
      "profile": "attempts_by_user",
      "foreach": "user",
      "onlyif": "source.type == 'auth'",
      "init": { "total": "HLLP_INIT(5,6)" },
      "update": { "total": "HLLP_ADD(total, ip_dst_addr)" },
      "result": {
        "profile": "total",
        "triage": { "total_count": "HLLP_CARDINALITY(total)" }
      }
    }
  • 81.
    {
      "profile": "auth_distribution",
      "foreach": "'global'",
      "onlyif": "source.type == 'profiler' && profile == 'attempts_by_user'",
      "init": { "s": "STATS_INIT()" },
      "update": { "s": "STATS_ADD(s, total_count)" },
      "result": "s"
    }
  • 82.
    window := PROFILE_WINDOW('...')
    profile := PROFILE_GET('attempts_by_user', user, window)
    distinct_auth_attempts := HLLP_CARDINALITY(GET_LAST(profile))
    distribution_profile := PROFILE_GET('auth_distribution', 'global', window)
    stats := STATS_MERGE(distribution_profile)
    distinct_auth_attempts_median := STATS_PERCENTILE(stats, 0.5)
    distinct_auth_attempts_stddev := STATS_SD(stats)
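The demo's rule (flag users whose distinct-host count is more than 5 standard deviations from the median across all users) can be sketched in Python once the three values above are available. This is a hedged illustration: the per-user counts are made-up sample data, exact values from the `statistics` module replace the approximate profile statistics, and using the sample standard deviation is an assumption of this sketch.

```python
import statistics

THRESHOLD_STDDEVS = 5  # the "more than 5 standard deviations" rule from the demo

def is_outlier(distinct_auth_attempts, median, stddev,
               threshold=THRESHOLD_STDDEVS):
    """True when the count is more than `threshold` stddevs from the median."""
    if stddev == 0:
        return False  # no spread, so nothing can be called an outlier
    return abs(distinct_auth_attempts - median) / stddev > threshold

# Hypothetical data: 49 ordinary users touching 2 to 4 distinct hosts,
# plus one user touching 400.
per_user_counts = {f"U{i}": 2 + i % 3 for i in range(49)}
per_user_counts["U_suspect"] = 400

median = statistics.median(per_user_counts.values())
stddev = statistics.stdev(per_user_counts.values())  # sample stddev (assumption)

flagged = [
    user
    for user, count in per_user_counts.items()
    if is_outlier(count, median, stddev)
]
```

Measuring distance from the median rather than the mean keeps the baseline from being dragged toward the outlier itself, although the standard deviation is still inflated by it.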
  • 83. Questions
    Thanks for your attention! Don’t forget to come to the cybersecurity Birds of a Feather session Thursday.
    • Find me at http://caseystella.com
    • Twitter handle: @casey_stella
    • Email address: [email protected]