Hadoop Summit 2015
CloudFire Analytics: Transforming Security
Analyzing Symantec’s security data lake
Stephen Brodsky and Darrell Kienzle
Outline
1. CPE CloudFire: Analytics and Products
2. Analytics Services and Data
3. Analytics Administration and Monitoring
4. Self-Service Analytics and Dynamic Clusters
Symantec CloudFire Analytics 2
CPE CloudFire: Analytics and Products
• CPE – Cloud Platform Engineering
– Symantec’s Private Cloud for security products and analytics
– Spans 50+ data centers around the world
• CloudFire – CPE’s scalable cloud platform
– New data centers, new hardware, scalable build-out
– All open source
– OpenStack for virtualization
– Analytics for big data analysis
• A cloud that brings together and integrates Symantec’s:
– Products
– Big Data, Analytic applications, and Services
– Compute, Network
Analytics Supporting Security Products
Goals of the Security Teams
1. Do we have the data?
2. Can we analyze the data?
3. Can we provide timely insights?
How CloudFire Analytics Supports Security
1. Data availability
• All frequently used security data is available
• Data at scale: data available for parallel analysis (PB scale)
• Leverages CPE data center availability, compute, and network
2. Analysis engines
• Hadoop ecosystem engines (MR, Hive, Kafka, Spark, HBase, Storm, Phoenix, ++)
• Analytics pipeline
3. Analysis in near real-time
• Analytics timescale: days/hours -> seconds
• Batch -> streaming
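The "data at scale" and "Hadoop ecosystem engines" points depend on splitting data into partitions that are analyzed in parallel. As a minimal illustration of the MapReduce pattern (the "MR" in the engine list), here is a pure-Python sketch that counts hypothetical security event types across partitions. The event names and the thread pool standing in for cluster nodes are assumptions; none of this is Hadoop's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: each partition stands in for one block of a file in HDFS.
PARTITIONS: List[List[str]] = [
    ["malware", "phishing", "malware"],
    ["exploit_kit", "malware"],
    ["phishing", "exploit_kit", "exploit_kit"],
]


def map_partition(events: Iterable[str]) -> List[Tuple[str, int]]:
    """Map phase: emit a (key, 1) pair for every event in one partition."""
    return [(event, 1) for event in events]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle phase: group the emitted values by key across all partitions."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(partitions: List[List[str]], workers: int = 3) -> Dict[str, int]:
    # Map each partition concurrently; threads stand in for cluster nodes here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_partition, partitions))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    print(map_reduce(PARTITIONS))
```

Each map task touches only its own partition, which is what lets the work spread across many machines; only the shuffle moves data between them.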
CPE Analytics Architecture Overview
Architecture layers (from the diagram):
• Sources: Products and Device Agents, sending telemetry and data
• Inbound Messaging (data import, Kafka)
• Stream Processing (Storm, Spark)
• Real-time Results (HBase, ElasticSearch)
• Query (Hive, Spark SQL)
• Analytic Applications, Workload Management (YARN)
• Distributed Storage (HDFS) on metal or virtual (OpenStack) servers
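The layers above form a flow: agents publish telemetry to inbound messaging, stream processing consumes it and updates a real-time results store, and the query layer reads those results. The toy sketch below wires that flow together in memory. The class names, the event fields, and the per-threat counter are all hypothetical; the in-memory queue and dictionary only stand in for Kafka and HBase/ElasticSearch and do not use their APIs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TelemetryEvent:
    # Hypothetical telemetry record sent by a product or device agent.
    device_id: str
    threat: str


class InboundQueue:
    """Toy stand-in for the inbound messaging layer (Kafka in the diagram)."""

    def __init__(self) -> None:
        self._messages = deque()

    def publish(self, event: TelemetryEvent) -> None:
        self._messages.append(event)

    def poll(self) -> Optional[TelemetryEvent]:
        # Return the oldest unread message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class RealTimeResults:
    """Toy stand-in for the real-time results store (HBase/ElasticSearch)."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def increment(self, key: str) -> None:
        self._counts[key] = self._counts.get(key, 0) + 1

    def query(self, key: str) -> int:
        # Query layer: read the latest value without rescanning raw data.
        return self._counts.get(key, 0)


def run_stream_processor(queue: InboundQueue, results: RealTimeResults) -> int:
    """Stream processing layer: drain the queue, updating results per event."""
    processed = 0
    while (event := queue.poll()) is not None:
        results.increment(event.threat)
        processed += 1
    return processed


if __name__ == "__main__":
    queue = InboundQueue()
    for threat in ["malvertisement", "exploit_kit", "malvertisement"]:
        queue.publish(TelemetryEvent(device_id="agent-1", threat=threat))
    results = RealTimeResults()
    run_stream_processor(queue, results)
    print(results.query("malvertisement"))
```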
[Figure: "Threats: Top Web-based Attacks" dashboard; 192 web-based attacks recorded last month; category shares of 36%, 30%, 14%, 10%, 7%, and 3%; legend labels include Malvertisement, Exploit Kit, and Suspicious Download]
Analytics Clusters
• All major open source engines
• PB-scale data store
• Key telemetry
• Multi-data center
• Administration
• Monitoring
• Prod, Dev/Test
• Application deployments
CloudFire Analytics
Data Transfer
A key security Product
• Live with hundreds of external customers
• Improved time to analyzed results from hours to seconds (5000x) at production scale
• Leverage Streaming Analytics
• Move analytics to more powerful, high-performance open source technologies
– Kafka for queuing
– Storm for analytics
• Improve analytics
– Server reputation
– URL reputation
• Moving forward
– Leverage Analytics Pipeline
– NoSQL DBs
– Graph DBs
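To make the 5000x figure concrete: a speedup factor is the ratio of the old latency to the new one. The slide gives only the ratio, so the two-hour batch latency below is an assumption chosen purely for illustration.

```latex
\text{speedup} = \frac{t_{\text{batch}}}{t_{\text{stream}}} = 5000
\quad\Rightarrow\quad
t_{\text{stream}} = \frac{t_{\text{batch}}}{5000}
= \frac{2\,\text{h}}{5000}
= \frac{7200\,\text{s}}{5000}
= 1.44\,\text{s}
```

Under that assumption, a result that took about two hours in batch arrives in under two seconds, consistent with the "hours to seconds" description.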
Analytics Services and Data
Product Journey with CloudFire Analytics
SDAP (current)
• Scale? Availability? Usage?
• Hours/days timeframe
• Application style: bring in data in zip files; unzip, load, SQL
• Architecture: serial
• Application oriented
Batch Analytics
• Data scale: multi-PB HDFS
• Analytics scale: Hadoop MR, Hive
• Availability: high
• Usage prioritization
• Minutes/hours timeframe
• Application style: bring in data; analyze
• Architecture: serial -> parallel
• Workflow oriented
Streaming Analytics
• Analytics scale: Kafka, Storm, Spark, ElasticSearch, HBase, ++
• Availability: high
• Usage prioritization
• Seconds timeframe
• Application style: stream in data directly; analytics pipeline
• Architecture: streaming, parallel
• Streaming oriented
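The jump from hours/days to seconds comes from changing how results are computed: a batch job recomputes over the full data set on a schedule, while a streaming job updates its result as each event arrives. The minimal sketch below, assuming a hypothetical per-URL hit count as the metric, shows that both styles reach the same answer; what differs is when the answer is available.

```python
from collections import Counter
from typing import Iterable, List


def batch_counts(all_events: Iterable[str]) -> Counter:
    """Batch style: recompute the full result from every stored event."""
    return Counter(all_events)


class StreamingCounts:
    """Streaming style: keep a running result and update it per event."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, url: str) -> None:
        # Constant-time update; the result is current as soon as the event is seen.
        self.counts[url] += 1


def replay(events: List[str]) -> StreamingCounts:
    # Feed events one at a time, as a stream processor would receive them.
    stream = StreamingCounts()
    for url in events:
        stream.on_event(url)
    return stream


if __name__ == "__main__":
    events = ["a.example", "b.example", "a.example"]
    print(batch_counts(events) == replay(events).counts)
```

The batch version must wait for the scheduled run and rescan everything; the streaming version pays a small cost per event and can be queried at any moment.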
CloudFire Analytics Pipeline
Pipeline stages (from the diagram):
• Data Sources: Telemetry Clients, World, Cloud Data
• Streaming Analysis: Storm, Kafka, Spark
• Batch Analysis: Hive, Spark
• Actively Available Results
• Online Access, Research
• Data and serving technologies shown: HBase, HDFS, Kafka, Cassandra; HBase, Index, Cache
• Shared layers: Assemble Analytic Applications; Analytic Pipeline Message Queue
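In the pipeline, one message queue feeds both the streaming and the batch analysis stages. A common way to do this, and the model Kafka uses, is an append-only log in which each consumer group tracks its own read offset, so a slow batch reader never blocks a fast streaming reader. The sketch below is a toy in-memory version of that idea; the class and the group names are hypothetical and are not Kafka's client API.

```python
from typing import Dict, List


class PipelineLog:
    """Append-only log; each consumer group keeps an independent offset."""

    def __init__(self) -> None:
        self._entries: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, message: str) -> int:
        # Returns the offset assigned to the new message.
        self._entries.append(message)
        return len(self._entries) - 1

    def read(self, group: str, max_messages: int) -> List[str]:
        # Read from this group's last position without removing messages,
        # then advance only this group's offset.
        start = self._offsets.get(group, 0)
        batch = self._entries[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    demo = PipelineLog()
    for msg in ["e1", "e2", "e3"]:
        demo.append(msg)
    print(demo.read("streaming", 1))
    print(demo.read("batch", 10))
```

Because reads never delete entries, adding another consumer (say, a research job) needs no change to the producers.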
CloudFire Analytics Services
Analytic Services and Data Transfer
• Hadoop, YARN – Compute
• HDFS/Knox/HttpFS – File Storage
• Hive, Pig – SQL Batch
• Oozie – Workflow and Scheduling
• Service Endpoints – Admin / BDSE
• Storm – Streaming
• Kafka – Messaging
• HBase – BigTable column store
• Spark (SQL, ML) – Stream analytics
• Research and admin CLI VMs
• MTS parallel file import/export
• Falcon data management
• ElasticSearch indexing
• Cassandra, Graph, Drill, Solr
Web UIs
Admin Tools
• Ambari - Admin
• OpsView (Zabbix) – Uptime monitor
• YARN and HDFS UIs – Perf & Logs
• Ganglia & Nagios – Performance
• Storm & Kafka lag – Perf/Problems
• LMM – Problem determination & metrics
• ResourceManager, NameNode UIs
• Network data transfer monitoring
• Puma performance usage analysis
• Continuous Validation
• Ranger user management
Data Science Tools
• HUE – SQL, Jobs, Files, Workflow
• Ambari Views
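The "Storm & Kafka lag" monitor tracks how far consumers trail behind producers. In Kafka terms, per-partition lag is the partition's log end offset minus the consumer group's committed offset. A minimal sketch of that arithmetic follows; the offsets are hypothetical, and treating a partition with no committed offset as entirely unread is an assumption of this sketch (real consumer behavior there depends on the reset policy).

```python
from typing import Dict, List


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: log end offset minus the committed offset.

    Assumption: a partition with no committed offset counts as fully unread.
    """
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def lagging_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    # Partitions whose backlog exceeds the alert threshold, sorted for stable output.
    return sorted(p for p, value in lag.items() if value > threshold)


if __name__ == "__main__":
    lag = consumer_lag({0: 1500, 1: 1200, 2: 900}, {0: 1500, 1: 1000})
    print(lag)
    print(lagging_partitions(lag, threshold=100))
```

A lag that keeps growing means the streaming job cannot keep up with ingest, which is usually the first sign of a performance problem.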
Analytics Administration and Monitoring
Ambari Analytics Administration
Extensible administration platform for automated deployment, monitoring and alerting, configuration management, rolling upgrades, and rolling restarts.
Ambari + ElasticSearch – Ambari Stack extensibility
• Ambari custom service for ElasticSearch: a new service that Ambari can deploy, start/stop, configure, and monitor.
• Uses the new Ambari Views, with the Elastic HQ admin UI integrated via an iFrame as our use case.
• Keeps the dashboard clean, yet extensible.
• On GitHub to share.
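The custom service gives Ambari lifecycle control over ElasticSearch. The toy class below models that lifecycle (install, configure, start/stop, status) as commands dispatched by name, with guards against configuring or starting before installing. It is a hypothetical illustration, not Ambari's API: real Ambari custom services declare their components in a stack definition and implement the commands in a service script, so consult the Ambari documentation for the actual interface.

```python
from typing import Callable, Dict, List


class CustomServiceLifecycle:
    """Toy model of the lifecycle commands a management console issues.

    Hypothetical: not Ambari's API. It only illustrates the
    deploy -> configure -> start/stop -> monitor cycle on the slide.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.installed = False
        self.running = False
        self.config: Dict[str, str] = {}
        self.history: List[str] = []

    def install(self) -> None:
        self.installed = True

    def configure(self, settings: Dict[str, str]) -> None:
        if not self.installed:
            raise RuntimeError(f"{self.name}: install before configuring")
        self.config.update(settings)

    def start(self) -> None:
        if not self.installed:
            raise RuntimeError(f"{self.name}: install before starting")
        self.running = True

    def stop(self) -> None:
        self.running = False

    def status(self) -> str:
        # Monitoring hook: report a simple health string.
        return "RUNNING" if self.running else "STOPPED"

    def execute(self, command: str, **kwargs) -> None:
        # Dispatch a command by name; record it only if it succeeded.
        handlers: Dict[str, Callable[..., None]] = {
            "install": self.install,
            "configure": self.configure,
            "start": self.start,
            "stop": self.stop,
        }
        handlers[command](**kwargs)
        self.history.append(command)


if __name__ == "__main__":
    svc = CustomServiceLifecycle("elasticsearch")
    svc.execute("install")
    svc.execute("configure", settings={"cluster.name": "analytics"})
    svc.execute("start")
    print(svc.status())
```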
Ambari Views with iFrame – Ambari extensibility
Cluster Usage Analysis using Analytics
Ambari Metrics
Rapid Metrics Dashboards Construction - Reliability
Prototype for building out flexible dashboards of the most interesting application and platform metrics.
Alerting is available via the LMM service.
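The slide does not describe the LMM service's alerting rules. One common rule, shown here as an assumption, is to alert only when a metric breaches its threshold for several consecutive samples, which suppresses one-off spikes. The sketch applies that rule to a hypothetical series of metric samples (for example, percent disk usage); the threshold and window size are illustrative.

```python
from typing import Iterable, List


def threshold_alerts(samples: Iterable[float], threshold: float, consecutive: int) -> List[int]:
    """Return the sample indices at which an alert fires.

    An alert fires once `consecutive` samples in a row exceed `threshold`;
    a sample at or below the threshold resets the streak so it can fire again.
    """
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    alerts: List[int] = []
    streak = 0
    fired = False
    for i, value in enumerate(samples):
        if value > threshold:
            streak += 1
            if streak >= consecutive and not fired:
                alerts.append(i)
                fired = True
        else:
            streak = 0
            fired = False
    return alerts


if __name__ == "__main__":
    print(threshold_alerts([50, 91, 95, 97, 60, 92, 93], threshold=90, consecutive=2))
```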
Ambari Metrics
Rapid Metrics Dashboards Construction - Availability
OpsView / Nagios monitor and alert service
Data Transfer Network Monitor
Data Transfer Rate Monitor
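The slide shows only the monitor's title. Rate monitors are commonly built from cumulative byte counters sampled over time, so here is a minimal sketch of that conversion, assuming hypothetical (timestamp, total bytes) samples: each rate is the byte delta divided by the time delta. Reporting 0.0 when a counter goes backwards (for example after a reset) is a simplifying assumption.

```python
from typing import List, Tuple


def transfer_rates(samples: List[Tuple[float, int]]) -> List[float]:
    """Convert cumulative byte counters into rates in bytes per second.

    `samples` holds (timestamp_seconds, total_bytes_transferred) pairs in time
    order. A counter that decreases (e.g. after a reset) yields 0.0 here.
    """
    rates: List[float] = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        delta = b1 - b0
        rates.append(delta / elapsed if delta >= 0 else 0.0)
    return rates


if __name__ == "__main__":
    print(transfer_rates([(0, 0), (10, 1_000_000), (20, 3_000_000), (30, 500)]))
```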
Self-Service Analytics and Dynamic Clusters
Cloudbreak integration and development in progress
Purpose
● Create and connect analytics clusters:
– Authoritative data lake, regional hubs
– Development, test, integration, staging, and production
● Let CloudFire users spin up new clusters to develop their applications.
Features
● One-click spin-up of analytics clusters using OpenStack VMs
● VMs that developers spin up count against their own OpenStack quota
● Developers will have admin access to the clusters they spin up, so they can install more services if needed.
Automation
● We control the software versions by using the same automation code that we use to deploy our production clusters.
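Because self-service VMs count against the developer's own OpenStack quota, a spin-up request can be checked against remaining quota before any VM is created. The sketch below shows such a pre-flight check under stated assumptions: the quota fields loosely follow OpenStack's per-project compute limits (instances, vCPUs, RAM), but the class names, field names, and numbers are hypothetical, and no OpenStack API is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Quota:
    # Hypothetical per-project limits and current usage.
    max_instances: int
    max_vcpus: int
    max_ram_mb: int
    used_instances: int = 0
    used_vcpus: int = 0
    used_ram_mb: int = 0


@dataclass
class ClusterRequest:
    # Hypothetical one-click cluster request: N identical nodes.
    nodes: int
    vcpus_per_node: int
    ram_mb_per_node: int


def check_quota(quota: Quota, req: ClusterRequest) -> List[str]:
    """Return the limits the requested cluster would exceed (empty list = fits)."""
    problems: List[str] = []
    if quota.used_instances + req.nodes > quota.max_instances:
        problems.append("instances")
    if quota.used_vcpus + req.nodes * req.vcpus_per_node > quota.max_vcpus:
        problems.append("vcpus")
    if quota.used_ram_mb + req.nodes * req.ram_mb_per_node > quota.max_ram_mb:
        problems.append("ram")
    return problems


if __name__ == "__main__":
    quota = Quota(max_instances=20, max_vcpus=80, max_ram_mb=163840,
                  used_instances=4, used_vcpus=16, used_ram_mb=32768)
    print(check_quota(quota, ClusterRequest(nodes=5, vcpus_per_node=8, ram_mb_per_node=32768)))
```

Reporting every exceeded limit at once, rather than failing on the first, tells the developer exactly how to resize the request.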
Creating and managing clusters
[Figure: Cluster Directory]
Self-Service Analytics Architecture
Managed Dynamic Clusters: Cluster 1, Cluster 2, Cluster ..., Cluster N
Cross-Cluster Capabilities:
• Cluster Management
• Cluster Directory
• Continuous Validation & Monitoring
• Data Catalog
• Analytics Tools
Each Cluster:
• Analytic Services
• Deployment
• Networking
• Monitoring integration
• OpenStack VMs + Containers
• Deployed Applications
Shared layers: Assemble Analytic Applications; Analytic Pipeline Message Queue
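The Cluster Directory is the cross-cluster capability that keeps track of which managed clusters exist, who owns them, and what runs on them. The toy registry below sketches that role, assuming hypothetical record fields (name, role, owner, services) and lookups by role, owner, and service; the slide does not specify the directory's actual schema or interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClusterRecord:
    # Hypothetical directory entry for one managed cluster.
    name: str
    role: str  # e.g. "production", "dev", "test", "staging"
    owner: str
    services: List[str] = field(default_factory=list)


class ClusterDirectory:
    """Toy registry of managed clusters with simple lookups."""

    def __init__(self) -> None:
        self._clusters: Dict[str, ClusterRecord] = {}

    def register(self, record: ClusterRecord) -> None:
        if record.name in self._clusters:
            raise ValueError(f"cluster {record.name!r} already registered")
        self._clusters[record.name] = record

    def deregister(self, name: str) -> None:
        # Removing an unknown cluster is a no-op in this sketch.
        self._clusters.pop(name, None)

    def by_role(self, role: str) -> List[str]:
        return sorted(c.name for c in self._clusters.values() if c.role == role)

    def by_owner(self, owner: str) -> List[str]:
        return sorted(c.name for c in self._clusters.values() if c.owner == owner)

    def with_service(self, service: str) -> List[str]:
        # Find clusters running a given service, e.g. to locate a Kafka endpoint.
        return sorted(c.name for c in self._clusters.values() if service in c.services)


if __name__ == "__main__":
    directory = ClusterDirectory()
    directory.register(ClusterRecord("datalake-prod", "production", "cpe", ["hdfs", "kafka"]))
    directory.register(ClusterRecord("team-a-dev", "dev", "team-a", ["spark"]))
    print(directory.by_role("dev"))
```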
Thank You!
Stephen_Brodsky@Symantec.com
Darrell_Kienzle@Symantec.com

Analyzing the World's Largest Security Data Lake!
