1 © 2019 PURE STORAGE INC. PURE PROPRIETARY
How to Avoid Drowning in Logs
Joshua Robinson
Founding Engineer, FlashBlade
Streaming 190 Billion Events/Day and
Batching 150 TB/Hour
Log Analytics Pipeline in Numbers
2M events / second
5 seconds SLA
0.5 - 1 PB of data / day
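As a rough consistency check on these figures, the headline numbers can be related with a little arithmetic. This is only a sketch: it assumes decimal units (1 PB = 10^15 bytes) and a uniform rate over a 24-hour day, neither of which the slides state, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(events_per_second: float) -> float:
    """Daily event count for a sustained per-second rate."""
    return events_per_second * SECONDS_PER_DAY


def sustained_events_per_second(daily_events: float) -> float:
    """Average per-second rate implied by a daily event count."""
    return daily_events / SECONDS_PER_DAY


def bytes_per_second_from_pb_per_day(petabytes_per_day: float) -> float:
    """Average ingest bandwidth, assuming 1 PB = 1e15 bytes (an assumption)."""
    return petabytes_per_day * 1e15 / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"2M events/s        -> {events_per_day(2e6):.3e} events/day")
    print(f"190e9 events/day   -> {sustained_events_per_second(190e9):,.0f} events/s")
    for pb in (0.5, 1.0):
        gb_s = bytes_per_second_from_pb_per_day(pb) / 1e9
        print(f"{pb} PB/day         -> {gb_s:.2f} GB/s")
```

Under these assumptions, a sustained 2M events/second is about 173 billion events/day, so the 190 billion events/day in the title corresponds to an average of roughly 2.2M events/second.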
Continuous Integration & Continuous Deployment
Source, Build, Functional Test, Stress Test, Deploy
CI/CD works!
1 Test coordinator (Jenkins)
100s tests / day
< 5 failures
Email developer
(Other diagram values, unlabeled in the extracted text: < 5, < 10, < 10)
Scale Problems
70,000+ tests / day
700 failures x 15 min
20 Triage Engineers
2x in the next 12 months
1500+ VMs, 250+ FBs, 20+ Jenkins, 700+ clients, 100+ Engineers
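The "700 failures x 15 min" figure can be turned into a daily triage workload. The sketch below reads the 15 minutes as triage time per failure and assumes the "2x" applies to failure volume with the team size unchanged; both readings are assumptions, and the function names are illustrative.

```python
def triage_hours_per_day(failures_per_day: int, minutes_per_failure: float) -> float:
    """Total manual triage time per day, in hours."""
    return failures_per_day * minutes_per_failure / 60


def hours_per_engineer(total_hours: float, engineers: int) -> float:
    """Triage hours each engineer would carry if the load is split evenly."""
    return total_hours / engineers


if __name__ == "__main__":
    today = triage_hours_per_day(700, 15)
    print(today, hours_per_engineer(today, 20))
    # Assumed reading of "2x in the next 12 months": twice the failures, same team.
    doubled = triage_hours_per_day(2 * 700, 15)
    print(doubled, hours_per_engineer(doubled, 20))
```

With the slide's numbers this comes to 175 hours of triage per day, or 8.75 hours for each of the 20 triage engineers, which is why manual triage does not survive a doubling of scale.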
Log Analysis Dream
1. Automate triaging of failures
2. Extract performance metrics
3. Save our logs for future use
4. Do all of this in a scalable system
5. Real-time results!
Log Analysis
[Figure: chart labeled Volume and Value, built up across versions.]
v1: Save; Alert / Take action
v2: Save; ETL / Add Structure; Alert / Take action
v3: Save; Aggregate / Search; ETL / Add Structure; Alert / Take action
v10: Save; Aggregate / Search; ETL / Add Structure; Alert / Take action
Log Analysis Pipeline
[Figure: pipeline diagram, built up and then reduced over several slides. Stages shown: Log Sources; Augment & Centralize (labeled rsyslog in later slides); Streaming Buffer; Filter; Store; Re-Filter; Index; Aggregate Transform Logic; Timeseries DB; Alert; Visualize.]
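The Filter and Re-Filter stages in the pipeline diagram can be illustrated with a small stand-in. The sketch below is plain Python rather than the Spark jobs the talk uses, and the log lines, pattern names, and `filter_lines` helper are hypothetical. It shows why keeping the raw data matters: a pattern added later can be applied to everything already stored.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def filter_lines(lines: Iterable[str], patterns: Dict[str, re.Pattern]) -> Dict[str, List[str]]:
    """Route each line to every named pattern it matches.

    Lines that match nothing are not indexed; they remain only in the raw store.
    """
    matched: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        for name, regex in patterns.items():
            if regex.search(line):
                matched[name].append(line)
    return dict(matched)


# Hypothetical raw store: every line is kept, whether or not it matches.
raw_store = [
    "host1 kernel: panic - not syncing",
    "host2 app: request took 12 ms",
    "host1 app: assertion failed in foo()",
]

# "Filter": patterns known at ingest time.
indexed = filter_lines(raw_store, {"panic": re.compile(r"panic")})

# "Re-Filter": a pattern added later, run again over the stored raw data.
reindexed = filter_lines(raw_store, {"assertion": re.compile(r"assertion failed")})
```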
Indexing
Use filesystem directory structure to encode metadata
• Raw data: <host>/<year>/<month>/<day>/<flat files>
  • Producer: rsyslog
  • Consumer: Spark batch (re-filter or custom lookbacks)
• Indexed data: <pattern>/<year>/<month>/<day>/<hour>/<host>/<flat files>
  • Producer: Spark streaming (filter)
  • Consumer: Python services (e.g. ETL, alert, searchability)
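The two directory layouts above can be sketched as path builders, plus a parser that recovers the metadata from an indexed path. This is a minimal illustration: the zero-padded date components, the helper names, and the example host and pattern values are assumptions, not details from the talk.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Dict


def raw_dir(host: str, ts: datetime) -> PurePosixPath:
    """Raw layout: <host>/<year>/<month>/<day>."""
    return PurePosixPath(host, f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}")


def indexed_dir(pattern: str, host: str, ts: datetime) -> PurePosixPath:
    """Indexed layout: <pattern>/<year>/<month>/<day>/<hour>/<host>."""
    return PurePosixPath(pattern, f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}", f"{ts:%H}", host)


def parse_indexed_path(path: PurePosixPath) -> Dict[str, str]:
    """Recover the metadata encoded in the first six components of an indexed path."""
    pattern, year, month, day, hour, host = path.parts[:6]
    return {"pattern": pattern, "year": year, "month": month,
            "day": day, "hour": hour, "host": host}


if __name__ == "__main__":
    ts = datetime(2019, 5, 3, 14)
    print(raw_dir("vm-017", ts))
    print(indexed_dir("panic", "vm-017", ts))
    print(parse_indexed_path(indexed_dir("panic", "vm-017", ts) / "part-0000.log"))
```

Because the pattern and time come first in the indexed layout, a consumer interested in one pattern over one hour only has to list a single subtree.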
Querying
Find and load data
• FlashBlade NFS protocol: < 1 ms latency
• Listing
  • “ls -alR” is still slow
  • The in-kernel NFS client discovers the filesystem structure sequentially.
  • Solution: skip the kernel. Use libnfs to build our own parallelized discovery: 1000x faster for 1M files
• Reading
  • Buffering: create an input pipeline that optimizes for throughput and hides latency
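The idea behind parallelized discovery can be sketched with a level-by-level directory walk in which every directory at a given depth is listed concurrently. Note the substitution: the talk uses libnfs to bypass the kernel NFS client, whereas this stand-in uses `os.scandir` with a thread pool and therefore still goes through the kernel. It illustrates only the parallel structure, not the reported speedup; `list_one_dir` and `parallel_walk` are hypothetical names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def list_one_dir(path: str) -> Tuple[List[str], List[str]]:
    """List a single directory, returning (subdirectories, files)."""
    subdirs: List[str] = []
    files: List[str] = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                files.append(entry.path)
    return subdirs, files


def parallel_walk(root: str, max_workers: int = 32) -> List[str]:
    """Breadth-first walk that lists all directories of one level in parallel."""
    files: List[str] = []
    frontier = [root]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier: List[str] = []
            # Each directory listing is an independent round trip, so they overlap.
            for subdirs, found in pool.map(list_one_dir, frontier):
                next_frontier.extend(subdirs)
                files.extend(found)
            frontier = next_frontier
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "host1", "2019"))
        open(os.path.join(root, "host1", "2019", "a.log"), "w").close()
        print(parallel_walk(root))
```

A sequential walk pays one listing round trip per directory in series; here the round trips within a level overlap, which is where the gain comes from on a low-latency but remote filesystem.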
Full Pipeline
[Figure: full pipeline diagram, built up over three slides.]
Sources: 2,500+ VMs, 350+ FBs (300+ on the first of the three slides), 20+ Jenkins, 1,000+ clients, 120,000+ tests / day
Other labels in the diagram: rsyslog; 72T, 72T, 24T, 800G, 200T, 90G, 50G, 189T; repeated values of 12 and 16
• Duplicate bug
• Infrastructure failure
• Performance regression
• Low level details
• Easy to read graphs
Takeaways
• Index only what you need, store the rest (in a storage layer that scales in throughput and to billions of files/objects)
• Optimize for throughput, not latency
• Disaggregate compute and storage for scalability of subsystems
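One way to read "optimize for throughput" in code is a prefetching input pipeline: a background thread keeps a bounded buffer of loaded items filled, so the consumer rarely waits on an individual read. The sketch below is a generic illustration under that reading, not the talk's implementation; `prefetch`, `load`, and `depth` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def prefetch(items: Iterable[T], load: Callable[[T], R], depth: int = 8) -> Iterator[R]:
    """Yield load(item) for each item in order, loading ahead in a background thread.

    Up to `depth` results are buffered, so load latency overlaps with consumption.
    Sketch limitation: if the consumer stops early, the daemon producer thread
    stays blocked on the full buffer until the process exits.
    """
    buf: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for item in items:
                buf.put((True, load(item)))
            buf.put((False, None))   # normal end of stream
        except Exception as exc:     # hand load errors to the consumer
            buf.put((False, exc))

    threading.Thread(target=producer, daemon=True).start()
    while True:
        ok, value = buf.get()
        if ok:
            yield value
        elif value is None:
            return
        else:
            raise value


if __name__ == "__main__":
    # Hypothetical loader standing in for "read one file from storage".
    for size in prefetch(["a", "bb", "ccc"], load=len, depth=2):
        print(size)
```

The buffer depth trades memory for tolerance to slow reads: a deeper buffer absorbs longer stalls at the cost of holding more loaded data at once.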
QUESTIONS?

Avoiding Log Data Overload in a CI/CD System While Streaming 190 Billion Events Daily
