1 © 2018 PURE STORAGE INC. PURE PROPRIETARY
How to Avoid Drowning in Logs
Joshua Robinson
Founding Engineer, FlashBlade
Streaming 180 Billion Events/Day and
Batching 150 TB/Hour
Log Analytics Pipeline in Numbers
✓ 2M events / second
✓ 5-second SLA
✓ 0.5 - 1 PB of data / day
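A quick sanity check on these numbers, offered as a sketch rather than anything taken from the talk: it assumes the 2M events/second rate is sustained for a full day and that 1 PB = 1,000 TB. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(events_per_second):
    """Daily event count, assuming the rate is sustained around the clock."""
    return events_per_second * SECONDS_PER_DAY


def tb_per_hour(pb_per_day, tb_per_pb=1000):
    """Average hourly volume in TB for a daily volume in PB (decimal units assumed)."""
    return pb_per_day * tb_per_pb / 24


print(events_per_day(2_000_000) / 1e9)   # billions of events per day
print(tb_per_hour(0.5), tb_per_hour(1))  # average TB per hour at each end of the range
```

Under those assumptions the rate works out to 172.8 billion events per day, in line with the roughly 180 billion quoted on the title slide, and 0.5 - 1 PB/day averages about 21 to 42 TB/hour.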
Continuous Integration & Continuous Deployment
Source, Build, Functional Test, Stress Test, Deploy
CI/CD works!
[Diagram: 1 test coordinator (Jenkins); 100s of tests / day; < 5 failures; email developer. Unlabeled counts: < 5, < 10, < 10]
Scale Problems
70,000+ tests / day
700 failures x 15 min
20 Triage Engineers
2x in the next 12 months
1500+ VMs, 250+ FBs, 20+ Jenkins, 700+ clients, 100+ Engineers
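The arithmetic behind this slide can be sketched as follows. It reads "700 failures x 15 min" as 700 failures per day at 15 minutes of manual triage each, assumes the work is split evenly across the 20 triage engineers, and assumes failures grow with the projected 2x. All three readings are assumptions, and the names are illustrative.

```python
def triage_hours_per_day(failures_per_day, minutes_per_failure):
    """Total manual triage effort per day, in hours."""
    return failures_per_day * minutes_per_failure / 60


def hours_per_engineer(total_hours, engineers):
    """Daily load per engineer if the work is divided evenly (an assumption)."""
    return total_hours / engineers


total = triage_hours_per_day(700, 15)        # 175.0 hours of triage per day
load = hours_per_engineer(total, 20)         # 8.75 hours per engineer per day
doubled = triage_hours_per_day(2 * 700, 15)  # 350.0 hours if failures double
print(total, load, doubled)
```

At roughly a full working day per engineer already, doubling the volume without automation would call for about twice the triage staff.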
Log Analysis Dream
1. Automate triaging of failures
2. Extract performance metrics
3. Save our logs for future use
4. Do all of this in a scalable system
5. Real-time results!
Log Analysis (diagram labels: Volume, Value)
v1: Save; Alert / Take action
v2: Save; ETL / Add Structure; Alert / Take action
v3: Save; Aggregate / Search; ETL / Add Structure; Alert / Take action
v10: Save; Aggregate / Search; ETL / Add Structure; Alert / Take action
Log Analysis Pipeline
[Diagram built up step by step; the text labels present on each slide are listed below]
Slide 12: LogSources; Augment & Centralize; Store; Index; Aggregate Transform Logic; Timeseries DB; Alert; Visualize
Slide 13: same labels as slide 12
Slide 14: adds Streaming Buffer; Filter
Slide 15: adds Re-Filter
Slide 16: same labels as slide 15
Slide 17: Augment & Centralize is replaced by rsyslog
Slide 18: the Store label no longer appears
Slide 19: the Streaming Buffer label no longer appears
Slide 20: the Filter and Re-Filter labels no longer appear
Slide 21: the Index label no longer appears
Slide 22: the Aggregate Transform Logic label no longer appears
Slides 23-24: only rsyslog and LogSources remain
Indexing
Use filesystem directory structure to encode metadata
• Raw data: <host>/<year>/<month>/<day>/<flat files>
    • Producer: rsyslog
    • Consumer: Spark batch (re-filter or custom lookbacks)
• Indexed data: <pattern>/<year>/<month>/<day>/<hour>/<host>/<flat files>
    • Producer: Spark streaming (filter)
    • Consumer: Python services (e.g. ETL, alert, searchability)
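The two layouts above can be sketched as path builders plus a routing rule. This is a minimal illustration, not the pipeline's actual code: the pattern names and regexes are hypothetical, and zero-padded month, day and hour components are an assumption, since the slide does not show a concrete path.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical patterns of interest; the talk does not list the real ones.
PATTERNS = {
    "kernel_panic": re.compile(r"kernel panic", re.IGNORECASE),
    "io_timeout": re.compile(r"I/O timeout"),
}


def raw_dir(host, ts):
    """Raw layout: <host>/<year>/<month>/<day>"""
    return PurePosixPath(host, f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}")


def indexed_dir(pattern, host, ts):
    """Indexed layout: <pattern>/<year>/<month>/<day>/<hour>/<host>"""
    return PurePosixPath(pattern, f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}", f"{ts:%H}", host)


def destinations(host, ts, line):
    """Every line is kept under the raw layout; only lines that match a
    pattern are additionally written under the indexed layout."""
    dirs = [raw_dir(host, ts)]
    dirs += [indexed_dir(name, host, ts)
             for name, rx in PATTERNS.items() if rx.search(line)]
    return dirs


ts = datetime(2018, 6, 5, 14, 30)
for d in destinations("vm-0042", ts, "Kernel panic - not syncing"):
    print(d)
```

With the pattern and time components leading the indexed path, a consumer interested in one pattern for one hour only needs to list a single directory, while the host-first raw layout suits batch jobs that re-scan everything a machine produced.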
Querying
Find and load data
• FlashBlade NFS protocol: < 1 ms latency
• Listing
    • "ls -alR" is still SLOW
    • The in-kernel NFS client discovers the filesystem structure sequentially.
    • Solution: skip the kernel. Use libnfs to build our own parallelized discovery: 1000x faster for 1M files
• Reading
    • Buffering: create an input pipeline that optimizes for throughput and hides latency
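The shape of both ideas, parallel discovery and a read-ahead buffer, can be sketched in a few lines. Note the substitution: this sketch does not use libnfs. It walks a locally mounted path with os.scandir, so it still goes through the kernel client and will not reproduce the speedup quoted above. The function names, worker count and queue depth are illustrative assumptions.

```python
import os
import queue
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def _scan(path):
    """List one directory, separating subdirectories from files."""
    subdirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                files.append(entry.path)
    return subdirs, files


def parallel_list(root, workers=32):
    """Discover all files under root, scanning many directories concurrently
    instead of one after another."""
    files = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(_scan, root)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                subdirs, found = future.result()
                files.extend(found)
                # Each newly discovered directory becomes its own task.
                pending |= {pool.submit(_scan, d) for d in subdirs}
    return files


def prefetch(paths, depth=8):
    """Yield (path, bytes) pairs while a background thread reads ahead,
    keeping up to `depth` files buffered so the consumer rarely waits on I/O."""
    buffer = queue.Queue(maxsize=depth)
    end = object()  # sentinel marking the end of the stream

    def reader():
        try:
            for path in paths:
                with open(path, "rb") as f:
                    buffer.put((path, f.read()))
        finally:
            # In this sketch a read error simply ends the stream early.
            buffer.put(end)

    threading.Thread(target=reader, daemon=True).start()
    while (item := buffer.get()) is not end:
        yield item
```

A consumer would chain the two, for example `for path, data in prefetch(parallel_list("/mnt/logs/indexed")): ...`, where the mount point is a placeholder. The bounded queue is the design choice that matters: it caps memory while letting reads overlap with processing.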
Full Pipeline
[Diagram built up over slides 27-29; the repeated numeric labels (12 and 16) are omitted]
Slide 27: 2,500+ VMs; 300+ FBs; 20+ Jenkins; 1,000+ clients; 120,000+ tests / day; rsyslog; labels 72T, 72T, 24T, 800G
✓ Duplicate bug
✓ Infrastructure failure
✓ Performance regression
Slide 28: FBs shown as 350+; adds labels 200T, 90G
Slide 29: adds labels 50G, 189T
✓ Low level details
✓ Easy to read graphs
Takeaways
✓ Index only what you need, store the rest
  (in a storage layer that scales in throughput and to billions of files/objects)
✓ Optimize for throughput and not latency
✓ Disaggregation of compute and storage for scalability of subsystems
QUESTIONS?

Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events and Batch Processing 40 TB/Hour with Joshua Robinson
