Stream Processing
Revolutionizing Big Data
Srikanth Satya
April 2018
pravega.io
Data-Intensive Apps Need Disruptive Technologies
The Unbundled Database vision sounds awesome!
• Loosely coupled data derivations and transformations
• Update derived state by observing data changes
• Observe changes in derived state – all the way to the edge
• Integrity and correctness: end-to-end IDs, idempotence, data consistency, and exactly once semantics
BUT realizing it requires disruptive systems capabilities
• Shared, durable, consistent, unbounded distributed log storage
• Ability to dynamically scale both the log(s) and downstream processors in coordination with data arrival volume
• Ability to deliver timely and accurate results by processing the log continuously, even with late-arriving or out-of-order data
Data-Intensive Apps Are Disruptive
We passionately believe in these principles. As the industry leaders in storage, we’re developing a new, open storage primitive enabling all of us to realize the full potential of this powerful vision.
Introducing Pravega Stream Storage
A new storage abstraction – a stream – for continuous and infinite data
• Named, durable, append-only, infinite sequence of bytes
• With low-latency appends to and reads from the tail of the sequence
• With high-throughput reads for older portions of the sequence
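A rough way to picture this abstraction is the toy in-memory model below. It is only a sketch, not the Pravega client API: the `ByteStream` class and its method names are hypothetical, and durability is not modeled. It shows that appends only ever land at the tail, and that a reader can either follow the tail or replay older bytes by offset.

```python
class ByteStream:
    """Toy in-memory stand-in for a named, append-only byte stream (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data = bytearray()

    def append(self, payload: bytes) -> int:
        """Append bytes at the tail and return the offset where they start."""
        offset = len(self._data)
        self._data += payload
        return offset

    def tail_offset(self) -> int:
        """Offset just past the last byte written so far."""
        return len(self._data)

    def read(self, offset: int, max_bytes: int) -> bytes:
        """Read up to max_bytes starting at offset; written bytes never change."""
        return bytes(self._data[offset:offset + max_bytes])


stream = ByteStream("sensor-readings")
first = stream.append(b"alpha")
tail = stream.tail_offset()          # a tail reader starts here
second = stream.append(b"beta")
historical = stream.read(first, 5)   # replay of an older portion
from_tail = stream.read(tail, 100)   # only data appended after `tail`
```

Here `from_tail` contains only what was appended after the reader captured the tail offset, while `historical` replays the earlier bytes unchanged.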
Coordinated scaling of stream storage and stream processing
• Stream writes partitioned by app-defined routing key
• Stream reads independently and automatically partitioned by arrival rate SLO
• Scaling protocol to allow stream processors to scale in lockstep with storage
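One common way to partition writes by routing key is to hash the key to a point in a fixed key space and give each partition a contiguous range of that space. The sketch below assumes that scheme. The function names, the use of SHA-256, and the boundary values are illustrative assumptions, not details taken from the slides.

```python
import hashlib
from bisect import bisect_right
from typing import Sequence


def key_position(routing_key: str) -> float:
    """Map a routing key to a stable point in [0, 1)."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def segment_for(routing_key: str, boundaries: Sequence[float]) -> int:
    """Return the index of the partition whose key range contains the key.

    `boundaries` holds the sorted lower bounds of each range, starting at 0.0.
    """
    return bisect_right(boundaries, key_position(routing_key)) - 1


keys = ["sensor-1", "sensor-2", "sensor-3", "sensor-4"]
two_segments = [0.0, 0.5]            # ranges [0, 0.5) and [0.5, 1)
three_segments = [0.0, 0.25, 0.5]    # the first range split into two halves
before = {k: segment_for(k, two_segments) for k in keys}
after = {k: segment_for(k, three_segments) for k in keys}
```

Because the hash is stable, events with the same routing key always land in the same range, which is what preserves per-key ordering. Splitting one range moves only the keys inside it.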
Enabling system-wide exactly once processing across multiple apps
• Streams are ordered and strongly consistent
• Chain independent streaming apps via streams
• Stream transactions integrate with checkpoint schemes such as the one used in Flink
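The idea of tying stream transactions to checkpoints can be sketched as follows. This is a simplified model under stated assumptions, not the actual Pravega or Flink protocol: `TxnStream` and `run_epoch` are hypothetical names, one transaction covers exactly one checkpoint interval, and a failure is simulated with a flag. Output written inside a transaction becomes visible only on commit, so a replay after a failure does not produce duplicates downstream.

```python
from typing import Dict, List


class TxnStream:
    """Toy output stream whose events become visible only on commit (hypothetical)."""

    def __init__(self) -> None:
        self.events: List[str] = []            # committed, visible to readers
        self._open: Dict[int, List[str]] = {}  # txn id -> buffered events
        self._next_id = 0

    def begin(self) -> int:
        txn_id = self._next_id
        self._next_id += 1
        self._open[txn_id] = []
        return txn_id

    def write(self, txn_id: int, event: str) -> None:
        self._open[txn_id].append(event)

    def commit(self, txn_id: int) -> None:
        self.events.extend(self._open.pop(txn_id))

    def abort(self, txn_id: int) -> None:
        self._open.pop(txn_id, None)


def run_epoch(source: List[str], start: int, end: int,
              out: TxnStream, fail: bool = False) -> int:
    """Process source[start:end] in one transaction; return the new checkpoint."""
    txn = out.begin()
    for item in source[start:end]:
        out.write(txn, item.upper())
    if fail:
        out.abort(txn)   # simulated crash: buffered output is discarded
        return start     # restart from the last completed checkpoint
    out.commit(txn)
    return end


source = ["a", "b", "c", "d"]
out = TxnStream()
pos = run_epoch(source, 0, 2, out)
pos = run_epoch(source, pos, 4, out, fail=True)  # nothing becomes visible
pos = run_epoch(source, pos, 4, out)             # replay of the same input
```

A real integration needs more care than this, for example making the commit step safe to retry during recovery.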
Revisiting the Disruptive Capabilities
Required Systems Capabilities
• Shared, durable, consistent, unbounded distributed log storage
• Dynamically scale logs in coordination with downstream processors
• Deliver accurate results processing continuously even with late-arriving or out-of-order data
Enabling Pravega Features
• Durable, append-only byte streams
• Consistent tail and replay reads
• Unlimited retention, storage efficiency
• Auto-scaling
• Independently scale readers/writers
• Transactions and exactly once
• Event time and processing time
The Streaming Revolution
Enabling continuous pipelines with consistent replay, composability, elasticity, exactly once
[Diagram labels: Ingest Buffer & Pub/Sub; Streaming Search; Streaming Analytics; State Synchronizer; Pravega Stream Store; Cloud-Scale Storage]
Pravega for Ingest Buffer and Pub/Sub
Ingest Buffer, Distributed Ledger or Messaging using Pravega Event Client
[Diagram labels: Writers; Stream; Reader Groups; Consumers]
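The pub/sub pattern in this slide can be pictured with a small model in which every reader group sees the whole stream, while inside a group the stream's partitions are divided among the readers. This is a hedged sketch only: `ReaderGroup`, `deliver`, and the round-robin assignment are illustrative assumptions, not the Pravega Event Client API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class ReaderGroup:
    """Toy reader group that divides partitions among its readers (hypothetical)."""

    def __init__(self, name: str, readers: Iterable[str]) -> None:
        self.name = name
        self.readers = list(readers)

    def assign(self, segment_ids: Iterable[int]) -> Dict[str, List[int]]:
        """Give each partition to exactly one reader, round-robin."""
        assignment: Dict[str, List[int]] = defaultdict(list)
        for i, seg in enumerate(sorted(segment_ids)):
            assignment[self.readers[i % len(self.readers)]].append(seg)
        return dict(assignment)


def deliver(segments: Dict[int, List[str]], group: ReaderGroup) -> Dict[str, List[str]]:
    """Return the events each reader in the group would see."""
    assignment = group.assign(segments.keys())
    return {reader: [event for seg in segs for event in segments[seg]]
            for reader, segs in assignment.items()}


segments = {0: ["a1", "a2"], 1: ["b1"], 2: ["c1", "c2"]}
analytics = ReaderGroup("analytics", ["r1", "r2"])
audit = ReaderGroup("audit", ["solo"])
analytics_view = deliver(segments, analytics)
audit_view = deliver(segments, audit)
```

Each event reaches exactly one reader per group, and the two groups consume the same stream independently of each other.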
Pravega for Application State Synchronization
Distributed State via State Synchronizer Client
[Diagram labels: App Process #1 through App Process #n, each with a State Synchronizer and a Stream Client; “Shared State” Stream]
• Shared Properties
• Shared scalar data
• Shared K/V data
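One way to keep state consistent across processes through a shared stream is optimistic concurrency: each process applies the updates it has read so far, and an update is accepted only if the writer was fully caught up. The sketch below assumes that approach. `SharedStateStream`, `Synchronizer`, and `append_if_at` are hypothetical names for illustration and are not the State Synchronizer client API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Update = Tuple[str, int]


class SharedStateStream:
    """Toy shared log that accepts an append only at the expected revision."""

    def __init__(self) -> None:
        self.updates: List[Update] = []

    def append_if_at(self, expected_revision: int, update: Update) -> bool:
        if expected_revision != len(self.updates):
            return False          # the writer was behind; it must catch up first
        self.updates.append(update)
        return True

    def read_from(self, revision: int) -> List[Update]:
        return self.updates[revision:]


class Synchronizer:
    """Per-process view of the shared K/V state, rebuilt from the log."""

    def __init__(self, stream: SharedStateStream) -> None:
        self.stream = stream
        self.revision = 0
        self.state: Dict[str, int] = {}

    def fetch_updates(self) -> None:
        for key, value in self.stream.read_from(self.revision):
            self.state[key] = value
            self.revision += 1

    def update(self, key: str, make_value: Callable[[Optional[int]], int]) -> int:
        """Retry until the update is appended on top of the latest state."""
        while True:
            self.fetch_updates()
            value = make_value(self.state.get(key))
            if self.stream.append_if_at(self.revision, (key, value)):
                self.fetch_updates()
                return value


def increment(old: Optional[int]) -> int:
    return (old or 0) + 1


stream = SharedStateStream()
p1, p2 = Synchronizer(stream), Synchronizer(stream)
p1.update("counter", increment)
p2.update("counter", increment)
p1.update("counter", increment)
p2.fetch_updates()
stale_write_accepted = stream.append_if_at(0, ("counter", 99))
```

Both processes converge on the same value because every accepted update was computed from the full prior history.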
Pravega + Flink = Pure Streaming End-to-End
Dynamically Scale Storage + Compute Based On Data Arrival Volume
Protocol coordination between streaming storage and streaming engine to systematically scale up and down the number of segments and Flink workers based on load variance over time
Utilize transactional writes to extend Exactly Once processing semantics across multiple, chained apps
Writers scale based on app configuration; stream storage elastically and independently scales based on aggregate incoming volume of data
[Diagram labels: Social, IoT, … Writers; “Raw Stream” Segments; Streaming App Workers; “Cooked Stream” Segments; 2nd Streaming App Workers; Sinks]
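The load-driven scaling described on this slide can be illustrated with a simple policy that splits a hot segment's key range in half and merges adjacent cold segments. Everything specific here is an assumption made for the sketch: the `Segment` fields, the thresholds (split above twice the target rate, merge when the combined rate fits within the target), the even split of load, and the choice of one worker per segment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    low: float             # inclusive lower bound of the key range
    high: float            # exclusive upper bound of the key range
    events_per_sec: float  # observed arrival rate


def rescale(segments: List[Segment], target_rate: float) -> List[Segment]:
    """One scaling pass over segments sorted by key range."""
    result: List[Segment] = []
    for seg in segments:
        if seg.events_per_sec > 2 * target_rate:
            # Hot segment: split the key range and assume load divides evenly.
            mid = (seg.low + seg.high) / 2
            half = seg.events_per_sec / 2
            result.append(Segment(seg.low, mid, half))
            result.append(Segment(mid, seg.high, half))
        elif result and result[-1].events_per_sec + seg.events_per_sec <= target_rate:
            # Two adjacent cold segments: merge them into one range.
            prev = result.pop()
            result.append(Segment(prev.low, seg.high,
                                  prev.events_per_sec + seg.events_per_sec))
        else:
            result.append(seg)
    return result


segments = [Segment(0.0, 0.5, 300.0), Segment(0.5, 0.75, 20.0), Segment(0.75, 1.0, 30.0)]
rescaled = rescale(segments, target_rate=100.0)
workers = len(rescaled)  # assumption: downstream parallelism follows segment count
```

In this run the first segment is split and the two quiet segments are merged, so the key space stays fully covered while the segment count tracks the load.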
Search Reimagined for a Streaming World
Advantages of This Approach
• Seamlessly integrate search into streaming pipelines: continuous indexing + continuous query
• Dynamically scale search based on data volume arrival rate and query SLA
• Eliminate redundant storage across input streams and search
[Diagram labels: Input Streams; Pravega Search (Continuous Indexing, Continuous Query); Result Streams; … stream pipeline …]
• Pravega: an open source project with an open community
• Software includes infinite byte stream primitive, event abstraction, ingest buffer, and pub/sub services
• Flink integration for scale, elasticity, and system-wide exactly once
• Join the community at pravega.io

Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processing Revolutionizing Big Data"
