SlideShare a Scribd company logo
Evolution of Stream Processing Systems
Amit Sahu
Digital Specialist Engineer
Introduction
Importance of Stream Processing
• Stream processing has emerged as a critical technology for handling and analyzing high-velocity data streams.
• Traditional batch processing systems are not suitable for real-time data analysis and decision-making.
• Stream processing enables organizations to extract valuable insights from continuous data streams and make
timely and informed decisions.
Introduction
Challenges in Stream Processing
• Stream processing systems face several challenges in managing out-of-order data, handling stateful computations,
ensuring fault tolerance, managing high data loads, and supporting dynamic reconfiguration.
• Addressing these challenges is crucial for the successful implementation and adoption of stream processing
technology.
Characteristics of Data Streams and Continuous Queries
High-Volume, Real-Time Nature
• Data streams are continuous and high in volume,
often arriving at a rapid rate.
• Continuous queries are executed on these data
streams in real-time, requiring immediate
processing.
On-the-Fly Processing with Limited Memory
• Data streams and continuous queries need to be
processed on-the-fly, without the ability to store the
entire data set.
• Limited memory resources pose a challenge for
performing computations in real-time.
Continuously Producing Updated Results
• Continuous queries require continuously updated
results as new data arrives.
• The challenge lies in efficiently updating the query
results without reprocessing the entire data
stream.
Handling High Input Rates
• Data streams can have high input rates, making it
challenging to process the incoming data in a timely
manner.
• Efficient mechanisms are required to handle the high
input rates and avoid data overload.
Characteristics of Data Streams and Continuous Queries
Requirements for Streaming Systems
Correctness with Out-of-Order and Delayed Data
• Streaming systems need to handle data that arrives out of order or with delays, ensuring that the processing results
are correct and consistent.
Progress Estimation and Result Completeness
• Streaming systems should provide mechanisms to estimate the progress of data processing and ensure that the
results are complete and accurate.
Management of Accumulated State
• Streaming systems need to efficiently manage and maintain the state of ongoing computations, such as
aggregations and windowing operations.
Requirements for Streaming Systems
Fault Tolerance and Failure Handling
• Streaming systems should be designed to handle failures gracefully and recover from them without losing data or
compromising processing results.
Adaptability to Workload Variations
• Streaming systems should be able to dynamically adapt to changes in data volume, velocity, and processing
requirements to ensure optimal performance and resource utilization.
Key Concepts and Terms in Stream Processing
Data Stream
A continuous flow of data
records that are
generated and processed
in real-time.
Continuous Query
A query that is
continuously applied to a
data stream, producing
real-time results.
Out-of-Order Data
Data records in a stream
that arrive in a different
order than they were
produced.
State
The information that a
stream processing
system maintains about
the data it has seen so
far.
Fault-Tolerance
The ability of a stream
processing system to
continue operating
correctly in the presence
of failures.
Load Management
The process of efficiently
distributing the
processing workload
across multiple nodes in
a stream processing
system.
Elasticity
The ability of a stream
processing system to
dynamically scale up or
down its resources based
on the workload.
Reconfiguration
The process of modifying
the structure or behavior
of a stream processing
system while it is
running.
Key Concepts and Terms in Stream Processing
Categorization of Streaming Systems
Streaming systems canbecategorized intothreegenerations based on their data model, query language, execution
model, and system architecture. These generations represent the evolution of stream processing systems over
time.
Stream Set presentation for datapipeline.
Out-of-order Data Management
Windowing
•Use time-based or
count-based windows to
group and process
data within a specified
time or size
limit.
•Handle out-of-order
data by adjusting window
boundaries or
using watermarking
techniques.
Ordering
•Apply timestamp based
or sequence-based
ordering to ensure data is
processed in the correct
order.
•Use buffering and
buffering techniques to
handle late-arriving
events.
Revision
•Maintain a revision
history of events to
handle late-arriving
updates or corrections.
•Apply revision logic to
update previous results
based on new
information.
Progress Tracking
•Use watermarking
techniques to track the
progress of event
processing.
•Adjust processing logic
based on the current
watermark to
handle out-of-order
events.
Causes of Disorder
Stream processing refers to the real-time processing of data streams. Disorder in stream processing can have
various causes, leading to inefficiencies and inaccuracies in data analysis and decision-making. Some common
causes of disorder in stream processing include:
Causes of Disorder
Stream processing refers to the real-time processing of data streams. Disorder in stream processing can have
various causes, leading to inefficiencies and inaccuracies in data analysis and decision-making. Some common
causes of disorder in stream processing include:
Effects of Disorder
Disorderinstream processing can have significant impacts on the efficiency and accuracy of data analysis. When data
is not processed in a timely and organized manner, it can lead to various negative effects.
Effects of Disorder
Disorderinstream processing can have significant impacts on the efficiency and accuracy of data analysis. When data
is not processed in a timely and organized manner, it can lead to various negative effects.
State Management
Statemanagement is a crucial aspect of stream processing systems as it involves handling and maintaining the state
of data streams. In this section, we will explore key aspects of state management in stream processing, including
state representation, state partitioning, state persistence, and state scalability.
Key Aspects of State Management
State Management
Statemanagement is a crucial aspect of stream processing systems as it involves handling and maintaining the state
of data streams. In this section, we will explore key aspects of state management in stream processing, including
state representation, state partitioning, state persistence, and state scalability.
Key Aspects of State Management
Stream Set presentation for datapipeline.
Fault Tolerance and High Availability
Instreamprocessing systems,faulttolerance and high availability are crucial for ensuring continuous data processing
and minimizing disruptions. This section examines the key aspects of fault tolerance and high availability in stream
processing, including processing semantics, recovery strategies, and availability metrics.
Load Management, Elasticity, and Reconfiguration
Instream processingsystems,loadmanagement,elasticity,andreconfigurationplay crucial roles in ensuring efficient
and reliable data processing. This section will discuss key concepts and techniques related to load management,
elasticity, and reconfiguration.
Load Management Techniques
Elasticity and Reconfiguration Techniques
Conclusion
Comprehensive
Overview
The survey provides a
comprehensive overview
of the fundamental
aspects and
functionalities of stream
processing systems and
their evolution over time.
Comparison of Early and
Modern Systems
The paper compares
early and modern stream
processing systems with
regard to their data
models, query languages,
execution models, and
system architectures.
Highlighting Important
Works
The survey highlights
important but overlooked
works that have
influenced today’s
streaming systems
design.
Future Trends and Open Problems
The survey discusses future trends and open problems in stream processing, such as supporting multiple time
domains, handling cyclic queries, enabling transactional stream processing, and providing user-friendly ways
to specify availability.
Establishing Common
Nomenclature
The paper establishes a
common nomenclature
for streaming concepts,
often described by
inconsistent terms in
different systems and
communities.
THANK YOU

More Related Content

Similar to Stream Set presentation for datapipeline. (20)

PDF
Data Streaming and Stream management system
RizwanShaikh146
 
PDF
Uint-4 Mining Data Stream.pdf
Sitamarhi Institute of Technology
 
PDF
Spark meetup stream processing use cases
punesparkmeetup
 
PDF
Let's get to know the Data Streaming
Knoldus Inc.
 
PDF
Streaming Analytics Unit 1 notes for engineers
ManjuAppukuttan2
 
PDF
[WSO2Con EU 2018] The Rise of Streaming SQL
WSO2
 
PDF
Workshop on Real-time & Stream Analytics IEEE BigData 2016
Sabri Skhiri
 
PDF
Reintroducing the Stream Processor: A universal tool for continuous data anal...
Paris Carbone
 
PDF
Introduction to Streaming Analytics
Guido Schmutz
 
PDF
Streaming systems - Part 2
Sandeep Malhotra
 
PDF
Introduction to Stream Processing
Guido Schmutz
 
PDF
Stream Processing
FogGuru MSCA Project
 
PPTX
Shikha fdp 62_14july2017
Dr. Shikha Mehta
 
PDF
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
HostedbyConfluent
 
PDF
Unlocking the Power of Apache Flink: An Introduction in 4 Acts
HostedbyConfluent
 
PDF
Mining Stream Data using k-Means clustering Algorithm
Manishankar Medi
 
PDF
The Rise of Streaming SQL
Sriskandarajah Suhothayan
 
PDF
[WSO2Con USA 2018] The Rise of Streaming SQL
WSO2
 
PDF
Streaming Analytics
Neera Agarwal
 
PDF
Grokking Streaming Systems: Real-time event processing 1st Edition Josh Fischer
beguetyaante
 
Data Streaming and Stream management system
RizwanShaikh146
 
Uint-4 Mining Data Stream.pdf
Sitamarhi Institute of Technology
 
Spark meetup stream processing use cases
punesparkmeetup
 
Let's get to know the Data Streaming
Knoldus Inc.
 
Streaming Analytics Unit 1 notes for engineers
ManjuAppukuttan2
 
[WSO2Con EU 2018] The Rise of Streaming SQL
WSO2
 
Workshop on Real-time & Stream Analytics IEEE BigData 2016
Sabri Skhiri
 
Reintroducing the Stream Processor: A universal tool for continuous data anal...
Paris Carbone
 
Introduction to Streaming Analytics
Guido Schmutz
 
Streaming systems - Part 2
Sandeep Malhotra
 
Introduction to Stream Processing
Guido Schmutz
 
Stream Processing
FogGuru MSCA Project
 
Shikha fdp 62_14july2017
Dr. Shikha Mehta
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
HostedbyConfluent
 
Unlocking the Power of Apache Flink: An Introduction in 4 Acts
HostedbyConfluent
 
Mining Stream Data using k-Means clustering Algorithm
Manishankar Medi
 
The Rise of Streaming SQL
Sriskandarajah Suhothayan
 
[WSO2Con USA 2018] The Rise of Streaming SQL
WSO2
 
Streaming Analytics
Neera Agarwal
 
Grokking Streaming Systems: Real-time event processing 1st Edition Josh Fischer
beguetyaante
 

Recently uploaded (20)

PPTX
UiPath Academic Alliance Educator Panels: Session 2 - Business Analyst Content
DianaGray10
 
PDF
NewMind AI Journal - Weekly Chronicles - July'25 Week II
NewMind AI
 
PPTX
Top Managed Service Providers in Los Angeles
Captain IT
 
PDF
Complete JavaScript Notes: From Basics to Advanced Concepts.pdf
haydendavispro
 
PDF
July Patch Tuesday
Ivanti
 
PDF
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
PDF
Meetup Kickoff & Welcome - Rohit Yadav, CSIUG Chairman
ShapeBlue
 
PDF
Building Resilience with Digital Twins : Lessons from Korea
SANGHEE SHIN
 
PPTX
MSP360 Backup Scheduling and Retention Best Practices.pptx
MSP360
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
PDF
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
PDF
SFWelly Summer 25 Release Highlights July 2025
Anna Loughnan Colquhoun
 
PDF
Predicting the unpredictable: re-engineering recommendation algorithms for fr...
Speck&Tech
 
PDF
Wojciech Ciemski for Top Cyber News MAGAZINE. June 2025
Dr. Ludmila Morozova-Buss
 
PDF
Empowering Cloud Providers with Apache CloudStack and Stackbill
ShapeBlue
 
PDF
Women in Automation Presents: Reinventing Yourself — Bold Career Pivots That ...
DianaGray10
 
PDF
Impact of IEEE Computer Society in Advancing Emerging Technologies including ...
Hironori Washizaki
 
PPTX
Building and Operating a Private Cloud with CloudStack and LINBIT CloudStack ...
ShapeBlue
 
PPTX
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
PDF
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
UiPath Academic Alliance Educator Panels: Session 2 - Business Analyst Content
DianaGray10
 
NewMind AI Journal - Weekly Chronicles - July'25 Week II
NewMind AI
 
Top Managed Service Providers in Los Angeles
Captain IT
 
Complete JavaScript Notes: From Basics to Advanced Concepts.pdf
haydendavispro
 
July Patch Tuesday
Ivanti
 
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
Meetup Kickoff & Welcome - Rohit Yadav, CSIUG Chairman
ShapeBlue
 
Building Resilience with Digital Twins : Lessons from Korea
SANGHEE SHIN
 
MSP360 Backup Scheduling and Retention Best Practices.pptx
MSP360
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
SFWelly Summer 25 Release Highlights July 2025
Anna Loughnan Colquhoun
 
Predicting the unpredictable: re-engineering recommendation algorithms for fr...
Speck&Tech
 
Wojciech Ciemski for Top Cyber News MAGAZINE. June 2025
Dr. Ludmila Morozova-Buss
 
Empowering Cloud Providers with Apache CloudStack and Stackbill
ShapeBlue
 
Women in Automation Presents: Reinventing Yourself — Bold Career Pivots That ...
DianaGray10
 
Impact of IEEE Computer Society in Advancing Emerging Technologies including ...
Hironori Washizaki
 
Building and Operating a Private Cloud with CloudStack and LINBIT CloudStack ...
ShapeBlue
 
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
Ad

Stream Set presentation for datapipeline.

  • 1. Evolution of Stream Processing Systems Amit Sahu Digital Specialist Engineer
  • 2. Introduction Importance of Stream Processing • Stream processing has emerged as a critical technology for handling and analyzing high-velocity data streams. • Traditional batch processing systems are not suitable for real-time data analysis and decision-making. • Stream processing enables organizations to extract valuable insights from continuous data streams and make timely and informed decisions.
  • 3. Introduction Challenges in Stream Processing • Stream processing systems face several challenges in managing out-of-order data, handling stateful computations, ensuring fault tolerance, managing high data loads, and supporting dynamic reconfiguration. • Addressing these challenges is crucial for the successful implementation and adoption of stream processing technology.
  • 4. Characteristics of Data Streams and Continuous Queries High-Volume, Real-Time Nature • Data streams are continuous and high in volume, often arriving at a rapid rate. • Continuous queries are executed on these data streams in real-time, requiring immediate processing. On-the-Fly Processing with Limited Memory • Data streams and continuous queries need to be processed on-the-fly, without the ability to store the entire data set. • Limited memory resources pose a challenge for performing computations in real-time.
  • 5. Continuously Producing Updated Results • Continuous queries require continuously updated results as new data arrives. • The challenge lies in efficiently updating the query results without reprocessing the entire data stream. Handling High Input Rates • Data streams can have high input rates, making it challenging to process the incoming data in a timely manner. • Efficient mechanisms are required to handle the high input rates and avoid data overload. Characteristics of Data Streams and Continuous Queries
  • 6. Requirements for Streaming Systems Correctness with Out-of-Order and Delayed Data • Streaming systems need to handle data that arrives out of order or with delays, ensuring that the processing results are correct and consistent. Progress Estimation and Result Completeness • Streaming systems should provide mechanisms to estimate the progress of data processing and ensure that the results are complete and accurate. Management of Accumulated State • Streaming systems need to efficiently manage and maintain the state of ongoing computations, such as aggregations and windowing operations.
  • 7. Requirements for Streaming Systems Fault Tolerance and Failure Handling • Streaming systems should be designed to handle failures gracefully and recover from them without losing data or compromising processing results. Adaptability to Workload Variations • Streaming systems should be able to dynamically adapt to changes in data volume, velocity, and processing requirements to ensure optimal performance and resource utilization.
  • 8. Key Concepts and Terms in Stream Processing Data Stream A continuous flow of data records that are generated and processed in real-time. Continuous Query A query that is continuously applied to a data stream, producing real-time results. Out-of-Order Data Data records in a stream that arrive in a different order than they were produced. State The information that a stream processing system maintains about the data it has seen so far.
  • 9. Fault-Tolerance The ability of a stream processing system to continue operating correctly in the presence of failures. Load Management The process of efficiently distributing the processing workload across multiple nodes in a stream processing system. Elasticity The ability of a stream processing system to dynamically scale up or down its resources based on the workload. Reconfiguration The process of modifying the structure or behavior of a stream processing system while it is running. Key Concepts and Terms in Stream Processing
  • 10. Categorization of Streaming Systems Streaming systems canbecategorized intothreegenerations based on their data model, query language, execution model, and system architecture. These generations represent the evolution of stream processing systems over time.
  • 12. Out-of-order Data Management Windowing •Use time-based or count-based windows to group and process data within a specified time or size limit. •Handle out-of-order data by adjusting window boundaries or using watermarking techniques. Ordering •Apply timestamp based or sequence-based ordering to ensure data is processed in the correct order. •Use buffering and buffering techniques to handle late-arriving events. Revision •Maintain a revision history of events to handle late-arriving updates or corrections. •Apply revision logic to update previous results based on new information. Progress Tracking •Use watermarking techniques to track the progress of event processing. •Adjust processing logic based on the current watermark to handle out-of-order events.
  • 13. Causes of Disorder Stream processing refers to the real-time processing of data streams. Disorder in stream processing can have various causes, leading to inefficiencies and inaccuracies in data analysis and decision-making. Some common causes of disorder in stream processing include:
  • 14. Causes of Disorder Stream processing refers to the real-time processing of data streams. Disorder in stream processing can have various causes, leading to inefficiencies and inaccuracies in data analysis and decision-making. Some common causes of disorder in stream processing include:
  • 15. Effects of Disorder Disorderinstream processing can have significant impacts on the efficiency and accuracy of data analysis. When data is not processed in a timely and organized manner, it can lead to various negative effects.
  • 16. Effects of Disorder Disorderinstream processing can have significant impacts on the efficiency and accuracy of data analysis. When data is not processed in a timely and organized manner, it can lead to various negative effects.
  • 17. State Management Statemanagement is a crucial aspect of stream processing systems as it involves handling and maintaining the state of data streams. In this section, we will explore key aspects of state management in stream processing, including state representation, state partitioning, state persistence, and state scalability. Key Aspects of State Management
  • 18. State Management Statemanagement is a crucial aspect of stream processing systems as it involves handling and maintaining the state of data streams. In this section, we will explore key aspects of state management in stream processing, including state representation, state partitioning, state persistence, and state scalability. Key Aspects of State Management
  • 20. Fault Tolerance and High Availability Instreamprocessing systems,faulttolerance and high availability are crucial for ensuring continuous data processing and minimizing disruptions. This section examines the key aspects of fault tolerance and high availability in stream processing, including processing semantics, recovery strategies, and availability metrics.
  • 21. Load Management, Elasticity, and Reconfiguration Instream processingsystems,loadmanagement,elasticity,andreconfigurationplay crucial roles in ensuring efficient and reliable data processing. This section will discuss key concepts and techniques related to load management, elasticity, and reconfiguration. Load Management Techniques
  • 23. Conclusion Comprehensive Overview The survey provides a comprehensive overview of the fundamental aspects and functionalities of stream processing systems and their evolution over time. Comparison of Early and Modern Systems The paper compares early and modern stream processing systems with regard to their data models, query languages, execution models, and system architectures. Highlighting Important Works The survey highlights important but overlooked works that have influenced today’s streaming systems design. Future Trends and Open Problems The survey discusses future trends and open problems in stream processing, such as supporting multiple time domains, handling cyclic queries, enabling transactional stream processing, and providing user-friendly ways to specify availability. Establishing Common Nomenclature The paper establishes a common nomenclature for streaming concepts, often described by inconsistent terms in different systems and communities.