BIG DATA ANALYSIS
REAL TIME BIG DATA PROCESSING
BY: PONNARASU A
112225
CSE B
INTRODUCTION
• Real-time big data processing involves analyzing and acting upon data as it is
generated or received.
• This approach allows for immediate insights and responses, which is crucial in
various applications such as fraud detection, personalized recommendations, and
monitoring systems.
• By implementing real-time data processing systems, businesses can achieve
higher efficiency, better customer experiences, and a stronger competitive edge.
TERMINOLOGIES USED
Latency: The delay between the generation of data and the processing of, or action on, that data.
Throughput: The amount of data processed in a given period of time.
Event Stream: A continuous flow of data events generated by various sources, such as sensors,
user interactions, or transactions.
Stream Processing: The real-time processing of data streams to extract insights and trigger
actions.
Data Ingestion: The process of collecting and importing data for immediate use or storage.
Scalability: The capability of a system to handle growing amounts of work or data by adding
resources.
Fault Tolerance: The ability of a system to continue operating without interruption when one or
more of its components fail.
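The latency and throughput definitions above can be made concrete with a small sketch. It assumes each event carries two timestamps; the names `ProcessedEvent`, `average_latency`, and `throughput` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessedEvent:
    generated_at: float  # seconds, when the source produced the event
    processed_at: float  # seconds, when the system finished handling it


def average_latency(events):
    """Mean delay between generation and processing, in seconds."""
    return sum(e.processed_at - e.generated_at for e in events) / len(events)


def throughput(events, window_seconds):
    """Events processed per second over an observation window."""
    return len(events) / window_seconds


# Illustrative data: one event per second, each handled 0.5 s after creation.
events = [ProcessedEvent(float(t), t + 0.5) for t in range(5)]

latency = average_latency(events)          # 0.5 seconds per event
rate = throughput(events, window_seconds=5.0)  # 1.0 event per second
```

The two numbers are independent: a system can have high throughput and still have high latency if events wait in a queue before being handled.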
REAL TIME VS BATCH PROCESSING
Real-Time Processing
Real-time processing involves the continuous input, processing, and output
of data. Data is processed as soon as it is generated or received, enabling
immediate insights and actions.
Characteristics:
• Latency: Very low, often in milliseconds or seconds.
• Data Handling: Continuous flow of data, processed in real-time.
• Use Cases: Fraud detection, real-time recommendations, live monitoring,
financial trading.
• Technologies: Apache Kafka, Apache Storm, Apache Flink, Spark
Streaming.
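As a rough illustration of processing data as soon as it arrives, the sketch below checks each transaction the moment it is seen instead of waiting for a full data set. The rule (flag an amount more than three times the account's running average) and the name `detect_suspicious` are assumptions made for this example, not a real fraud model.

```python
def detect_suspicious(transactions, factor=3.0):
    """Yield a transaction as soon as it exceeds `factor` times the running
    average of the earlier amounts seen for the same account."""
    totals = {}  # account -> (count, sum) of earlier amounts
    for account, amount in transactions:
        count, total = totals.get(account, (0, 0.0))
        if count > 0 and amount > factor * (total / count):
            yield account, amount  # emitted immediately, before later events
        totals[account] = (count + 1, total + amount)


# Illustrative stream of (account, amount) pairs.
stream = [("a", 10.0), ("a", 12.0), ("b", 500.0), ("a", 90.0), ("b", 480.0)]
alerts = list(detect_suspicious(stream))
```

Because the function is a generator, an alert is available to the caller before the remaining events have been read, which is the property that keeps latency low.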
Batch Processing
Batch processing involves collecting data over a period and processing
it in bulk. Data is accumulated, then processed at scheduled intervals,
allowing for comprehensive analysis of large data sets.
Characteristics:
• Latency: Higher, ranging from minutes to hours or even days.
• Data Handling: Processes data in large volumes at specific intervals.
• Use Cases: End-of-day reporting, data warehousing, historical data
analysis.
• Technologies: Hadoop MapReduce, Apache Spark, Apache Hive,
Apache Pig.
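By contrast, a batch job runs once over everything that has accumulated. The following minimal sketch assumes a day's records are already collected as `(day, store, amount)` tuples; the function name `end_of_day_report` and the record layout are hypothetical.

```python
from collections import defaultdict


def end_of_day_report(records):
    """Aggregate all accumulated (day, store, amount) records in one pass."""
    totals = defaultdict(float)
    for day, store, amount in records:
        totals[(day, store)] += amount
    return dict(totals)


# Illustrative records collected during the day, processed together afterwards.
records = [
    ("2024-01-01", "north", 100.0),
    ("2024-01-01", "south", 50.0),
    ("2024-01-01", "north", 25.0),
]
report = end_of_day_report(records)
```

No result is available until the whole batch has been read, which is why latency is measured in minutes or hours rather than milliseconds.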
COMPARISON
Aspect | Real-Time Processing | Batch Processing
Latency | Milliseconds to seconds | Minutes to hours or days
Data Handling | Continuous, as data arrives | Bulk, at scheduled intervals
Use Cases | Immediate insights, live monitoring | Comprehensive analysis, historical data
Advantages | Immediate actions, up-to-date information | Efficient for large volumes, cost-effective
Challenges | Complexity, scalability, fault tolerance | Complexity, scalability, fault tolerance
Technologies | Kafka, Storm, Flink, Spark Streaming | Hadoop MapReduce, Spark, Hive, Pig
BASIC TECHNOLOGIES
DATA SOURCES
• Sensors and IoT Devices: Devices that collect and transmit data about their
environment. Examples: IoT devices, environmental sensors.
• Social Media: Platforms where users generate a continuous stream of data
through posts, comments, likes, and shares. Examples: Twitter, Facebook feeds.
• Financial Transactions: Data from payment systems, stock exchanges, and
financial institutions. Examples: card payments, stock trades.
• Log Files: Continuous records of events or activities in software applications and
systems. Examples: server logs, application logs.
KEY TECHNOLOGIES
• Apache Kafka
A distributed streaming platform that handles real-time data feeds.
Features:
• High throughput for publishing and subscribing to data streams.
• Durable storage of streams.
• Apache Storm
A distributed real-time computation system for processing data streams.
Features:
• Fast and reliable processing.
• Supports various programming languages.
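The publish/subscribe and durable-storage features listed for Kafka above can be pictured with a toy in-memory log. This is a conceptual sketch only: `TopicLog` and `Consumer` are hypothetical classes written for this example and are not the Kafka client API, and a real broker persists records to disk and replicates them.

```python
class TopicLog:
    """Toy in-memory stand-in for an append-only stream (not the Kafka API)."""

    def __init__(self):
        self._records = []

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return records starting at `offset`; reading never removes them."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own position, so readers are independent."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        batch = self._log.read(self.offset)
        self.offset += len(batch)
        return batch


log = TopicLog()
for reading in ("r1", "r2", "r3"):
    log.publish(reading)

fast = Consumer(log)
first_poll = fast.poll()    # sees r1, r2, r3
log.publish("r4")
second_poll = fast.poll()   # sees only the new record, r4

late = Consumer(log)
replay = late.poll()        # a new subscriber can still read everything
```

Because records stay in the log after being read, a subscriber that joins late can replay the stream from the beginning.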
• Apache Flink
A stream processing framework with powerful event-time processing
capabilities.
Features:
• Stateful computations over data streams.
• Exactly-once processing guarantees.
• Spark Streaming (Apache Spark)
A scalable and fault-tolerant stream processing system built on
Apache Spark.
Features:
• Micro-batch processing model.
• Integration with Spark's batch and machine learning libraries.
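The difference between event-time processing and a micro-batch model that groups records by arrival can be sketched in plain Python. This is an illustration of the two grouping ideas only, not Flink or Spark code; the event layout `(event_time, arrival_time, value)` and the helper `bucket` are assumptions made for the example, and it ignores mechanisms such as watermarks that real engines use to decide when a window is complete.

```python
from collections import defaultdict

# Each event is (event_time, arrival_time, value); times are in seconds.
EVENTS = [
    (1, 1, "a"),
    (4, 5, "b"),
    (9, 12, "c"),   # generated in the first 10 s but arrives late
    (11, 13, "d"),
]


def bucket(events, key_index, size):
    """Group event values into fixed-size, non-overlapping time buckets."""
    buckets = defaultdict(list)
    for event in events:
        start = (event[key_index] // size) * size
        buckets[start].append(event[2])
    return dict(buckets)


# Grouping by arrival time, as a micro-batch taken every 10 s would.
by_arrival = bucket(EVENTS, key_index=1, size=10)

# Grouping by the time the event actually happened (event-time windows).
by_event_time = bucket(EVENTS, key_index=0, size=10)
```

The late event "c" lands in the second bucket when grouped by arrival, but in the first bucket when grouped by event time, which is the case event-time processing is meant to handle correctly.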
• Amazon Kinesis
A platform for real-time data streaming and analytics by AWS.
Features:
• Easily collect, process, and analyze real-time data.
• Scalable and fully managed.
NoSQL Storage Technologies in Real-Time Data
Processing
• Apache Cassandra
A highly scalable, distributed NoSQL database designed to handle large
amounts of data across many commodity servers.
Features:
• High availability with no single point of failure.
• Linear scalability.
• MongoDB
A document-oriented NoSQL database that stores data in JSON-like
format.
Features:
• Flexible schema design.
• Powerful querying and indexing.
• Redis
An in-memory key-value store known for its high performance and
support for various data structures.
Features:
• Extremely low latency.
• Supports complex data structures (lists, sets, sorted sets).
• Amazon DynamoDB
A fully managed NoSQL database service by AWS that provides fast
and predictable performance with seamless scalability.
Features:
• Single-digit millisecond response times.
• Fully managed and serverless.
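To show why the sorted sets mentioned for Redis above are useful in real-time systems, here is a plain-Python illustration of the data structure. `ToySortedSet` is a hypothetical class for this example and not a Redis client; a real server keeps the set ordered internally rather than sorting on every query.

```python
class ToySortedSet:
    """Plain-Python illustration of a sorted set (not a Redis client)."""

    def __init__(self):
        self._scores = {}  # member -> score

    def add(self, member, score):
        """Insert a member, or update its score if it already exists."""
        self._scores[member] = score

    def top(self, n):
        """Return the n highest-scoring (member, score) pairs."""
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


# Illustrative live leaderboard that is updated as events arrive.
board = ToySortedSet()
board.add("alice", 10)
board.add("bob", 30)
board.add("carol", 20)
board.add("alice", 40)  # a later event raises alice's score

leaders = board.top(2)
```

A structure like this lets a dashboard ask for the current top entries at any moment while scores keep changing.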
Data Processing in Real-Time Big Data Systems
Steps in Real-Time Data Processing:
• Data Ingestion
The process of collecting and importing data in real-time from various sources.
• Data Stream Processing
Continuous processing of data streams to derive insights and trigger actions.
• Data Transformation
Converting raw data into a structured format or enriching it with additional
information.
• Data Storage
Storing processed data in databases or data lakes for further analysis and
querying.
• Data Analysis and Querying
Analyzing processed data to extract insights and generate reports or dashboards.
• Data Visualization
Presenting data insights through interactive dashboards and visualizations.
• Event Handling and Alerting
Responding to specific events or conditions detected in the data.
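The steps above can be sketched as a small chain of Python generators. This is a minimal sketch under stated assumptions: the JSON sensor format, the 80 °C alert threshold, and the names `ingest`, `transform`, and `run_pipeline` are all hypothetical, and an in-memory list stands in for a real database.

```python
import json

# Illustrative raw input; the third line is deliberately malformed.
RAW_LINES = [
    '{"sensor": "t1", "celsius": 21.5}',
    '{"sensor": "t2", "celsius": 85.0}',
    'not json',
    '{"sensor": "t1", "celsius": 22.0}',
]


def ingest(lines):
    """Ingestion: parse raw input, skipping records that cannot be decoded."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(readings):
    """Transformation: enrich each reading with a derived Fahrenheit field."""
    for reading in readings:
        yield {**reading, "fahrenheit": reading["celsius"] * 9 / 5 + 32}


def run_pipeline(lines, alert_above_celsius=80.0):
    """Stream processing: handle readings one at a time, store and alert."""
    store, alerts = [], []
    for reading in transform(ingest(lines)):
        store.append(reading)                  # storage (list as a stand-in)
        if reading["celsius"] > alert_above_celsius:
            alerts.append(reading["sensor"])   # event handling and alerting
    return store, alerts


stored, alerts = run_pipeline(RAW_LINES)
```

Each reading flows through every stage before the next one is read, so an alert can fire without waiting for the rest of the input.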
FUTURE TRENDS
• Edge Computing: Performing data processing tasks closer to the data
source to minimize latency and reduce data transmission costs.
• Enhanced Real-Time Analytics with AI and Machine Learning: Integrating
artificial intelligence (AI) and machine learning (ML) with real-time data
processing to enhance predictive analytics and decision-making.
• Quantum Computing: Exploring quantum computing for solving complex
problems in real-time data processing and analytics.
• Privacy-Preserving Data Processing: Ensuring data privacy with
federated learning and advanced encryption.
• Serverless Architectures: Implementing serverless computing to run
real-time data processing tasks without managing infrastructure.
CONCLUSION
Real-time big data processing is essential for
deriving immediate insights and actions in various
industries. Understanding the key concepts,
technologies, and best practices helps in designing
efficient and effective real-time data systems.
Thank You
