FLUENTD
Presentation by Abhineswari M – 21MIA1025 for CSE3069
WHAT IS THIS?
• A unified log collector
• So is this free?
• Yes. It is an open-source tool that collects and processes log data, unifying the
collection process for better use and understanding.
WHAT ARE LOGS?
Assume that we are looking at a service application
platform such as AWS, Microsoft, Nintendo, Toastmaster, etc.
CSE3069 - FLUENTD real time analytics.pptx
WHAT IS LOG DATA AND WHY DO WE NEED IT?
• Logs are automatically generated records of events that occur within a system,
application or network.
• They contain details such as timestamps, user actions, system events, errors, and
performance metrics.
• Compliance: logs can be used to check adherence to regulations.
• Security: logs help protect the application from potential breaches at every step.
• Debugging: errors can be analysed efficiently.
• Log files are stored in structured formats such as JSON or CSV, or as plain text
files.
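To make the idea of a structured log record concrete, here is a minimal Python sketch that emits one event as a JSON line. The field names (`user`, `action`, `status`, `duration_ms`) and the helper `make_log_record` are assumptions chosen for illustration; they do not come from the slides.

```python
import json
from datetime import datetime, timezone


def make_log_record(user, action, status, duration_ms):
    """Build one structured log record as a dictionary (hypothetical fields)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "status": status,
        "duration_ms": duration_ms,
    }


# Serialize the record as a single JSON line, a common structured log format.
record = make_log_record("alice", "login", "error", 182)
line = json.dumps(record)
print(line)
```

Because every record carries the same named fields, a collector can read it back with `json.loads` instead of guessing at the layout.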
HOW DO APPLICATIONS USUALLY LOG
DATA?
The figure on this slide indicates three different methods.
CHALLENGES OF COLLECTING AND
CONSUMING DATA?
Data Volume & Storage
• Logs generate massive amounts of data, requiring efficient storage solutions.
• Log retention policies must balance keeping historical data against storage costs.
Data Collection Complexity
• Logs come from diverse sources (applications, servers, networks, devices) in different formats.
• Ensuring consistent logging across systems can be challenging.
Real-time Processing
• Analyzing logs in real-time for security or performance monitoring requires high-speed
processing.
• Delays in log aggregation can impact response times to incidents.
CHALLENGES OF COLLECTING AND
CONSUMING DATA?
Standardization & Compatibility
• Different systems may log data in varied formats (JSON, XML, plaintext), making integration
complex.
• Standardizing log structures and using centralized logging solutions can help.
Log Noise & Redundancy
• Large logs may include excessive or redundant information, making meaningful insights
harder to extract.
• Filtering and prioritizing relevant logs is essential for efficient analysis.
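The two challenges on this slide, mixed formats and log noise, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plain-text layout `<LEVEL> <message>`, the severity table, and the function names `normalize` and `filter_logs` are all hypothetical.

```python
import json

# Assumed severity ranking, used to prioritize records.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


def normalize(line):
    """Turn a JSON or plain-text line into {"level": ..., "message": ...}."""
    line = line.strip()
    if line.startswith("{"):
        data = json.loads(line)
        return {"level": data["level"].upper(), "message": data["message"]}
    # Assumed plain-text layout: "<LEVEL> <message>"
    level, _, message = line.partition(" ")
    return {"level": level.upper(), "message": message}


def filter_logs(lines, min_level="WARN"):
    """Keep records at or above min_level and drop exact repeats."""
    seen = set()
    kept = []
    for line in lines:
        record = normalize(line)
        key = (record["level"], record["message"])
        too_low = SEVERITY.get(record["level"], 0) < SEVERITY[min_level]
        if too_low or key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


raw = [
    '{"level": "info", "message": "user logged in"}',
    "ERROR disk almost full",
    "ERROR disk almost full",
    '{"level": "warn", "message": "slow response"}',
    "DEBUG cache hit",
]
for record in filter_logs(raw):
    print(record)
```

Normalizing first means the filtering step only has to understand one record shape, regardless of how each source wrote its logs.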
HOW DOES FLUENTD WORK?
• It acts as a unified logging layer.
• First, it is deployed into the
cluster and collects the log
data.
• Second, it lets
developers and analysts
use many types of logs as
they are generated.
• It also mitigates the risk of bad
data slowing down and
misinforming the organization.
Why Most Log Formats Are Weakly Structured
1. Human-Centric Design
   • Logs were originally designed for humans, not machines, so structure wasn’t a priority.
2. Weak Standardization
   • Log producers (e.g., web servers, syslog, middleware, sensors) followed inconsistent formatting practices.
3. Parsing Challenges
   • Arbitrary text-based logs are difficult for computers to analyze.
   • Extracting meaningful data often requires complex regular expressions.
4. Inefficient Data Processing
   • Many ad-hoc scripts and one-liners are needed to parse and clean logs.
   • Lack of structured formats makes automation harder.
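The parsing challenge can be illustrated with a short Python sketch. It assumes a web-server access-log line in a common "host, user, bracketed time, quoted request, status, size" layout; the sample line, the pattern, and the name `parse_line` are illustrative assumptions, not taken from the slides.

```python
import re

# Regular expression for one assumed access-log layout.
LINE_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return the named fields of a log line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


sample = (
    '192.0.2.10 - alice [10/Oct/2024:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326'
)
print(parse_line(sample))
print(parse_line("not a log line"))
```

Even this small pattern is hard to read and breaks as soon as a producer changes its layout, which is the motivation for emitting structured records in the first place.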
2025?
Logstash > Fluentd > Fluent Bit
BEFORE:
AFTER:
MAIN ADVANTAGES
• Defines an interface that all log producers and consumers implement against. This is the first
requirement for the Unified Logging Layer.
• Reliability and scalability
• Buffering:
- Uses a file buffer to persist data
- Each buffer chunk has an ID, which allows idempotent writes
• Retrying and error handling
- When a transaction fails, the buffer retains the
data, so a secondary backup is not needed
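The buffering and retry ideas above can be sketched conceptually in Python. This is not Fluentd's implementation: it keeps chunks in memory rather than in a file, and the classes `Buffer` and `FlakyDestination`, the chunk size, and the retry count are all hypothetical.

```python
import uuid


class FlakyDestination:
    """Hypothetical destination whose first acknowledgement is lost."""

    def __init__(self):
        self.stored = {}
        self.fail_next_ack = True

    def write(self, chunk_id, records):
        # Storing by chunk ID means a repeated write overwrites, not duplicates.
        self.stored[chunk_id] = list(records)
        if self.fail_next_ack:
            self.fail_next_ack = False
            raise ConnectionError("acknowledgement lost")


class Buffer:
    """In-memory stand-in for a persistent buffer of ID-tagged chunks."""

    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size
        self.current = []
        self.pending = {}  # chunk_id -> records waiting to be flushed

    def append(self, record):
        self.current.append(record)
        if len(self.current) >= self.chunk_size:
            self.pending[uuid.uuid4().hex] = self.current
            self.current = []

    def flush(self, destination, max_retries=3):
        for chunk_id, records in list(self.pending.items()):
            for _attempt in range(max_retries):
                try:
                    destination.write(chunk_id, records)
                except ConnectionError:
                    continue  # the chunk stays buffered and is retried
                del self.pending[chunk_id]
                break


buffer = Buffer()
for i in range(4):
    buffer.append({"event": i})
destination = FlakyDestination()
buffer.flush(destination)
print(len(destination.stored), "chunks stored,", len(buffer.pending), "pending")
```

The first chunk is written twice because its acknowledgement is lost, but since the destination keys on the chunk ID the retry does not create duplicate records.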
COMMON ARCHITECTURE
Breakdown of the Architecture
1. Forwarder Layer
   • Logs are collected from various sources, including:
     - Kubernetes clusters (on-prem or cloud)
     - Cloud VMs (AWS, Azure, Google Cloud)
     - On-premises servers
   • Each source has Fluent Bit installed, which is responsible for collecting and forwarding logs.
   • Fluent Bit uses the forward protocol to send logs to an intermediate component.
2. Aggregator Layer
   • Logs from multiple forwarders are load balanced using an IP or load balancer.
   • Load balancing can be done using round-robin or weighted load balancing.
   • The aggregated logs are processed by Fluent Bit or Fluentd, which act as central log aggregators.
3. Destination Layer
   • After processing, logs are forwarded to multiple destinations based on configurations:
     - Splunk (for log analysis and monitoring)
     - Kafka (for real-time streaming and event processing)
     - Elasticsearch (for indexing and searching logs)
     - AWS S3 (for log storage and archiving)
     - MongoDB (for structured storage of log data)
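The two load-balancing strategies named in the aggregator layer can be sketched in Python as follows. The aggregator names (`agg-1`, `agg-2`, `agg-3`) and the weights are made up for illustration, and the weighted variant shown is the simplest possible one (repeat each target according to its weight), not what any particular load balancer does.

```python
from itertools import cycle


def round_robin(aggregators):
    """Yield aggregators in a fixed rotating order."""
    return cycle(aggregators)


def weighted_round_robin(weights):
    """Yield each aggregator as many times per cycle as its integer weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


# Plain round-robin over three hypothetical aggregators.
rr = round_robin(["agg-1", "agg-2", "agg-3"])
rr_targets = [next(rr) for _ in range(6)]
print(rr_targets)

# Weighted: agg-1 receives three batches for every one sent to agg-2.
wrr = weighted_round_robin({"agg-1": 3, "agg-2": 1})
wrr_targets = [next(wrr) for _ in range(8)]
print(wrr_targets)
```

Weighting is useful when aggregators differ in capacity, so a larger node can take a proportionally larger share of the forwarded logs.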
PLUGINS
HOW DOES FLUENTD ROUTE?
Flexible routing through tags, for example to store events in Elasticsearch.
Using the filter plugin, we
can parse the
format of the log.
• Fluentd events can be re-routed in three ways:
1) by tag using the fluent-plugin-route plugin,
2) by label with the out_relabel plugin,
3) by record content with the fluent-plugin-rewrite-tag-filter plugin.
• Fluentd’s approach is more declarative whereas Logstash’s method is procedural.
• Therefore, programmers trained in procedural programming might see Logstash’s
configuration as easier for getting started.
• On the other hand, Fluentd’s tag-based routing allows complex routing to be expressed clearly.
• For example, the following configuration applies different logic to all production and
development events based on tag prefixes.
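The configuration itself is not reproduced in this transcript. As a stand-in, the idea of routing by tag prefix can be sketched in Python; this is not Fluentd configuration syntax, and the tags, handler functions, and destination names below are hypothetical.

```python
def handle_production(record):
    """Hypothetical logic for production events."""
    return {**record, "destination": "long_term_storage"}


def handle_development(record):
    """Hypothetical logic for development events."""
    return {**record, "destination": "stdout"}


# Ordered routing table: the first matching tag prefix wins.
ROUTES = [
    ("production", handle_production),
    ("development", handle_development),
]


def route(tag, record, routes=ROUTES, default=None):
    """Apply the handler whose prefix matches the dot-separated tag."""
    for prefix, handler in routes:
        if tag == prefix or tag.startswith(prefix + "."):
            return handler(record)
    return default


print(route("production.web.access", {"status": 200}))
print(route("development.api", {"status": 500}))
print(route("staging.api", {"status": 200}))
```

The routing table is data rather than control flow, which mirrors the declarative style described above: adding a new environment means adding one entry, not another branch.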
OVERALL WORKFLOW
WALKTHROUGH OF THE EXAMPLE ARCHITECTURE ON THE PREVIOUS SLIDE
1. Data Sources (Left Side)
The system collects logs from multiple sources, including:
• Manufacturing (Factories, Robotics, Assembly lines)
• Mobile & Vehicle (Drones, Smartphones, Cars, Trucks)
• Home Electronics (AC, TVs, Washing Machines, Refrigerators)
Each of these sources generates large volumes of logs and telemetry data, which are sent to a log collector server.
2. Log Collector Server (Middle - Fluentd)
• Fluentd is used as the central log collector, aggregating logs from various sources.
• These logs are stored in a high-performance storage system.
• The logs are also structured and formatted into Apache Arrow, which is an efficient columnar in-memory format
optimized for fast processing.
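To show what "columnar" means here, the following Python sketch converts row-oriented log records into a column-oriented layout. It uses plain dictionaries and lists rather than the Apache Arrow library, assumes every record has the same keys, and the sensor fields (`device`, `temp_c`) are invented for illustration.

```python
def to_columns(records):
    """Convert a list of row dictionaries into a dictionary of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Row-oriented records, as a collector might receive them.
rows = [
    {"device": "robot-1", "temp_c": 41.0},
    {"device": "robot-2", "temp_c": 39.5},
    {"device": "robot-1", "temp_c": 42.5},
]

columns = to_columns(rows)
print(columns)

# An aggregate over one field now reads a single contiguous list.
average_temp = sum(columns["temp_c"]) / len(columns["temp_c"])
print(average_temp)
```

Analytical queries usually touch a few fields across many records, so keeping each field together is what makes a columnar layout fast to scan.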
3. GPU-Accelerated Processing (Middle - GPU Server)
• The logs in Apache Arrow format are sent to a DB/GPU server via GPU-Direct SQL over RDMA Network.
• RDMA (Remote Direct Memory Access) enables direct memory transfers between systems with minimal CPU
involvement, improving performance.
• GPU-Direct SQL allows SQL queries (WHERE, JOIN, GROUP BY) to be executed directly on GPUs,
significantly accelerating data processing.
4. Data Utilization (Right Side)
Once the data is processed, it is used for multiple purposes:
• BI Tools (Visualization): Business Intelligence tools consume processed data for dashboards and reports.
• DB Admins / Users: Database administrators and analysts can query logs interactively.
• AI/ML (Anomaly Detection): Machine learning models analyze the logs for detecting anomalies, security
threats, or operational issues.
5. Elasticsearch (Bottom)
• The processed logs and analytical results can be indexed and stored in Elasticsearch, making them searchable
and accessible for advanced analytics.
IN CONCLUSION:
• Fluentd collects logs from multiple IoT & industrial sources.
• Apache Arrow ensures efficient log processing.
• GPU-accelerated SQL speeds up query execution.
• RDMA networking enables high-speed data transfers.
• BI tools, AI/ML, and Elasticsearch utilize the processed data for visualization, anomaly detection, and search.
TAKEAWAYS
• Fluent Bit is used for lightweight log forwarding at the source level.
• Fluentd or Fluent Bit is used for aggregation at the central level.
• Load balancing ensures even distribution of logs to the aggregator.
• Multiple log destinations support different use cases like monitoring, real-time processing, and storage.