Hadoop vs Java Batch Processing JSR 352
AGENDA
• Introduction
• What is batch processing?
• Batch processing using Hadoop
• Batch processing using Java Batch Processing JSR 352
• When to use Hadoop or JSR 352?
• Conclusion
Armel Nene – ETAPIX Global Ltd – www.etapix.com
INTRODUCTION
The motivations for this presentation are:
• Petabytes of data available in the wild
(Internet, cars, fridges…)
• Need for a competitive edge
• Processing large datasets
• Analysing large complex data (ETL)
• Generating reports
WHAT IS BATCH PROCESSING?
Batch processing is the execution of a series
of programs ("jobs") on a computer without manual
intervention.
Batch processing has these benefits:
• It can shift the time of job processing to when the
computing resources are less busy.
• It avoids idling the computing resources with
minute-by-minute manual intervention and supervision.
• By keeping a high overall rate of utilization, it amortizes
the cost of the computer, especially an expensive one.
• It allows the system to use different priorities for batch
and interactive work.
Source: Wikipedia
BATCH PROCESSING USING HADOOP
Hadoop is a massively scalable storage and batch data
processing system. It provides an integrated storage
and processing fabric that scales horizontally with
commodity hardware and provides fault tolerance
through software. Rather than replace existing systems,
Hadoop augments them by offloading the particularly
difficult problem of simultaneously ingesting, processing
and delivering/exporting large volumes of data so
existing systems can focus on what they were designed
to do: whether that be serve real time transactional data
or provide interactive business intelligence.
BATCH PROCESSING WITH HADOOP CONT…
• Hadoop uses the MapReduce programming model
• Parallel job processing – no need to worry about
synchronization, concurrency, hardware failure, etc…
• Database sources: use the RDBMS's built-in tools to dump
the data, or Hadoop's JDBC-based tools to extract it
• Unstructured data such as log files can be processed
using Hadoop
• Hardware and Data agnostic
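
The MapReduce model named above can be sketched in plain Java. This is only an illustration of the three phases (map, shuffle, reduce) on a word count; it runs in a single JVM and does not use the Hadoop API. The class name MapReduceSketch and all of its methods are made up for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce model; it does NOT use the Hadoop API.
class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values of one key into a single result.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence over a small in-memory input.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical log-like input lines.
        System.out.println(wordCount(Arrays.asList("GET /index GET", "POST /index")));
    }
}
```

In a real cluster the map and reduce calls for different lines and keys run in parallel on different machines, which is why neither function shares state with the other.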
BATCH PROCESSING USING JAVA BATCH
PROCESSING JSR 352
Batch processing refers to running batch jobs on a
computer system. Java EE includes a batch processing
framework that provides the batch execution
infrastructure common to all batch applications, enabling
developers to concentrate on the business logic of their
batch applications. The batch framework consists of a
job specification language based on XML, a set of batch
annotations and interfaces for application classes that
implement the business logic, a batch container that
manages the execution of batch jobs, and supporting
classes and interfaces to interact with the batch
container.
BATCH PROCESSING USING JAVA BATCH
PROCESSING JSR 352 CONT…
Java EE includes a batch processing framework that consists of the
following elements:
• A batch runtime that manages the execution of jobs.
• A job specification language based on XML.
• A Java API to interact with the batch runtime.
• A Java API to implement steps, decision elements, and other batch
artefacts.
JSR 352 integrates easily with SOA architectures, JMX for monitoring,
the Java Message Service (JMS) and the full Java EE stack. The learning
curve for a Java EE developer is substantially reduced.
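
To make the idea of a step more concrete, here is a minimal plain-Java sketch of a chunk-style step: items are read one at a time, processed, and written in groups. JSR 352 defines reader, processor and writer interfaces for this in javax.batch.api.chunk; the sketch below does not use them, so it runs without a batch container. ChunkStepSketch, runStep and the itemCount parameter are stand-ins invented for this example, and skipping null results is a simplification chosen here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Stand-alone sketch of a chunk-style step. These types are local stand-ins,
// not the javax.batch interfaces.
class ChunkStepSketch {

    // Reads items one by one, transforms each, and "writes" them in groups of
    // at most itemCount. Returns the chunks in the order they were written.
    static <I, O> List<List<O>> runStep(Iterator<I> reader,
                                        Function<I, O> processor,
                                        int itemCount) {
        if (itemCount < 1) {
            throw new IllegalArgumentException("itemCount must be >= 1");
        }
        List<List<O>> written = new ArrayList<>();
        List<O> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            O out = processor.apply(reader.next());
            if (out != null) {
                buffer.add(out);            // in this sketch a null result is skipped
            }
            if (buffer.size() == itemCount) {
                written.add(buffer);        // the chunk is full: hand it to the "writer"
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {
            written.add(buffer);            // final, possibly partial chunk
        }
        return written;
    }

    public static void main(String[] args) {
        // Hypothetical input: sales amounts as text, one of them invalid.
        List<String> sales = Arrays.asList("10", "25", "x", "40", "5");
        List<List<Integer>> chunks = runStep(sales.iterator(), s -> {
            try {
                return Integer.valueOf(s);
            } catch (NumberFormatException e) {
                return null;                // drop rows that are not numbers
            }
        }, 2);
        System.out.println(chunks);
    }
}
```

Writing in groups rather than item by item is what lets a batch runtime commit and checkpoint progress at regular intervals instead of after every record.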
WHEN TO USE HADOOP OR JSR 352?
Java EE Batch Processing is not a competing technology
to Apache Hadoop. They were built for different use
cases. Here are some examples of use cases where I
believe each works best:
[Figure: example use cases (Financial Risk Modelling, Creating reports from Database, Internet Threat Analysis, System housekeeping) placed against Hadoop and JBatch JSR 352]
WHEN TO USE HADOOP OR JSR 352? CONT…
When deciding which technology to implement, you may
want to consider the following:
• Source of data
• Size of data
• Processing/ business logic
• Whether the batch process integrates with your existing
architecture
• What to do with the processed data
CONCLUSION
• JSR 352 is not a replacement for Hadoop
• You can use both together, for example with JSR 352
acting as a trigger for Hadoop jobs
• JSR 352 is better suited for small batch jobs such as
generating sales reports
• Hadoop should be used when large datasets (>1 TB)
need to be analysed
• JSR 352 can be easily integrated into your Enterprise
Service Bus architecture
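
The size rule of thumb from this slide can be written down as a tiny helper. This is a sketch only: EngineChooser and its members are hypothetical names, 1 TB is assumed here to mean 10^12 bytes, and the other factors listed earlier (source of data, business logic, integration with the existing architecture) still need human judgement.

```java
// Sketch of the slide's size-based rule of thumb; names are hypothetical.
class EngineChooser {

    enum Engine { HADOOP, JSR_352 }

    // Assumption: 1 TB is taken as 10^12 bytes.
    static final long ONE_TERABYTE = 1_000_000_000_000L;

    // Returns HADOOP for datasets larger than 1 TB, JSR_352 otherwise.
    static Engine choose(long datasetBytes) {
        if (datasetBytes < 0) {
            throw new IllegalArgumentException("datasetBytes must not be negative");
        }
        return datasetBytes > ONE_TERABYTE ? Engine.HADOOP : Engine.JSR_352;
    }

    public static void main(String[] args) {
        System.out.println(choose(5_000_000_000L));          // a 5 GB sales extract
        System.out.println(choose(3 * ONE_TERABYTE));        // a 3 TB log archive
    }
}
```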
END.
Armel Nene is a software architect and developer. He is also the
founder of ETAPIX Global Limited – The Big Data Company -
www.etapix.com
Armel Nene Recruitment - www.armelnene.com is a specialist IT
recruitment company based in London, UK.
@armelnene
http://uk.linkedin.com/in/armelnene/