Hadoop vs Java Batch Processing JSR 352

AGENDA
• Introduction
• What is batch processing?
• Batch processing using Hadoop
• Batch processing using Java Batch Processing JSR 352
• When to use Hadoop or JSR 352?
• Conclusion
Armel Nene - ETAPIX Global Ltd - www.etapix.com

INTRODUCTION
The motivations for this presentation are:
• Petabytes of data available in the wild
(Internet, cars, fridges…)
• Need for competitive edge
• Processing large datasets
• Analysing large complex data (ETL)
• Generating reports

WHAT IS BATCH PROCESSING?
Batch processing is the execution of a series
of programs ("jobs") on a computer without manual
intervention.
Batch processing has these benefits:
• It can shift the time of job processing to when the
computing resources are less busy.
• It avoids idling the computing resources with minute-by-
minute manual intervention and supervision.
• By keeping a high overall rate of utilization, it amortizes
the cost of the computer, especially an expensive one.
• It allows the system to use different priorities for batch
and interactive work.
Source: Wikipedia

BATCH PROCESSING USING HADOOP
Hadoop is a massively scalable storage and batch data
processing system. It provides an integrated storage
and processing fabric that scales horizontally with
commodity hardware and provides fault tolerance
through software. Rather than replace existing systems,
Hadoop augments them by offloading the particularly
difficult problem of simultaneously ingesting, processing
and delivering/exporting large volumes of data so
existing systems can focus on what they were designed
to do: whether that be serve real time transactional data
or provide interactive business intelligence.

BATCH PROCESSING WITH HADOOP CONT…
• Hadoop uses the MapReduce programming model
• Parallel job processing – no need to worry about
synchronization, concurrency, hardware failure, etc…
• Databases: use the RDBMS's built-in tools to dump the
data, or Hadoop's JDBC-based tools to extract it
• Unstructured data such as log files can be processed
using Hadoop
• Hardware and data agnostic
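
The MapReduce model mentioned above can be illustrated with a small word count. This is only a minimal in-memory sketch in plain Java, not the Hadoop API: the class `MapReduceSketch` and its methods are hypothetical names chosen for illustration, and a real Hadoop job would distribute the map and reduce phases across a cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the MapReduce model; this is NOT the Hadoop API.
class MapReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value under its key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values of one key into a single number.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map, shuffle and reduce in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(emitted).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        // Two made-up log lines stand in for unstructured input data.
        System.out.println(wordCount(Arrays.asList("error disk full", "error network down")));
    }
}
```

Because each call to `map` and each call to `reduce` depends only on its own input, the framework is free to run them in parallel on different machines, which is why the job author does not have to handle synchronization directly.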

BATCH PROCESSING USING JAVA BATCH
PROCESSING JSR 352
Batch processing refers to running batch jobs on a
computer system. Java EE includes a batch processing
framework that provides the batch execution
infrastructure common to all batch applications, enabling
developers to concentrate on the business logic of their
batch applications. The batch framework consists of a
job specification language based on XML, a set of batch
annotations and interfaces for application classes that
implement the business logic, a batch container that
manages the execution of batch jobs, and supporting
classes and interfaces to interact with the batch
container.

BATCH PROCESSING USING JAVA BATCH
PROCESSING JSR 352 CONT…
Java EE includes a batch processing framework that consists of the
following elements:
• A batch runtime that manages the execution of jobs.
• A job specification language based on XML.
• A Java API to interact with the batch runtime.
• A Java API to implement steps, decision elements, and other batch
artefacts.
JSR 352 integrates easily with SOA architectures, JMX for monitoring,
Java Message Service (JMS) and the full Java EE stack. The learning curve
for a Java EE developer is substantially reduced.
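
To give a feel for how a batch step is typically structured, here is a minimal plain-Java sketch of the read, process, write pattern in which items are written in chunks. The interfaces `Reader`, `Processor` and `Writer` and the method `runStep` are hypothetical stand-ins written for this example; they are not the `javax.batch` API, and in a real JSR 352 application the batch runtime, not your own loop, drives the step. The sketch also simplifies by counting processed items rather than items read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of a chunk-style step; the interfaces below are
// hypothetical stand-ins, not the javax.batch API.
class ChunkStepSketch {

    interface Reader<I> { I readItem(); }                 // returns null when input is exhausted
    interface Processor<I, O> { O processItem(I item); }  // returns null to filter an item out
    interface Writer<O> { void writeItems(List<O> chunk); }

    // Reads and processes items one by one, and writes whenever `itemCount`
    // processed items have accumulated. Returns the number of chunks written.
    static <I, O> int runStep(Reader<I> reader, Processor<I, O> processor,
                              Writer<O> writer, int itemCount) {
        int chunks = 0;
        List<O> buffer = new ArrayList<>();
        for (I item = reader.readItem(); item != null; item = reader.readItem()) {
            O out = processor.processItem(item);
            if (out != null) {
                buffer.add(out);
            }
            if (buffer.size() == itemCount) {
                writer.writeItems(new ArrayList<>(buffer));
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {
            // Flush the last, partially filled chunk.
            writer.writeItems(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Made-up sales figures; "bad" simulates a record that fails validation.
        Iterator<String> sales = Arrays.asList("10", "25", "bad", "40", "5").iterator();
        List<Integer> report = new ArrayList<>();

        Reader<String> reader = () -> sales.hasNext() ? sales.next() : null;
        Processor<String, Integer> parser =
                line -> line.matches("\\d+") ? Integer.valueOf(line) : null;
        Writer<Integer> writer = report::addAll;

        int chunks = runStep(reader, parser, writer, 2);
        System.out.println(report + " written in " + chunks + " chunks");
    }
}
```

Keeping the business logic in the three small pieces and leaving the loop to the framework is the separation the slide describes: the developer writes the reader, processor and writer, and the container manages execution.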

WHEN TO USE HADOOP OR JSR 352?
Java EE Batch Processing is not a competing technology
to Apache Hadoop. They were built for different use
cases. Here are some examples of use cases where I
believe each can work best:
[Diagram: example use cases (Financial Risk Modelling, Creating reports
from Database, Internet Threat Analysis, System housekeeping) placed
between Hadoop and JBatch (JSR 352)]

WHEN TO USE HADOOP OR JSR 352? CONT…
When deciding which technology to implement, you may
want to consider the following:
• Source of data
• Size of data
• Processing/ business logic
• Does the batch process integrate with your existing
architecture?
• What to do with the processed data

CONCLUSION
• JSR 352 is not a replacement for Hadoop
• You can use them both together, maybe JSR 352 as a
trigger for Hadoop jobs
• JSR 352 is better suited for small batch jobs such as
generating sales reports
• Hadoop should be used when large datasets (>1 TB)
need to be analysed
• JSR 352 can be easily integrated into your Enterprise
Service Bus architecture
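
The size-based rule of thumb above can be written down as a tiny helper. This is only a sketch of that one criterion: the class `BatchTechnologyChooser`, the enum `Choice` and the decimal definition of one terabyte are assumptions made for illustration, and a real decision would also weigh the data source, the business logic and the existing architecture.

```java
// Encodes only the ">1 TB" rule of thumb from the conclusion; all names are hypothetical.
class BatchTechnologyChooser {

    enum Choice { HADOOP, JSR_352 }

    // Assumption: 1 TB is taken as 10^12 bytes (decimal terabyte).
    static final long ONE_TB = 1_000_000_000_000L;

    // Returns HADOOP for datasets strictly larger than 1 TB, JSR_352 otherwise.
    static Choice choose(long datasetBytes) {
        return datasetBytes > ONE_TB ? Choice.HADOOP : Choice.JSR_352;
    }

    public static void main(String[] args) {
        System.out.println(choose(5 * ONE_TB));   // large analytical dataset
        System.out.println(choose(200_000_000L)); // small sales-report extract
    }
}
```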

END.
Armel Nene is a software architect and developer. He is also the
founder of ETAPIX Global Limited – The Big Data Company -
www.etapix.com
Armel Nene Recruitment - www.armelnene.com is a specialist IT
recruitment firm based in London, UK.
@armelnene
http://uk.linkedin.com/in/armelnene/
