Building Beautiful Batch Jobs !
Who says batch jobs can’t be beautiful code?

SouthBay JVM User Group (SBJUG)
Meetup - November 2013
In this tech talk we’ll cover
• Spring Batch Introduction
• Dealer.com Real World Usage
• Some Lessons Learned
About me
• Software Engineer
• Worked on complex integration projects
– CSIS, LAPD, UCLA

• Worked on one high traffic system
– Napster

• Currently at Dealer.com
• Fascinated by all things Engineering
Dealer.com
• Leader in Automotive Marketing
• 10K+ clients, 12K+ Websites
• CRM is our new product offering

• It’s definitely a great place to work. I’d
recommend it to a friend.
Believe it or not – these are actually Dealer.com’s Core Values
In this tech talk we’ll cover
• Spring Batch Introduction
• Dealer.com Real World Usage
• Some Lessons Learned
Background
• Lack of frameworks for Java-based batch
processing
• Proliferation of many one-off, in-house solutions
• SpringSource and Accenture changed this
• June 2008 – production version of Spring Batch
• Spring Batch is the only open source framework
that provides a robust, enterprise-scale solution
• Batch Application for Java Platform is coming
soon (JSR 352)
Usage Scenario
A typical batch program reads a large number of
records from a database, file, or queue, processes
the data in some fashion, and then writes back data
in a modified form
• Commit batch process periodically
• Sequential processing of dependent steps
• Partial processing: skip records
• Concurrent batch processing
• Massively parallel batch processing
• Manual or scheduled restart after failure
Domain Language of a Batch
• Job - has one to many steps
• Step - has item reader, processor or writer
• Item Reader - an abstraction that represents the retrieval of
input for a Step, one item at a time
• Item Processor - an abstraction that represents the business
processing of an item
• Item Writer - an abstraction that represents the output of a
Step, chunk of items at a time
• Job Launcher - launches jobs
• Job Repository - stores metadata about currently running jobs
• Job Instance - an instance of a job with its unique parameters
• Job Execution - an execution attempt of a job instance
Batch Components
Job, Job Instance, Job Execution
Job Parameters
Job – Tasklet
Job – Sequential Flow
Job – Conditional Flow
Job – Chunk Oriented Processing
Item Readers and Writers - Out of the box
Item Readers                 Item Writers
AmqpItemReader               AmqpItemWriter
FlatFileItemReader           CompositeItemWriter
HibernateCursorItemReader    FlatFileItemWriter
HibernatePagingItemReader    GemfireItemWriter
IbatisPagingItemReader       HibernateItemWriter
ItemReaderAdapter            IbatisBatchItemWriter
JdbcCursorItemReader         ItemWriterAdapter
JdbcPagingItemReader         JdbcBatchItemWriter
JmsItemReader                JmsItemWriter
JpaPagingItemReader          JpaItemWriter
ListItemReader               MimeMessageItemWriter
MongoItemReader              MongoItemWriter
Neo4jItemReader              Neo4jItemWriter
RepositoryItemReader         RepositoryItemWriter
StoredProcedureItemReader    PropertyExtractingDelegatingItemWriter
StaxEventItemReader          StaxEventItemWriter
Job Repository Data Model
Let’s look at a couple of examples of building simple
Spring Batch Jobs

Example 1 – Load Flat file contents into database
Example 2 – Load XML file contents into database
Configure DataSource and Spring Batch Core Beans
spring-batch-context.xml :
Example1: Load Flat file contents into database
person-data.csv

Jill,Doe
Joe,Doe
Justin,Doe
Jane,Doe
John,Doe

Transform Data to Upper Case

PERSON Table

PERSON_ID  FIRST_NAME  LAST_NAME
1          JILL        DOE
2          JOE         DOE
3          JUSTIN      DOE
4          JANE        DOE
5          JOHN        DOE
Example1: Job Config
flat-file-reader-job.xml

Chunk Processing:
• Reader – retrieves input for a Step one item at a time
• Processor – processes an item
• Writer – writes the output, one item or chunk of items at a time
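
The chunk loop described above can be sketched in plain Java. This is an illustration only, not Spring Batch's implementation or API: `ChunkSketch`, `Person`, `parse`, `toUpperCase` and `run` are hypothetical names, and the sample rows are the ones from person-data.csv. For brevity it reads and processes each item together, then writes once a chunk is full.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of chunk-oriented processing; not the Spring Batch API. */
class ChunkSketch {

    /** Hypothetical item type matching the PERSON table in Example 1. */
    record Person(String firstName, String lastName) {}

    /** Turns one CSV line into an item, in the role of the item reader. */
    static Person parse(String csvLine) {
        String[] fields = csvLine.split(",");
        return new Person(fields[0].trim(), fields[1].trim());
    }

    /** Upper-cases both names, in the role of the item processor. */
    static Person toUpperCase(Person p) {
        return new Person(p.firstName().toUpperCase(), p.lastName().toUpperCase());
    }

    /**
     * Reads and processes items one at a time and hands them to the "writer"
     * one chunk at a time. Returns the chunks in the order they were written.
     */
    static List<List<Person>> run(List<String> lines, int chunkSize) {
        List<List<Person>> written = new ArrayList<>();
        List<Person> chunk = new ArrayList<>();
        for (String line : lines) {
            chunk.add(toUpperCase(parse(line)));   // read + process a single item
            if (chunk.size() == chunkSize) {
                written.add(List.copyOf(chunk));   // write the full chunk (a commit point)
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(List.copyOf(chunk));       // final, possibly smaller chunk
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Jill,Doe", "Joe,Doe", "Justin,Doe", "Jane,Doe", "John,Doe");
        for (List<Person> chunk : run(lines, 2)) {
            System.out.println(chunk);
        }
    }
}
```

With a chunk size of 2, the five sample rows are written as three chunks (2, 2 and 1 items).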
Example1: Reader, Processor and Writer
flat-file-reader-job.xml (cont'd)
Example1: Person Item Processor
Example1: Test Case to Execute Flat File Reader Job
Example2: Load XML file contents into database
record-data.xml

AD_PERFORMANCE Table

ID  DATE        IMPRESSION  CLICKS  EARNING
1   06/01/2013  139237      57      220.90
2   06/02/2013  339100      57      320.88
3   06/03/2013  431436      57      27.80
Example2: Job Config
xml-file-reader-job.xml
Example2: Reader, JAXB Unmarshaller, Processor and Writer
xml-file-reader-job.xml (cont'd)
Example2: Record Item Processor
Example2: Ad Performance Writer
Example2: Test Case to Execute XML File Reader Job
Spring Batch Admin Webapp
Jobs
Job Executions
Job Execution Details
In this tech talk we’ll cover
• Spring Batch Introduction
• Dealer.com Real World Usage
• Some Lessons Learned
Business Problem
• CRM entering Dealer's day-to-day Operations

• We need to Pull data from Dealer’s DMS systems into CRM
• DMS Systems can be ADP or Reynolds or DealerTrack etc
Here’s a Small Big Picture
[Diagram: Dealer’s DMS Systems (ADP, Reynolds, DealerTrack) -> Extract -> Dealer.com’s DMS -> Load -> Dealer.com’s CRM]
Typical Batch Job
• Download data from DMS Provider for a dealership
• Load the data in CRM
• Generate report on how the data was processed
ADP Vehicle Sales ETL Job Configuration
Some requirements
• We need to pull frequently for a lot of dealers
• We need job concurrency control
• We need data flow control for loading into CRM
• We need to know how the data was processed
• We need job resiliency
• We need beautiful batch jobs :)
Pull Frequently
• We have 100s of Dealerships, so each batch Job has to be run
for a Dealer’s ADP Account
• We schedule Jobs for each dealership to pull every 4 hours
• The Job Scheduling is managed via a centralized DDC
Scheduling Server
– Clients issue scheduling requests via a command queue to the server
– The server will then fire scheduled events back onto a queue for
clients to consume
– Clients and DDC Scheduling Server communicate through a single
RabbitMQ exchange. Each client chooses a unique application key and
binds to this exchange to receive messages about its scheduled events
– Named ClockTower: it’s worth a separate talk in itself
Job Concurrency
• 100s of scheduled or manually initiated jobs can all go off at
the same time
• We want to control how many jobs should run in our Cluster
concurrently
• We used basic queuing to solve this
– all job commands go into a queue
– they get processed one at a time
– we can control how many consumers we want to allow
across the cluster
• We use Spring Integration AMQP OutBound & InBound
Adapters
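
The queue-based concurrency control above can be sketched with the JDK alone. The setup described in this talk uses a RabbitMQ queue with Spring Integration AMQP adapters; in this sketch an in-memory `BlockingQueue` and a fixed thread pool stand in for them, and every class and method name is hypothetical. It only illustrates the idea that the number of consumers caps how many jobs run at once.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for a job command queue with a fixed number of consumers. */
class CompetingConsumersSketch {

    /** How many commands finished and the highest number seen running at once. */
    record Result(int completed, int peakConcurrency) {}

    static Result drain(List<String> jobCommands, int consumers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(jobCommands);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.submit(() -> {
                // Each consumer takes one command at a time until the queue is empty.
                while (queue.poll() != null) {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(10);            // stand-in for running the batch job
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    } finally {
                        running.decrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new Result(completed.get(), peak.get());
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = drain(List.of("Job1", "Job2", "Job3", "Job4", "Job5"), 3);
        System.out.println(r);
    }
}
```

However many commands are queued, the peak concurrency never exceeds the number of consumers, which is the knob the talk refers to.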
Running Jobs Concurrently – Competing Consumer Pattern
[Diagram: Job3, Job4 and Job5 wait in the DMS Pull Job Queue; DMS Service 01 is running Job1 and DMS Service 02 is running Job2]
Scheduled and Manually Initiated Job Commands come through the same Queue

• Each Node is configured with multiple concurrent Consumers (3 as of now)
• As we take more Tenants we could scale horizontally by adding more Nodes
Data Flow Control
[Diagram: ADP -> Extract -> DMS -> Load -> CRM]

• We need to control the load we put on the CRM system
• We don't want to EVER load too much data at the same time
• We debated two ways to solve this
– Synchronous
– Asynchronous (via Queues)
Sync vs Async Loading Data into CRM
[Diagram, SYNC: Job1 on DMS Service 01 calls the CRM Batch Service Load Balancer, which routes to CRM Batch 01 or CRM Batch 02]
[Diagram, ASYNC: Job1 on DMS Service 01 sends to the DMS Data Load Queue, which CRM Batch 01 and CRM Batch 02 consume]
Synchronous
• HAProxy load balancer - cannot be scaled dynamically
• Remote call needs to be made via REST or Spring Remoting API - tightly coupled
• Client has to fail the batch job or retry the request on failure - not fault tolerant
• Nodes need to throttle the number of incoming requests (via Tomcat threads) –
have to administer Tomcat threads, nodes cannot be repurposed
Asynchronous
• AMQP Rabbit Queue - can be scaled dynamically
• Only contract is the 'message' being passed – somewhat loosely coupled
• If a node fails, message will be unacknowledged and another node will execute the
same request - fault tolerant
• Each node can control the number of concurrent queue consumers – application
configuration, nodes can be repurposed
• Message persistence & dynamic reply queues - extra cost
We settled on loading via Queue using Spring Integration AMQP Gateways (which
are Bi-Directional). The call waits for the response to come back via the reply queue.
We send out an awesome looking email notification to an internal mailing list
The CSV Report has detailed information on how each row was processed
We are working towards a UI that’ll look like this
Job Resiliency
[Diagram: ADP -> Extract -> DMS -> Load -> CRM]

100s of Jobs could go off at the same time and jobs need to be resilient to
unexpected failures
• While a big job is running, CRM could crash or get restarted for deployment
• While a big job is running, DMS could crash or get restarted for deployment
In such cases, we want to rerun the job after a short while from where it left off.
• We use Spring Batch’s Job Restart-ability feature to achieve this
What could go wrong?
[Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2) consume from the DMS Pull Job Queue and send to the DMS Data Load Queue, which CRM Batch 01 and CRM Batch 02 consume; nodes are marked with an X]

X = Nodes that could just crash, or could be restarted due to a deployment,
while a big job is running.

Our goal is to be able to rerun the job and resume from where things left off.
Spring Batch – Restartability
• Spring Batch maintains Job State in the database
– which Step is completed, being processed or failed
– which item is being processed during Chunk processing
• Jobs can be restarted using the Job ExecutionId
• On restart, Spring Batch will skip over the completed steps and run the job from
where it left off
• If the job had failed during Chunk processing, it’ll skip
the items that were already processed and start
from where it left off
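
The restart behaviour above can be illustrated with a small plain-Java sketch. It is not how Spring Batch stores its state: the in-memory map stands in for the job repository tables, and `RestartSketch`, `committedCount`, `written` and `failAtIndex` are hypothetical names introduced only for this example. The point is that progress is saved after each chunk is written, so a rerun with the same id starts at the first uncommitted item.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrates resuming chunk processing from saved progress; not the Spring Batch API. */
class RestartSketch {

    /** Stand-in for the job repository: committed item count per execution id. */
    final Map<Long, Integer> committedCount = new HashMap<>();

    /** Stand-in for the target system; holds items in the order they were written. */
    final List<String> written = new ArrayList<>();

    /**
     * Processes items in chunks and saves progress after each chunk is written.
     * A rerun with the same executionId skips the items already committed.
     * failAtIndex simulates a crash when that item is reached; pass -1 for no failure.
     */
    void run(long executionId, List<String> items, int chunkSize, int failAtIndex) {
        int start = committedCount.getOrDefault(executionId, 0);
        for (int from = start; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            List<String> chunk = new ArrayList<>();
            for (int i = from; i < to; i++) {
                if (i == failAtIndex) {
                    throw new IllegalStateException("simulated failure at item " + i);
                }
                chunk.add(items.get(i).toUpperCase());
            }
            written.addAll(chunk);                 // write the chunk
            committedCount.put(executionId, to);   // save progress only after the write
        }
    }

    public static void main(String[] args) {
        RestartSketch job = new RestartSketch();
        List<String> items = List.of("a", "b", "c", "d", "e");
        try {
            job.run(42L, items, 2, 3);             // fails inside the second chunk
        } catch (IllegalStateException e) {
            System.out.println("first attempt failed: " + e.getMessage());
        }
        job.run(42L, items, 2, -1);                // resumes at item index 2
        System.out.println(job.written);
    }
}
```

Because the second chunk failed before it was written, the rerun repeats that whole chunk; nothing from the first chunk is written twice.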
When CRM goes down
[Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2) between the DMS Pull Job Queue and the DMS Data Load Queue; the CRM Batch 01 and CRM Batch 02 nodes are marked with an X]

• We have a timeout of 5 minutes for the reply from CRM
• When CRM Batch Nodes are down, we’ll get a timeout Exception, which results
in a new Job Command Message to the DMS Pull Job Queue
• The message includes the JobExecutionId
• Whichever node picks up the message will resume the job from where it left
off
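
The timeout-then-requeue flow above can be sketched with JDK types. This is an assumption-laden illustration, not the Spring Integration gateway configuration used in production: a `CompletableFuture` stands in for the reply from CRM, an in-memory `BlockingQueue` stands in for the DMS Pull Job Queue, and `JobCommand` with its fields is a hypothetical message shape (the talk only states that the message includes the JobExecutionId).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Waits for a reply with a timeout and requeues a resume command if none arrives. */
class TimeoutRequeueSketch {

    /** Hypothetical job command message carrying the execution to resume. */
    record JobCommand(long jobExecutionId, boolean resume) {}

    /**
     * Returns true if the CRM reply arrived in time. Otherwise publishes a resume
     * command for the same execution to the job queue and returns false.
     */
    static boolean awaitReplyOrRequeue(CompletableFuture<String> crmReply, long timeoutMillis,
                                       long jobExecutionId, BlockingQueue<JobCommand> pullJobQueue)
            throws InterruptedException, ExecutionException {
        try {
            crmReply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // No reply: let whichever node consumes the queue next resume the job.
            pullJobQueue.put(new JobCommand(jobExecutionId, true));
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<JobCommand> pullJobQueue = new LinkedBlockingQueue<>();
        CompletableFuture<String> neverAnswered = new CompletableFuture<>(); // CRM is down
        boolean loaded = awaitReplyOrRequeue(neverAnswered, 50, 42L, pullJobQueue);
        System.out.println("loaded=" + loaded + ", requeued=" + pullJobQueue.peek());
    }
}
```

The sketch uses a 50 ms timeout so it runs quickly; the talk's actual timeout is 5 minutes.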
When a DMS Service Node goes down
[Diagram: DMS Service 01 (Job1) and DMS Service 02 (Job2) between the DMS Pull Job Queue and the DMS Data Load Queue, with CRM Batch 01 and CRM Batch 02 behind it; a DMS Service node is marked with an X]

• When a DMS Node executing the Job goes down, the message will be
unacknowledged, and will be picked up by any other node connected to the
DMS Pull Job Queue
• The node that picks up the message will inspect if this job was already running
and stopped abruptly, and if so it’ll try to resume it from where it left off
• (This is not in production yet, it’s under development)
So, what makes it beautiful?
• Simple
– We just used the basic features of Spring Batch

• Easy to understand
– Quick look at spring configurations is all you need

• Less code
– We focused on the business logic

• Low maintenance
– Anybody can maintain it
In this tech talk we’ll cover
• Spring Batch Introduction
• Dealer.com Real World Usage
• Some Lessons Learned
On Spring Batch
• Really easy to set up and use
• Highly configurable
• Chunk Processing is the bomb!
• Beware of the commit count
• The bean ‘step’ scope comes in handy
• ExecutionContext is limited to 4 data types
On 3rd Party Integration
• Plan for Dev & Live accounts and environments
• Configure anything and everything possible
• Download large files via streaming
• Handle exceptions properly
• Embrace data translation errors
• Build jobs that are repeat runnable
Sources
• Spring Batch Reference Documentation
– http://docs.spring.io/spring-batch/reference/html-single/index.html
• Ad Performance Sample XML taken from
– http://www.mkyong.com/spring-batch/spring-batch-example-xml-file-to-database/
Questions?
Shameless Plug
Currently we have a few openings in the Manhattan Beach
office
• Java Developers
• UI Developers
• Web Developers
If interested please apply at http://careers.dealer.com/

SBJUG - Building Beautiful Batch Jobs

  • 1. Building Beautiful Batch Jobs ! Who says batch jobs can’t be beautiful code? SouthBay JVM User Group (SBJUG) Meetup - November 2013
  • 2. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 3. About me • Software Engineer • Worked on complex integration projects – CSIS, LAPD, UCLA • Worked on one high traffic system – Napster • Currently at Dealer.com • Fascinated by all things Engineering
  • 4. Dealer.com • Leader in Automotive Marketing • 10K+ clients, 12K+ Websites • CRM is our new product offering • It’s definitely a great place to work. I’d recommend it to a friend.
  • 5. Believe it or not – these are actually Dealer.com’s Core Values
  • 6. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 7. Background • Lack of frameworks for Java-based batch processing • Proliferation of many one-off, in-house solutions • SpringSource and Accenture changed this • June 2008 – production version of Spring Batch • Spring Batch is the only open source framework that provides a robust, enterprise-scale solution • Batch Application for Java Platform is coming soon (JSR 352)
  • 8. Usage Scenario A typical batch program reads a large number of records from a database, file, or queue, processes the data in some fashion, and then writes back data in a modified form • • • • • • Commit batch process periodically Sequential processing of dependent steps Partial processing: skip records Concurrent batch processing Massively parallel batch processing Manual or scheduled restart after failure
  • 9. Domain Language of a Batch • Job • Step • Item Reader - • Item Processor • Item Writer - • • • • - Job Launcher Job Repository Job Instance Job Execution has one to many steps has item reader, processor or writer an abstraction that represents the retrieval of input for a Step, one item at a time an abstraction that represents the business processing of an item an abstraction that represents the output of a Step, chunk of items at a time launches jobs store metadata about currently running jobs an instance of a job with its unique parameters an execution attempt of a job instance
  • 11. Job, Job Instance, Job Execution
  • 16. Job – Chunk Oriented Processing
  • 17. Item Readers and Writers - Out of the box Item Readers Item Writers AmqpItemReader AmqpItemWriter FlatFileItemReader CompositeItemWriter HibernateCursorItemReader FlatFileItemWriter HibernatePagingItemReader GemfireItemWriter IbatisPagingItemReader HibernateItemWriter ItemReaderAdapter IbatisBatchItemWriter JdbcCursorItemReader ItemWriterAdapter JdbcPagingItemReader JdbcBatchItemWriter JmsItemReader JmsItemWriter JpaPagingItemReader JpaItemWriter ListItemReader MimeMessageItemWriter MongoItemReader MongoItemWriter Neo4jItemReader Neo4jItemWriter RepositoryItemReader RepositoryItemWriter StoredProcedureItemReader PropertyExtractingDelegatingItemWriter StaxEventItemReader StaxEventItemWriter
  • 19. Let’s look at a couple of examples of building simple Spring Batch Jobs Example 1 – Load Flat file contents into database Example 2 – Load XML file contents into database
  • 20. Configure DataSource and Spring Batch Core Beans spring-batch-context.xml :
  • 21. Example1: Load Flat file contents into database PERSON Table person-data.csv Jill,Doe Joe,Doe Justin,Doe Jane,Doe John,Doe PERSON_ID 1 JILL DOE 2 JOE DOE 3 Transform Data to Upper Case FIRST_NAME LAST_NAME JUSTIN DOE 4 JANE DOE 5 JOHN DOE
  • 22. Example1: Job Config flat-file-reader-job.xml Chunk Processing: • Reader – retrieves input for a Step one item at a time • Processor – processes an item • Writer – writes the output, one item or chunk of items at a time
  • 23. Example1: Reader, Processor and Writer flat-file-reader-job.xml (cont..d)
  • 25. Example1: Test Case to Execute Flat File Reader Job
  • 26. Example2: Load XML file contents into database record-data.xml AD_PERFORMANCE Table ID DATE IMPRESSION CLICKS EARNING 1 06/01/2013 139237 57 220.90 2 06/02/2013 339100 57 320.88 3 06/03/2013 431436 57 27.80
  • 28. Example2: Reader, JAXB Unmarshaller, Processor and Writer xml-file-reader-job.xml (cont..d)
  • 31. Example2: Test Case to Execute XML File Reader Job
  • 34. Jobs
  • 38. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 39. Business Problem • CRM entering Dealer's day-to-day Operations • We need to Pull data from Dealer’s DMS systems into CRM • DMS Systems can be ADP or Reynolds or DealerTrack etc
  • 40. Here’s a Small Big Picture Dealer’s DMS Systems Dealer.com’s DMS & CRM Systems ADP Extract Reynolds DealerTrack DMS Load CRM
  • 41. Typical Batch Job • Download data from DMS Provider for a dealership • Load the data in CRM • Generate report on how the data was processed
  • 42. ADP Vehicle Sales ETL Job Configuration
  • 43. Some requirements • • • • • • We need to pull frequently for a lot of dealers We need job concurrency control We need data flow control for loading into CRM We need to know how the data was processed We need job resiliency We need beautiful batch jobs 
  • 44. Some requirements • • • • • • We need to pull frequently for a lot of dealers We need job concurrency control We need data flow control for loading into CRM We need to know how the data was processed We need job resiliency We need beautiful batch jobs 
  • 45. Pull Frequently • We have 100s of Dealerships, so each batch Job has to be run for a Dealer’s ADP Account • We schedule Jobs for each dealership to pull every 4 hours • The Job Scheduling is managed via a centralized DDC Scheduling Server – Clients issue scheduling requests via a command queue to the server – The server will then fire scheduled events back onto a queue for clients to consume – Clients and DDC Scheduling Server communicate through a single rabbit exchange. Each client is chooses an unique application key and binds to this exchange to receive messages about its scheduled events – Named ClockTower: it’s worth a separate talk in itself
  • 46. Some requirements • • • • • • We need to pull frequently for a lot of dealers We need job concurrency control We need data flow control for loading into CRM We need to know how the data was processed We need job resiliency We need beautiful batch jobs 
  • 47. Job Concurrency • 100s of scheduled or manually initiated jobs can all go off at the same time • We want to control how many jobs should run in our Cluster concurrently • We used basic queuing to solve this – all job commands go into a queue – they get processed one at a time – we can control how many consumers we want to allow across the cluster • We use Spring Integration AMQP OutBound & InBound Adapters
  • 48. Running Jobs Concurrently – Competing Consumer Pattern DMS Service 01 Job1 Scheduled and Manually Initiated Job Commands come through the same Queue DMS Pull Job Queue Job5 Job4 Job3 DMS Service 02 Job2 • Each Node is configured with multiple concurrent Consumers (3 as of now) • As we take more Tenants we could scale horizontally by adding more Nodes
  • 49. Some requirements • • • • • • We need to pull frequently for a lot of dealers We need job concurrency control We need data flow control for loading into CRM We need to know how the data was processed We need job resiliency We need beautiful batch jobs 
  • 50. Data Flow Control ADP Extract DMS Load CRM • We need to control the load we put on the CRM system • We don't want to EVER load too much data at the same time • We debated two ways to solve this – Synchronous – Asynchronous (via Queues)
  • 51. Sync vs Async Loading Data into CRM CRM Batch 01 DMS Service 01 Job1 CRM Batch Service Load Balancer (SYNC) CRM Batch 02 CRM Batch 01 DMS Service 01 (ASYNC) DMS Data Load Queue Job1 CRM Batch 02
  • 52. Synchronous • Haproxy load balancer - cannot be scaled dynamically • Remote call needs to be made via REST or Spring Remoting API - tightly coupled • Client has to fail the batch job or retry the request on failure - not fault tolerant • Nodes need to throttle the number of incoming requests (via tomcat threads) – have to administer tomcat threads, nodes cannot be repurposed Asynchronous • AMQP Rabbit Queue - can be scaled dynamically • Only contract is the 'message' being passed – some what loosely coupled • If a node fails, message will be unacknowledged and another node will execute the same request - fault tolerant • Each node can control the number of concurrent queue consumers – application configuration, nodes can be purposed • It does incur some extra cost, message persistence & dynamic reply queues - extra cost We settled on loading via Queue using Spring Integration AMQP Gateways (which are Bi-Directional), the call waits for response to come back via reply queue
  • 53. Some requirements • • • • • • We need to pull frequently for a lot of dealers We need job concurrency control We need data flow control for loading into CRM We need to know how the data was processed We need job resiliency We need beautiful batch jobs 
  • 54. We send out an awesome looking email notification to an internal mailing list
  • 55. The CSV Report has Detailed information how each row was processed
  • 56. We are working towards a UI that’ll look like this
  • 57. Some requirements • • • • • • We need to pull frequently for a lot of dealers We need job concurrency control We need data flow control for loading into CRM We need to know how the data was processed We need job resiliency We need beautiful batch jobs 
  • 58. Job Resiliency [Diagram: ADP → (Extract) → DMS → (Load) → CRM] • Hundreds of jobs could go off at the same time, and jobs need to be resilient to unexpected failures • While a big job is running, CRM could crash or get restarted for a deployment • While a big job is running, DMS could crash or get restarted for a deployment • In such cases, we want to rerun the job after a short while from where it left off • We use Spring Batch's job restartability feature to achieve this
  • 59. What could go wrong? [Diagram: DMS Service 01 (Job1), DMS Service 02 (Job2), DMS Pull Job Queue, DMS Data Load Queue, CRM Batch 01, CRM Batch 02; an X marks each node that could fail] • Nodes could crash or be restarted for a deployment while a big job is running • Our goal is to be able to rerun the job and resume from where things left off
  • 60. Spring Batch – Restartability • Spring Batch maintains job state in the database – which step is completed, being processed, or failed – which item is being processed during chunk processing • Jobs can be restarted using the JobExecutionId • Spring Batch will skip over the completed steps and run the job from where it left off • If the job failed during chunk processing, it will skip the items that were already processed and start from where it left off
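The restart behaviour described on this slide can be illustrated with a small plain-JDK sketch. This is not Spring Batch code. The `JobState` class is a hypothetical stand-in for the state Spring Batch persists in its database, and `failAtIndex` is an assumed hook for simulating a crash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-JDK illustration of restart-from-checkpoint; not the Spring Batch API.
class RestartSketch {

    // Hypothetical stand-in for persisted job state.
    static final class JobState {
        int committedItems = 0;                          // items covered by committed chunks
        final List<String> written = new ArrayList<>();  // plays the role of the target system
    }

    // Processes items in chunks, resuming after the last committed chunk.
    // failAtIndex simulates a crash just before that item; pass -1 for no crash.
    static boolean run(List<String> items, int chunkSize, JobState state, int failAtIndex) {
        int position = state.committedItems; // skip what a previous run already committed
        while (position < items.size()) {
            int end = Math.min(position + chunkSize, items.size());
            List<String> chunk = new ArrayList<>();
            for (int i = position; i < end; i++) {
                if (i == failAtIndex) {
                    return false; // crash: the uncommitted chunk is discarded
                }
                chunk.add(items.get(i).toUpperCase());
            }
            state.written.addAll(chunk); // write the whole chunk...
            state.committedItems = end;  // ...and record the checkpoint with it
            position = end;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e");
        JobState state = new JobState();
        boolean firstRun = run(items, 2, state, 3);   // fails inside the second chunk
        boolean secondRun = run(items, 2, state, -1); // restart resumes at index 2
        System.out.println(firstRun + " " + secondRun + " " + state.written);
        // prints: false true [A, B, C, D, E]
    }
}
```

Note that item "c" is processed twice, once in the failed chunk and once after the restart, but written only once. The checkpoint moves only when a chunk commits, so the chunk size decides how much work is redone after a failure.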
  • 61. When CRM goes down [Diagram: DMS Service 01 (Job1), DMS Service 02 (Job2), DMS Pull Job Queue, DMS Data Load Queue, CRM Batch 01, CRM Batch 02; X marks the CRM Batch nodes as down] • We have a timeout of 5 minutes for the reply from CRM • When the CRM Batch nodes are down, we get a timeout exception, which results in a new Job Command message being sent to the DMS Pull Job Queue • The message includes the JobExecutionId • Whichever node picks up the message will resume the job from where it left off
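A minimal sketch of the timeout-then-requeue flow, using only the JDK. The `JobCommand` type, the in-memory queue, and the method name are hypothetical; the talk's real implementation uses a reply queue with a 5 minute timeout, shortened here to milliseconds so the example runs quickly.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Plain-JDK illustration; the queue and message type are hypothetical stand-ins.
class ReplyTimeoutSketch {

    // Hypothetical job command: tells a node which job execution to resume.
    static final class JobCommand {
        final long jobExecutionId;
        JobCommand(long jobExecutionId) { this.jobExecutionId = jobExecutionId; }
    }

    // Waits for the CRM reply; on timeout, re-queues the job so a node can resume it later.
    // Returns true when the reply arrived in time.
    static boolean awaitReplyOrRequeue(CompletableFuture<String> crmReply, long timeoutMillis,
                                       long jobExecutionId, Queue<JobCommand> pullJobQueue) {
        try {
            crmReply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pullJobQueue.add(new JobCommand(jobExecutionId)); // carries the id needed to resume
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for CRM", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("CRM load failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        Queue<JobCommand> pullJobQueue = new ArrayDeque<>();
        CompletableFuture<String> neverCompletes = new CompletableFuture<>(); // CRM is down
        boolean replied = awaitReplyOrRequeue(neverCompletes, 50, 1234L, pullJobQueue);
        System.out.println(replied + " requeued=" + pullJobQueue.size());
        // prints: false requeued=1
    }
}
```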
  • 62. When a DMS Service Node goes down [Diagram: DMS Service 01 (Job1), DMS Service 02 (Job2), DMS Pull Job Queue, DMS Data Load Queue, CRM Batch 01, CRM Batch 02; X marks the failed DMS Service node] • When a DMS node executing the job goes down, the message will be unacknowledged and will be picked up by any other node connected to the DMS Pull Job Queue • The node that picks up the message will check whether this job was already running and stopped abruptly, and if so it will try to resume it from where it left off • (This is not in production yet; it's under development)
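The inspection step could look roughly like the sketch below. Since the slide says this part is still under development, everything here is an assumption: the `Status` and `Action` enums, the map standing in for the job repository, and the rule that a redelivered message for a job still marked as started means the previous node died.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical decision logic for a (possibly redelivered) job message.
class RedeliverySketch {

    enum Status { STARTED, COMPLETED, FAILED }
    enum Action { START_NEW, RESUME, IGNORE }

    // The map is a stand-in for looking up job state in a job repository.
    static Action decide(Map<Long, Status> jobRepository, long jobExecutionId) {
        Status status = jobRepository.get(jobExecutionId);
        if (status == null) {
            return Action.START_NEW; // never seen: a fresh job
        }
        if (status == Status.COMPLETED) {
            return Action.IGNORE;    // duplicate delivery of finished work
        }
        return Action.RESUME;        // started but interrupted, or failed
    }

    public static void main(String[] args) {
        Map<Long, Status> repository = new HashMap<>();
        repository.put(1L, Status.STARTED);
        repository.put(2L, Status.COMPLETED);
        System.out.println(decide(repository, 1L) + " " + decide(repository, 2L)
                + " " + decide(repository, 3L));
        // prints: RESUME IGNORE START_NEW
    }
}
```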
  • 63. Some requirements • We need to pull frequently for a lot of dealers • We need job concurrency control • We need data flow control for loading into CRM • We need to know how the data was processed • We need job resiliency • We need beautiful batch jobs
  • 64. So, what makes it beautiful? • Simple – We just used the basic features of Spring Batch • Easy to understand – Quick look at spring configurations is all you need • Less code – We focused on the business logic • Low maintenance – Anybody can maintain it
  • 65. In this tech talk we’ll cover • Spring Batch Introduction • Dealer.com Real World Usage • Some Lessons Learned
  • 66. On Spring Batch • Really easy to set up and use • Highly configurable • Chunk processing is the bomb! • Beware of the commit count • The bean 'step' scope comes in handy • ExecutionContext is limited to 4 data types
  • 67. On 3rd Party Integration • Plan for Dev & Live accounts and environments • Configure anything and everything possible • Download large files via streaming • Handle exceptions properly • Embrace data translation errors • Build jobs that are repeat runnable
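As one way to read "download large files via streaming" and "repeat runnable" together, the sketch below copies an input stream straight to disk and overwrites the target on a rerun. It uses only the JDK. The class and method names are made up for illustration, and a `ByteArrayInputStream` stands in for the remote stream that a real job would open against the third-party system.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Plain-JDK sketch; the byte array stands in for a remote download stream.
class StreamingDownloadSketch {

    // Copies the stream to disk in buffered pieces instead of holding the file in memory.
    // REPLACE_EXISTING makes a rerun overwrite the earlier file rather than fail.
    static long saveTo(InputStream source, Path target) {
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Saves the same data twice to a temp file and returns the final file size.
    static long roundTrip(byte[] data) {
        try {
            Path target = Files.createTempFile("extract", ".csv");
            try {
                saveTo(new ByteArrayInputStream(data), target);
                saveTo(new ByteArrayInputStream(data), target); // rerun: same result
                return Files.size(target);
            } finally {
                Files.delete(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new byte[100_000]) + " bytes on disk");
        // prints: 100000 bytes on disk
    }
}
```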
  • 68. Sources • Spring Batch Reference Documentation – http://docs.spring.io/spring-batch/reference/html-single/index.html • Ad Performance Sample XML taken from – http://www.mkyong.com/spring-batch/spring-batch-example-xml-file-to-database/
  • 70. Shameless Plug • Currently we have a few openings in the Manhattan Beach office • Java Developers • UI Developers • Web Developers • If interested, please apply at http://careers.dealer.com/