Continuous Integration
on top of Hadoop
Wisely Chen & Neal Lee
Tuesday, June 11, 13
Agenda
• Who I am
• Problem
• Solution
• Demo
• Q&A
Who I am
• Wisely Chen ( thegiive@gmail.com )
• Release manager of Yahoo![Taiwan] shopping and data team
• Loves promoting open source tech in Taiwan
• Ruby and Rails: Coscup 2006, Ubisunrise 2007, OSDC 2007
• Puppet: PHPConf 2012, RubyConf 2012
• Release Practice: Webconf 2013, Coscup 2012
Who I am
• Neal Lee (@neal_lee)
• Data Engineer at Yahoo![Taiwan]
• Aiming to build an easy-to-use, self-service BI platform connected to Hadoop
Story 1
Another Story
Yet Another Story
Solution
One click
• Manually commit code to SCM
• And DONE
• Auto unit testing
• Auto push to beta for performance testing
• Auto push to production grid
• Auto trigger code
This feeling is awesome!
Continuous Integration
• A software engineering practice
• Maintain code repos
• Automate the build
• Make the build self-testing
• Everyone commits to the baseline every day
• Every commit should be built
• Test in a clone of the production environment
• Make it easy to get the latest deliverables
• Everyone can see the result of the latest build
• Automate deployment
We focus on
• A software engineering practice
• Maintain code repos
• Automate the build
• Make the build self-testing
• Everyone commits to the baseline every day
• Every commit should be built
• Test in a clone of the production environment
• Make it easy to get the latest deliverables
• Everyone can see the result of the latest build
• Automate deployment
CI flow
1. People commit code to SCM
2. Notify CI master
3. Call CI slave
4. CI slave executes local unit test
5. Deploy
6. Call CI slave
7. CI slave executes performance test
8. Deploy
9. git tag
10. Call
11. CI executes Pig
12. Notify user
Environments: DEV Alpha, Beta Grid, Prod Grid
CI flow
1. People commit code to SCM
2. Notify CI master
3. Call CI slave
4. CI slave executes local unit test
5. Notify user
CI flow
1. People commit code to SCM
2. Notify CI master
3. Call CI slave
4. CI slave executes local unit test
5. Deploy
6. Call CI slave
7. CI slave executes performance test
8. Notify user
CI flow
1. People commit code to SCM
2. Notify CI master
3. Call CI slave
4. CI slave executes local unit test
5. Deploy
6. Call CI slave
7. CI slave executes performance test
8. Deploy
9. Notify user
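A staged flow like the CI flow slides above can be sketched as a list of stages run in order, with the user notified at the end. This is a minimal Python sketch, not the speakers' implementation: the `Stage` and `run_pipeline` names are hypothetical, the stage bodies are stand-in lambdas, and stopping at the first failed stage is an assumption the slides do not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str                 # label used in the notification
    run: Callable[[], bool]   # returns True when the stage passes


def run_pipeline(stages: List[Stage],
                 notify: Callable[[str], None]) -> Tuple[bool, List[str]]:
    """Run stages in order; stop at the first failure and notify the user."""
    completed: List[str] = []
    for stage in stages:
        if not stage.run():
            notify(f"FAILED at {stage.name}; later stages skipped")
            return False, completed
        completed.append(stage.name)
    notify("SUCCESS: " + " -> ".join(completed))
    return True, completed


# Hypothetical stand-ins for the real jobs on each environment.
messages: List[str] = []
stages = [
    Stage("unit test", lambda: True),
    Stage("deploy to beta grid", lambda: True),
    Stage("performance test", lambda: False),  # simulate a failed check
    Stage("deploy to prod grid", lambda: True),
]
ok, done = run_pipeline(stages, messages.append)
print(ok, done, messages)
```

Keeping each stage as a plain callable means a unit test run, a deployment script, or a performance check can be swapped in without changing the runner.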
Unit Test
PigUnit
• A simple xUnit framework
• No cluster setup is required in local mode
• Unit testing, regression testing, and rapid prototyping on the fly
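PigUnit itself is a Java library that runs Pig scripts, and the sketch below is not PigUnit. It uses Python's standard `unittest` module to illustrate the same xUnit pattern in local mode: feed a small inline input to a transformation and assert on the expected output, with no cluster involved. The `top_queries` function and its sample data are hypothetical.

```python
import unittest
from collections import Counter
from typing import Iterable, List, Tuple


def top_queries(lines: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Count queries and return the n most frequent (ties sorted by name)."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


class TopQueriesTest(unittest.TestCase):
    def test_top_two(self):
        # Small inline input replaces the real data set.
        data = ["yahoo", "yahoo", "yahoo", "twitter", "facebook", "facebook"]
        self.assertEqual(top_queries(data, 2), [("yahoo", 3), ("facebook", 2)])

    def test_blank_lines_ignored(self):
        self.assertEqual(top_queries(["", "a", " "], 5), [("a", 1)])


# Run the suite programmatically so the result can be inspected by a CI job.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TopQueriesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A CI job can fail the build whenever `result.wasSuccessful()` is false.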
Using PigUnit
• Coding
• Write PigUnit test case
• Run local PigUnit test
• Push to grid
• Run Pig on grid
• Get the right result!
Unit tests are living documentation
• Unit tests are runnable, living documentation
• Passing test cases show the code still meets earlier requirements
Performance Test
Vaidya
• Rule-based performance diagnosis of M/R jobs
• Extensible framework
• You can add your own rules
• Write complex rules using existing rules
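The idea of building complex rules out of existing ones can be illustrated with a small sketch. This is not Vaidya's API (Vaidya rules are written in Java); the counter names, the 0-to-1 impact score, the `any_of` combinator and the threshold are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

Counters = Dict[str, float]
Rule = Callable[[Counters], float]  # returns an impact score in [0, 1]


def spill_ratio(c: Counters) -> float:
    """High when map tasks spill many more records than they output."""
    out = c.get("map_output_records", 0)
    if out == 0:
        return 0.0
    extra = max(c.get("spilled_records", 0) - out, 0)
    return min(extra / out, 1.0)


def reduce_skew(c: Counters) -> float:
    """High when the slowest reducer takes far longer than the average one."""
    avg = c.get("avg_reduce_seconds", 0)
    if avg == 0:
        return 0.0
    return min(max(c.get("max_reduce_seconds", 0) / avg - 1.0, 0.0), 1.0)


def any_of(*rules: Rule) -> Rule:
    """Compose existing rules: the composite's impact is the worst of its parts."""
    return lambda c: max(rule(c) for rule in rules)


def diagnose(c: Counters, rules: Dict[str, Rule], threshold: float = 0.5) -> List[str]:
    """Return the names of rules whose impact reaches the threshold."""
    return [name for name, rule in rules.items() if rule(c) >= threshold]


# Hypothetical counters from one job run.
counters = {
    "map_output_records": 1000,
    "spilled_records": 1800,
    "avg_reduce_seconds": 60,
    "max_reduce_seconds": 75,
}
rules = {
    "spill": spill_ratio,
    "skew": reduce_skew,
    "shuffle_health": any_of(spill_ratio, reduce_skew),
}
flagged = diagnose(counters, rules)
print(flagged)
```

Because every rule has the same signature, a composite rule can be registered and thresholded exactly like a basic one.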
CI toolset
[Diagram: the CI flow (commit code to SCM, notify CI, call, local unit test, deploy, performance test, deploy, Pig execution) annotated with the tools Vaidya and BASH]
CI is flexible
• MapReduce can use MRUnit
• Hive can use hive_test
• Pig can use PigUnit
GitHub triggers CI
CI testing build pipeline
Testing Trend
DEMO
Conclusion
• Auto testing will save your life
• CI will boost your productivity
• This process can fit any platform
Thank you, everyone
Hadoop Summit 2013: Continuous Integration on top of Hadoop