Microservices vs Hadoop ecosystem
Marton Elek
February 2017
2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Microservice definition
“An approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.”
– https://martinfowler.com/articles/microservices.html
Hadoop cluster
▪ The definition is almost true for a Hadoop cluster as well
Dockerized Hadoop cluster
▪ How can we use the tools of microservice architectures in the Hadoop ecosystem?
▪ A possible approach: install a cluster (Hadoop, Spark, Kafka, Hive) based on
– separate Docker containers
– smart configuration management (using well-known tooling from microservice architectures)
▪ Goal: a rapid prototyping platform
▪ Easy switching between
– versions (official HDP, snapshot build, Apache build)
– configurations (HA, Kerberos, metrics, HTrace…)
▪ A tool for developers and ops
– "Easy" does not mean easy for a user who does not know the tool
▪ Not a goal:
– replacing current management platforms (e.g. Ambari)
What are microservices? (Theory)
A collection of patterns/best practices
▪ II. Dependencies
– Explicitly declare and isolate dependencies
▪ III. Config
– Store config in the environment
▪ VI. Processes
– Execute the app as one or more stateless processes
▪ VIII. Concurrency
– Scale out via the process model
▪ XII. Admin processes
– Run admin/management tasks as one-off processes
The Twelve-Factor App (http://12factor.net)
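Factor III ("store config in the environment") can be illustrated with a few lines of Java. This is only a sketch: the class name `EnvConfig`, the variable name `SERVER_PORT` and the default port are assumptions made up for the example and do not come from the deck.

```java
import java.util.Map;

// Hypothetical helper: reads settings from environment variables (factor III).
final class EnvConfig {
    private final Map<String, String> env;

    EnvConfig(Map<String, String> env) {
        this.env = env;
    }

    // Returns the value of the variable, or the fallback when it is unset or empty.
    String get(String name, String fallback) {
        String value = env.get(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    // Same lookup, parsed as an integer.
    int getInt(String name, int fallback) {
        return Integer.parseInt(get(name, Integer.toString(fallback)));
    }

    public static void main(String[] args) {
        // In a real process the map is the process environment.
        EnvConfig config = new EnvConfig(System.getenv());
        System.out.println("port = " + config.getInt("SERVER_PORT", 8080));
    }
}
```

Passing the environment in as a map keeps the lookup easy to exercise without touching the real process environment.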
What are microservices? (Practice)
▪ Spring started as a
– Dependency injection framework
▪ Spring Boot ecosystem
– Easy-to-use starter projects
– Lego bricks for various problems
• JDBC access
• Database access
• REST
• Health check
▪ Spring Cloud: elements to build microservices (based on the Netflix stack)
– API gateway
– Service registry
– Configuration server
– Distributed tracing
– Client-side load balancing
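import java.util.Date;
import org.springframework.beans.factory.annotation.Autowired;

public class TimeStarter {
    // Injected by the Spring container
    @Autowired
    TimeService timerService;

    public Date now() {
        return timerService.now();
    }
}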
Microservices with Spring Cloud
Monolith application
▪ Monolithic but modular application example
[Diagram: a REST call into one application that contains the auth, timer, upload and report services]
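import java.util.Date;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@EnableAutoConfiguration
@RestController
@ComponentScan
public class TimeStarter {
    @Autowired
    TimeService timerService;

    // Exposes the current time at /now
    @RequestMapping("/now")
    public Date now() {
        return timerService.now();
    }

    public static void main(String[] args) {
        SpringApplication.run(TimeStarter.class, args);
    }
}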
Microservice version
▪ First problem: how can the frontend find the right backend port?
[Diagram: the auth, timer, upload and report services, each reached by its own REST call]
Solution: API Gateway
▪ First problem: how can the frontend find the right backend port?
[Diagram: a single REST call to the API gateway, with the auth, timer, upload and report services behind it]
API Gateway
▪ Goal: hide the available microservices behind a service facade
– Routing, authorization
– Deployment handling, canary testing, blue/green deployment
– Logging, SLA, auditing
▪ Implementation examples:
– Spring Cloud API gateway (based on Netflix Zuul)
– Netflix Zuul based implementation
– Twitter Finagle based implementation
– Amazon API Gateway
– Simple Nginx reverse proxy configuration
– Traefik, Kong
▪ Usage in the Hadoop ecosystem
– For prototyping: only needed if the scheduler/orchestrator starts the service on a random host
– For security: Apache Knox
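The routing part of a gateway can be reduced to a lookup from a public path prefix to a backend address. The following is a minimal sketch of that idea only; `RouteTable`, the host names and the ports are hypothetical, and none of the listed products is claimed to work this way internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical route table: maps a public path prefix to a backend base URL.
final class RouteTable {
    private final Map<String, String> routes = new LinkedHashMap<>();

    void add(String pathPrefix, String backendBaseUrl) {
        routes.put(pathPrefix, backendBaseUrl);
    }

    // Returns the backend URL for the request path, using the longest matching prefix.
    Optional<String> resolve(String requestPath) {
        String best = null;
        for (String prefix : routes.keySet()) {
            if (requestPath.startsWith(prefix)
                    && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) {
            return Optional.empty();
        }
        // Strip the public prefix and append the rest of the path to the backend URL.
        return Optional.of(routes.get(best) + requestPath.substring(best.length()));
    }

    public static void main(String[] args) {
        RouteTable table = new RouteTable();
        table.add("/timer", "http://timer-host:6767");
        table.add("/auth", "http://auth-host:6768");
        System.out.println(table.resolve("/timer/now"));
    }
}
```

A real gateway would add the concerns listed above (authorization, logging, canary routing) around this lookup; the prefix match here is deliberately naive.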
Service registry
▪ Problem: how do we configure the API gateway to route to all the services automatically?
[Diagram: REST call, API gateway, and the auth, timer, upload and report services, with a question mark on the routing]
Service registry
▪ Solution: use a service registry
– Components should be registered in the service registry automatically
[Diagram: REST call, API gateway, service registry, and the auth, timer, upload and report services]
Service registry
▪ Goal: store the location and state of the available services
– Health check
– DNS interface
▪ Implementation examples:
– Spring Cloud: Eureka
– Netflix Eureka based implementation
– Consul.io
– etcd, ZooKeeper
– Simple workaround: DNS or hosts file
▪ Usage in the Hadoop ecosystem
– Most components need to know the location of the NameNode(s) and other master components
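The two responsibilities named above, storing locations and tracking health state, fit in a small in-memory sketch. `ServiceRegistry`, the service name and the addresses are hypothetical; real registries such as Eureka or Consul add replication, health probing and a network API on top of this idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory registry: stores location and health state per service instance.
final class ServiceRegistry {
    static final class Instance {
        final String host;
        final int port;
        volatile boolean healthy = true; // updated by a health check

        Instance(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    private final Map<String, List<Instance>> services = new ConcurrentHashMap<>();

    // Called by a component (or an agent on its behalf) when it starts.
    Instance register(String serviceName, String host, int port) {
        Instance instance = new Instance(host, port);
        services.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(instance);
        return instance;
    }

    // Returns "host:port" for every instance currently marked healthy.
    List<String> lookup(String serviceName) {
        List<String> result = new ArrayList<>();
        for (Instance i : services.getOrDefault(serviceName, List.of())) {
            if (i.healthy) {
                result.add(i.host + ":" + i.port);
            }
        }
        return result;
    }
}
```

In a Hadoop setting a lookup such as `lookup("namenode")` would replace a hard-coded master address; the service name and the port used in any such call are illustrative choices, not values from the deck.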
Configuration server
▪ Problem: how can we configure multiple components?
– "Store config in the environment" (12factor)
[Diagram: REST call, API gateway, service registry, and the auth, timer, upload and report services, each with an open "Config?" question]
Configuration server
▪ Problem: how can we configure multiple components?
[Diagram: REST call, API gateway, service registry, the auth, timer, upload and report services, and a config server that provides the configuration]
Config server
▪ Goal: one common place for all of the configuration
– Versioning
– Auditing
– Multiple environment support: use (almost) the same configuration from DEV to PROD
– A solution for sensitive data
▪ Solution examples:
– Spring Cloud Config server
– ZooKeeper
– Most service registries include a key-value store (Consul, etcd)
– Any persistent datastore (but versioning is an open question)
▪ For the Hadoop ecosystem:
– Most painful point: the same configuration elements (e.g. core-site.xml) are needed at multiple locations
– Ambari and other management tools try to solve the problem (but without a focus on rapid prototyping)
Config server – configuration management
▪ Config server structure: [branch]/name-profile.extension
▪ Properties are merged for name=timer and profile (environment)=dev
▪ URL on the config server
– http://config:8888/timer-dev.properties
• server.port=6767
• aws.secret.key=zzz
• exit.code=-23
▪ Local file system structure (master branch)
– timer.properties
• server.port=6767
– dev.properties
• aws.secret.key=xxx
– application.properties
• exit.code=-23
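The merge of the three property files into one view can be sketched with plain `java.util.Properties`. This is an illustration, not the config server's actual implementation; `ConfigMerge` is a hypothetical name, and the precedence order (shared defaults first, then profile, then service) is an assumption, since the keys in the slide's example do not overlap.

```java
import java.util.List;
import java.util.Properties;

// Sketch of layered configuration: later sources override earlier ones.
final class ConfigMerge {
    static Properties merge(List<Properties> sourcesLowToHighPriority) {
        Properties merged = new Properties();
        for (Properties source : sourcesLowToHighPriority) {
            merged.putAll(source);
        }
        return merged;
    }

    // Convenience factory for a single-entry source.
    static Properties of(String key, String value) {
        Properties p = new Properties();
        p.setProperty(key, value);
        return p;
    }

    public static void main(String[] args) {
        Properties merged = merge(List.of(
                of("exit.code", "-23"),       // application.properties (shared defaults)
                of("aws.secret.key", "xxx"),  // dev.properties (profile)
                of("server.port", "6767")));  // timer.properties (service)
        merged.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```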
Summary
▪ Tools used in microservice architectures
▪ Key components:
– Config server
– Service registry
– API gateway
▪ Configuration server
– Versioning
– One common place from which to distribute configuration
– Configuration preprocessing!
• Transformation
• The content of the configuration should be defined; it can be format independent
• But the final configuration should be visible
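As an illustration of such a transformation, format-independent key/value pairs can be rendered into the `<configuration>/<property>` layout that Hadoop's `*-site.xml` files use. The class name `HadoopXmlRenderer` and the NameNode address are assumptions for this sketch; the deck does not show how its own preprocessing is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of configuration preprocessing: key/value pairs in, Hadoop site XML out.
final class HadoopXmlRenderer {
    static String render(Map<String, String> settings) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        // Sorting the keys gives a stable output that is easy to inspect and diff.
        for (Map.Entry<String, String> e : new TreeMap<>(settings).entrySet()) {
            xml.append("  <property><name>").append(escape(e.getKey()))
               .append("</name><value>").append(escape(e.getValue()))
               .append("</value></property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    // Escapes the characters that are not allowed in XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.print(render(Map.of("fs.defaultFS", "hdfs://namenode:8020")));
    }
}
```

Writing the rendered file to disk before the process starts keeps the final configuration visible, as the last bullet asks.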
Docker based Hadoop cluster
How to do it with Hadoop?
apache-hadoop-X.X.tar.gz
▪ bin
– hdfs
– yarn
– mapred
▪ etc/hadoop
– core-site.xml
– mapred-site.xml
– hdfs-site.xml
▪ include
▪ lib
▪ libexec
▪ sbin
▪ share
Microservice architecture elements
1. Configuration server
2. Service registry
3. API gateway
Do it with Hadoop
Microservice architecture elements for the same apache-hadoop-X.X.tar.gz layout
1. Configuration server
2. Service registry
3. API gateway
4. +1: Packaging
Packaging: Docker
– Docker Engine:
• a portable, lightweight runtime and packaging tool
– Docker Hub:
• a cloud service for sharing applications
– Docker Compose:
• predefined recipes (environment variables, network, …)
▪ My Docker containers: http://hub.docker.com/elek/
Docker decisions
▪ One application per container
– More flexible
– Simpler (configuration preprocessing + start)
– One deployable unit
▪ Microservice-like: prefer many small, similar units over a few bigger ones
▪ Using the host network for clusters
[Diagram: bridge network versus host network. Bridge network: hosts 10.8.0.5 and 10.8.0.6, with containers on their own 172.13.0.x addresses (172.13.0.1, 172.13.0.5, 172.13.0.2, 172.13.0.3, 172.13.0.4, 172.13.0.9). Host network: every container uses its host's address (10.8.0.5 or 10.8.0.6).]
Repositories
▪ elek/bigdata-docker:
– example configuration
– docker-compose files
– Ansible scripts
– getting started entrypoint
▪ elek/docker-bigdata-base (base image for all the containers)
– Contains all the configuration loading (and some documentation)
– Use the CONFIG_TYPE environment variable to select the configuration method
• CONFIG_TYPE=simple (configuration from environment variables, for a local environment)
• CONFIG_TYPE=consul (configuration from Consul, for a distributed environment)
▪ elek/docker-…. (hadoop/spark/hive/...)
– Docker images for the components
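The idea of selecting the configuration method from one environment variable can be sketched as follows. This is not the base image's actual start-up logic, which the deck does not show; `ConfigSourceSelector`, the `Source` enum and the choice of `simple` as the default when the variable is unset are assumptions for illustration.

```java
import java.util.Map;

// Hypothetical sketch: pick a configuration source from the CONFIG_TYPE variable.
final class ConfigSourceSelector {
    enum Source { ENVIRONMENT, CONSUL }

    static Source select(Map<String, String> env) {
        // Assumption: fall back to "simple" when CONFIG_TYPE is not set.
        String type = env.getOrDefault("CONFIG_TYPE", "simple");
        switch (type) {
            case "simple":
                return Source.ENVIRONMENT;
            case "consul":
                return Source.CONSUL;
            default:
                throw new IllegalArgumentException("Unknown CONFIG_TYPE: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(System.getenv()));
    }
}
```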
Local demo
▪ Local run, using host network
– More configuration is needed
– Auto scaling is supported
– https://github.com/elek/bigdata-docker/tree/master/compose
[Diagram: bridge network with containers 172.13.0.1, 172.13.0.5 and 172.13.0.2]
Do it with Hadoop
Components, for the same apache-hadoop-X.X.tar.gz layout
1. Packaging
2. Configuration server
3. Service registry
4. API gateway
Service registry/configuration server
▪ Service registry
– Health check support
– DNS support
▪ Key-value store
– Binary data is supported
▪ Based on agents and servers
▪ Easy-to-use REST API
▪ Raft-based consensus protocol
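Assuming the tool described on this slide is Consul (it is named on the neighbouring slides), reading a value from its key-value store over the REST API could look like the sketch below. The `/v1/kv/<key>` endpoint and the `raw` query parameter belong to Consul's HTTP API; the class name `ConsulKvClient`, the base URL and the key are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical minimal client for reading one key from Consul's key-value store.
final class ConsulKvClient {
    private final String baseUrl;

    ConsulKvClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Builds GET <base>/v1/kv/<key>?raw; "raw" asks for the plain value instead of JSON.
    HttpRequest readRequest(String key) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/kv/" + key + "?raw"))
                .GET()
                .build();
    }

    // Sends the request and returns the stored value; fails on any non-200 status.
    String read(String key) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(readRequest(key), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status " + response.statusCode() + " for key " + key);
        }
        return response.body();
    }
}
```

A call such as `new ConsulKvClient("http://localhost:8500").read("conf/core-site.xml")` needs a reachable Consul agent; the address and key here are placeholders.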
Service registry/configuration server
▪ Git2Consul
– Mirrors git repositories to Consul
▪ Consul Template
– Advanced template engine
– Renders a template (configuration file) based on the information from Consul
– Runs/restarts a process on change
▪ Registrator
– Listens on the Docker event stream
– Registers new components in Consul
[Diagram: Configuration (git), git2consul, Consul, consul-template, Registrator, Docker event stream, hdfs-namenode, hdfs-datanode (three datanode instances)]
Weave Scope
▪ Agents to monitor
– network connections between components
– CPU
– memory
▪ Supports Docker, Swarm, Weave network, …
▪ Easy to install
▪ Transparent
▪ Pluggable
▪ Only problem:
– Temporary Docker containers
Distributed demo
▪ Distributed run with host network
– https://github.com/elek/bigdata-docker/tree/master/consul
– Configuration is hosted in a Consul instance
– Dynamic update
[Diagram: host network, all containers on 10.8.0.5]
TODO
▪ More profiles and configuration sets
– Ready-to-use Kerberos/HA environments
– On-the-fly keytab/keystore generation (?)
▪ Scripting/tool improvements
– Auto-restart when a service registration changes
▪ Configuration for more orchestration/scheduling tools
– Nomad?
– Docker Swarm?
▪ Easy image creation for specific builds
▪ Improve the Docker images
– Predefined volume/port definitions
– Consolidated default values
Thank You
