Soft-Shake 15 - Geneva
@romeokienzler
kauffmann@ch.ibm.com
Scala, Apache Spark, The PlayFramework, Docker and Platform as a Service
The Ingredients
 NodeJS
 NodeRED
 Scala
 The Play Framework
 Apache Spark
 Docker, DockerCompose, DockerSwarm
 Platform as a Service powered by IBM Bluemix
NodeJS
 Server Side JavaScript Runtime Framework
 OpenSource
 Very frequently used by Startups
 REACTIVE (see explanation on PlayFramework slide)
NodeRED
 OpenSource Data Integration Framework
 Supports Visual Programming
 Very large set of connectors and extensions (> 400)
 Created by IBM
 Runs on top of NodeJS
 Extensible through JavaScript
Scala
 Invented at EPFL
 Runs on top of the JVM
 Open but commercialized through Typesafe
 Strong on functional programming paradigm (nice for data analytics tasks)
 Supports OOP as well
The PlayFramework
 Written in Scala
 Compatible with Scala and Java
 Meant to build REACTIVE HTTP services by unbinding the requests from the
threads through callback handlers
 Used at LinkedIn for example and at a major company in Valais
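The idea of unbinding requests from threads can be sketched with plain Scala futures, without Play itself. In this minimal sketch, `slowLookup` and `handleRequest` are hypothetical names chosen for illustration: the handler registers a callback via `map` and returns at once, while the result is computed on a thread pool.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ReactiveSketch {
  // Hypothetical slow backend call (e.g. a database query), run on the thread pool.
  def slowLookup(id: Int): Future[Int] = Future {
    Thread.sleep(50)
    id * 2
  }

  // The handler does not block: it attaches a callback and returns a Future at once.
  def handleRequest(id: Int): Future[String] =
    slowLookup(id).map(value => s"result=$value")

  def main(args: Array[String]): Unit = {
    val response = handleRequest(21)
    // Await is used only so that the demo prints before the JVM exits.
    println(Await.result(response, 2.seconds))
  }
}
```

A Play action built with `Action.async` follows the same pattern: it hands back a `Future` of the response instead of occupying a thread until the response is ready.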
Apache Spark
 Successor of MapReduce
 Supports various data stores, e.g. HDFS, Swift, S3, ...
 Forces you to use functional programming
 Therefore creates highly parallelizable code
 Programmable in Java, Scala and Python
 Central data structure is the RDD (Resilient Distributed Dataset), which abstracts away the underlying storage architecture
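Why the functional style parallelizes well can be illustrated without a cluster. The sketch below uses ordinary Scala collections, whose `map`, `filter` and `reduce` have direct counterparts on Spark RDDs. The sample `lines` and the chunk size of 2 are made up for illustration. Because `wordCount` is free of side effects and `+` is associative, chunks can be processed independently and their partial results combined afterwards.

```scala
object FunctionalSketch {
  // Hypothetical input; in Spark this would come from sc.textFile(...).
  val lines: Seq[String] =
    Seq("spark on bluemix", "scala", "docker swarm compose", "play")

  // Pure function: depends only on its argument, so it can run on any partition.
  def wordCount(line: String): Int = line.split(" ").length

  // Sequential evaluation over the whole data set.
  val total: Int = lines.map(wordCount).filter(_ > 0).reduce(_ + _)

  // Simulated partitions: reduce each chunk on its own, then combine the partial sums.
  val partitionedTotal: Int =
    lines.grouped(2).map(chunk => chunk.map(wordCount).reduce(_ + _)).reduce(_ + _)

  def main(args: Array[String]): Unit =
    println(s"total=$total partitionedTotal=$partitionedTotal")
}
```

Both evaluation orders give the same sum, which is the property a distributed runtime relies on when it splits the work across nodes.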
Docker
 Behavior similar to virtual machines
 Based on the Linux kernel features cgroups and namespaces
 Originally used LXC internally (replaced by libcontainer as the default since Docker 0.9)
 In contrast to virtual machines, the runtime instances are called containers
 Operating system processes run on the host system, but within a container they appear to be alone
 A Docker container starts in < 100 ms and you can run hundreds of them on a single host system
DockerCompose
 A way to define and run a multi container topology
 Topology defined in a single docker-compose.yml file
 Individual containers serving different tiers can be scaled up/down
DockerSwarm
 What if a single machine is too weak to run your topology?
 Groups multiple nodes together to act as a single Docker node
 Uses the same API as Docker on a standalone machine
 In combination with DockerCompose you get a lightweight and ultra-fast scaling runtime
Platform as a Service through IBM Bluemix
 Powered by CloudFoundry (OpenSource/OpenStandard)
 Supports Docker, runs on DockerSwarm (with a container placement optimizer)
 DockerCompose support by end of year
 Supports virtual machines via OpenStack
 > 100 services (e.g. Hadoop, Spark, SWIFT, MongoDB, MySQL, Watson, ...)
 Core runtime for this talk
Use case
 Get tweets from the public Twitter API (not the firehose)
 Using NodeRED add sentiment analysis through an IBM Watson Service
 Store tweets plus sentiment score in OpenStack Swift Service on Bluemix
 Additionally store them in the HDFS Service on Bluemix
 Using Apache Spark and Scala apply retrospective analysis
 Using BigSQL, JQuery and the PlayFramework draw a realtime chart
Architecture – Get the tweets
[Diagram: NodeRED feeding OpenStack Swift and Hadoop HDFS]
Architecture – downstream analysis
[Diagram components: OpenStack Swift; Hadoop HDFS; Spark Service; BigSQL; IPython Notebook supporting Scala; CloudFoundry container with PlayFramework running on the JVM; REST service; web browser running an AJAX application using JQuery]
NodeRED Tweet ingestion & sentiment scoring
PlayFramework REST Service
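import play.api.mvc._
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// `connection` is assumed to be an open JDBC connection to BigSQL.
def data = Action.async {
  Future { // Action.async expects a Future[Result]; the blocking JDBC call runs inside it
    val statement = connection.createStatement
    try {
      val resultSet = statement.executeQuery(
        """select count(*) as total,
          |  (select count(*) from tweetsift where UCASE(tweet) like '%IBM%') as IBM,
          |  (select count(*) from tweetsift where UCASE(tweet) like '%SOFTLAYER%') as softlayer
          |from tweetsift""".stripMargin)
      resultSet.next() // we expect exactly one row
      val total = resultSet.getInt("TOTAL")
      val ibm = resultSet.getInt("IBM")
      val softlayer = resultSet.getInt("SOFTLAYER")
      Ok(s"[$total,$ibm,$softlayer]") // JSON array: [total, ibm, softlayer]
    } finally statement.close()
  }
}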
Preprocessed data using R service in Bluemix
JQuery AJAX WebApplication calling REST Service
View on the SWIFT explorer
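Apache Spark Access to the data in IBM Bluemix
// Load the tweet and company CSV files from the Swift object store (sc is the SparkContext)
val tweets = sc.textFile("swift://softshake.spark/tmp_25573-tweets1126007960.csv")
val companies = sc.textFile("swift://softshake.spark/tmp_25573-companies-384438100.csv")
// Split each line into trimmed columns; the first row holds the column names
val tweetsHeaderAndRows = tweets.map(line => line.split(",").map(_.trim))
val tweetsHeader = tweetsHeaderAndRows.first
// Drop the header row and turn every remaining row into a column name -> value map
val tweetsData = tweetsHeaderAndRows.filter(_(0) != tweetsHeader(0))
val tweetMaps = tweetsData.map(splits => tweetsHeader.zip(splits).toMap)
// Drop the header line of the companies file
val companiesData = companies.filter(s => !s.equals("COMPANY_NAME_ID"))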
Calculating tweet frequency per company
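// Pair every tweet with every company and keep the pairs where the text mentions the company
val tweetsWithCompany = tweetMaps.cartesian(companiesData).filter(t =>
  t._1("TEXT").toLowerCase().contains(t._2.toLowerCase))
// (company, sentiment score) for each match
val companyAndScore = tweetsWithCompany.map(t => (t._2, t._1("SCORE").toDouble))
// Count matches per company: key by the company name (t._1), not by the score
val companyFrequency = companyAndScore.map(t => (t._1, 1)).reduceByKey(_ + _)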
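The matching logic can be checked on a laptop before it runs on a cluster. The following sketch replays the same steps on in-memory Scala collections instead of RDDs: a for-comprehension stands in for `cartesian` plus `filter`, and `groupBy` with a size count stands in for `reduceByKey`. The CSV rows and company names are invented sample data, not taken from the talk.

```scala
object FrequencySketch {
  // Invented sample data in the same shape as the CSV files (header row first).
  val tweetLines = Seq(
    "TEXT, SCORE",
    "I like IBM Bluemix, 0.8",
    "SoftLayer was down today, -0.5",
    "IBM and SoftLayer together, 0.3")
  val companies = Seq("COMPANY_NAME_ID", "IBM", "SoftLayer")

  // Split rows, drop the header and turn each row into a column name -> value map.
  val rows = tweetLines.map(_.split(",").map(_.trim))
  val header = rows.head
  val tweetMaps = rows.tail.map(cols => header.zip(cols).toMap)
  val companiesData = companies.filter(_ != "COMPANY_NAME_ID")

  // All (tweet, company) pairs where the tweet text mentions the company.
  val tweetsWithCompany = for {
    tweet <- tweetMaps
    company <- companiesData
    if tweet("TEXT").toLowerCase.contains(company.toLowerCase)
  } yield (tweet, company)

  // Count the matching tweets per company.
  val companyFrequency: Map[String, Int] =
    tweetsWithCompany.groupBy(_._2).map { case (company, hits) => (company, hits.size) }

  def main(args: Array[String]): Unit = println(companyFrequency)
}
```

Note that splitting on "," only works while the tweet text itself contains no commas; real tweet data would likely need a proper CSV parser.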
Wanna do it yourself?
 IBM Cloud Free Tier (incl. Bluemix): http://ibm.biz/joinIBMCloud
 24-120K CHF Cloud credits for startups  romeo.kienzler@ch.ibm.com
 *A*N*Y question  romeo.kienzler@ch.ibm.com
 Free usage for Students and Faculties  romeo.Kienzler@ch.ibm.com
Wanna hear more?
Nov 2nd in Zurich: Apache Spark Advanced Meetup
http://www.meetup.com/HackSessionsSwitzerland/events/225445919/?oc=evam
Nov 3rd in Berne: cloud computing, Apache Spark, challenges in NG sequencing
http://www.meetup.com/SwissLifeScience/events/225836187/?oc=evam
Nov 11th in Lausanne: Introduction to Docker, Streamcomputing on ApacheSpark and InfoSphere Streams
http://www.meetup.com/HackSessionsSwitzerland/events/225441845/?oc=evam
Some sessions will be streamed at: http://www.meetup.com/Cloud-Scale-Data-Science-virtual-UserGroup-worldwide/
