How to Integrate Python into a Scala Stack to Build Realtime Predictive Models
Jerry Chou
Lead Research Engineer
jerry@fliptop.com
Stories Beforehand
• Product pivoted
  • Data search => data analysis
• Build on top of existing infrastructure (hosted on AWS & Azure)
• Need tools for scientific computation
  • Mahout (Java)
  • Weka (Java)
  • Scikit-learn (Python)
Agenda
• Requirements and high-level concepts
• Tools for calling Python from Scala
• Decision making
High Level Concept - Before
[Diagram: existing business logic (in both Scala & Java) connected to modeling logic (in Python) running on Node 1, Node 2, …, Node N]
Requirements
• APIs to exploit Python’s modeling power
  • Train, predict, model info query, etc.
• Scalability
  • On-demand Python serving nodes
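The API surface listed above (train, predict, model info query) could look roughly like the sketch below. This is an illustration only: the class name `ModelService`, its method names, and the majority-label "model" are hypothetical stand-ins, not something the slides specify. A real service would wrap an actual estimator such as a scikit-learn model.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


class ModelService:
    """Hypothetical API surface: train, predict, and model-info query."""

    def __init__(self) -> None:
        self._majority: Any = None  # the toy "model": most frequent training label
        self._n_samples = 0

    def train(self, features: Sequence[Sequence[float]], labels: Sequence[Any]) -> None:
        # Stand-in for fitting a real estimator on (features, labels).
        if len(features) != len(labels) or not labels:
            raise ValueError("features and labels must be non-empty and of equal length")
        self._majority = Counter(labels).most_common(1)[0][0]
        self._n_samples = len(labels)

    def predict(self, features: Sequence[Sequence[float]]) -> List[Any]:
        # The toy model ignores the features and returns the majority label.
        if self._n_samples == 0:
            raise RuntimeError("model has not been trained")
        return [self._majority for _ in features]

    def info(self) -> Dict[str, Any]:
        # Model-info query: whatever metadata callers need about the current model.
        return {"trained": self._n_samples > 0, "n_samples": self._n_samples}
```

Keeping the three operations behind one small interface means the Scala side only needs to know these calls, regardless of which integration tool carries them.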
Tools for Scala-Python Integration
• Reimplementation of Python
  • Jython (JPython)
• Communication through JNI
  • Jepp
• Communication through IPC
  • Thrift
• Communication through REST API calls
  • Bottle
Jython (JPython)
• Reimplementation of Python in Java
• Compiles to Java bytecode
  • Either on demand or statically
• Can import and use any Java class
Jython
[Diagram: Scala code and Python code both run inside the JVM, with the Python code running on Jython]
Jython
• Lacks support for many scientific-computing extensions
  • NumPy, SciPy, etc.
• JyNI to the rescue?
  • Not ready yet, even for NumPy
Terrible. Redo everything.
Communication through JNI
• Jepp (Java Embedded Python)
  • Embeds CPython in Java
  • Runs Python code in CPython
  • Leverages both JNI and Python/C API for integration
Jepp
[Diagram: Scala code in the JVM reaches Python code in an embedded Python interpreter through Jepp over JNI]
Jepp
TestJepp.scala
    import jep.Jep

    object TestJepp extends App {
      val jep = new Jep()
      // Load the Python file that defines python_add
      jep.runScript("python_util.py")
      // invoke takes Object varargs, so box the Scala Ints
      val a = Int.box(2)
      val b = Int.box(3)
      val sumByPython = jep.invoke("python_add", a, b)
      jep.close() // release the embedded interpreter
    }
python_util.py
    def python_add(a, b):
        return a + b
Communication through IPC
• Thrift
  • Developed and open-sourced by Facebook
  • IDL-based (Interface Definition Language)
  • Generates server/client code in the specified languages
  • Takes care of protocol and transport layer details
  • Comes with generators for Java, Python, C++, etc.
    • No Scala generator
    • Scrooge to the rescue!
Thrift – IDL
TestThrift.thrift
    namespace java python_service_test
    namespace py python_service_test

    service PythonAddService
    {
      i32 pythonAdd (1:i32 a, 2:i32 b),
    }
    $ thrift --gen java --gen py TestThrift.thrift
Thrift – Python Server
PythonAddServer.py
    from thrift.protocol import TBinaryProtocol
    from thrift.server import TServer
    from thrift.transport import TSocket, TTransport

    # Module generated by `thrift --gen py`
    from python_service_test import PythonAddService

    class ExampleHandler(PythonAddService.Iface):
        def pythonAdd(self, a, b):
            return a + b

    handler = ExampleHandler()
    # The Processor class lives in the generated service module
    processor = PythonAddService.Processor(handler)
    transport = TSocket.TServerSocket(port=9090)
    tfactory = TTransport.TBufferedTransportFactory()
    pfactory = TBinaryProtocol.TBinaryProtocolFactory()
    # One thread per client connection
    server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
    server.serve()
PythonAddService.py (generated excerpt)
    class Iface:
        def pythonAdd(self, a, b):
            pass
Thrift – Scala Client
PythonAddClient.scala
    import org.apache.thrift.protocol.{TBinaryProtocol, TProtocol}
    import org.apache.thrift.transport.{TSocket, TTransport}
    // Class generated by `thrift --gen java`
    import python_service_test.PythonAddService

    object PythonAddClient extends App {
      val transport: TTransport = new TSocket("localhost", 9090)
      val protocol: TProtocol = new TBinaryProtocol(transport)
      val client = new PythonAddService.Client(protocol)
      transport.open()
      // Method name matches the IDL definition (pythonAdd)
      val sumByPython = client.pythonAdd(3, 5)
      println("3 + 5 = " + sumByPython)
      transport.close()
    }
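To make "takes care of protocol and transport layer details" concrete, the sketch below hand-encodes the request for the `pythonAdd` call defined in the IDL, following Thrift's strict binary protocol as I understand it: a version-and-type word, the method name, a sequence id, then one typed field per argument and a stop byte. The function name `encode_python_add_call` is hypothetical, and in practice the generated client produces these bytes for you.

```python
import struct

VERSION_1 = 0x80010000  # strict binary protocol version marker
CALL = 1                # message type for a request
T_I32 = 8               # wire type id for a 32-bit signed integer
T_STOP = 0              # marks the end of a struct's fields


def encode_python_add_call(a: int, b: int, seqid: int = 0) -> bytes:
    """Hand-encode a pythonAdd(a, b) request (illustration only)."""
    name = b"pythonAdd"
    # Message header: version|type, length-prefixed method name, sequence id.
    header = (
        struct.pack("!I", VERSION_1 | CALL)
        + struct.pack("!i", len(name)) + name
        + struct.pack("!i", seqid)
    )
    # Argument struct: (type byte, field id, value) per field, then a stop byte.
    args = (
        struct.pack("!bhi", T_I32, 1, a)
        + struct.pack("!bhi", T_I32, 2, b)
        + struct.pack("!b", T_STOP)
    )
    return header + args


message = encode_python_add_call(3, 5)
print(len(message), message.hex())
```

All integers are big-endian, and the field ids 1 and 2 are the ones declared in `TestThrift.thrift`.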
Thrift
[Diagram: Scala code in the JVM talks over Thrift to Python code running in several Python interpreters. Annotations: auto balancing, built-in encryption]
Oh, not bad.
REST API Architecture
[Diagram: Scala code in the JVM calls Python code behind several Bottle servers. Open questions: auto balancer? encoding?]
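With plain REST, encoding is left to the application. The sketch below shows one possible shape for a Python serving node, using JSON and the standard library's WSGI tooling as a stand-in for Bottle (Bottle applications are also WSGI applications). The `/add` route, the JSON field names, and the helper `call_add` are assumptions made for illustration; the slides do not show Fliptop's actual endpoints.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def python_add(a, b):
    return a + b


def app(environ, start_response):
    """WSGI app exposing POST /add with a JSON body {"a": ..., "b": ...}."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/add":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [json.dumps({"error": "not found"}).encode("utf-8")]
    # Decode the request body: the application owns the wire format.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length).decode("utf-8"))
    body = json.dumps({"sum": python_add(payload["a"], payload["b"])}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_add(a, b):
    """Invoke the app in-process, the way a WSGI server would."""
    raw = json.dumps({"a": a, "b": b}).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": "POST", "PATH_INFO": "/add",
                    "CONTENT_LENGTH": str(len(raw)), "wsgi.input": io.BytesIO(raw)})
    status = []
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], json.loads(b"".join(chunks).decode("utf-8"))


print(call_add(3, 5))
```

To serve it over HTTP for a quick local experiment, `wsgiref.simple_server.make_server("", 8080, app).serve_forever()` is enough. Any HTTP client on the Scala side can then post the JSON body.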
Thrift vs. REST
                      Thrift   REST   Does it matter?
Load balancer           ✔             No (AWS & Azure)
Encode / decode         ✔             No (we’re already doing it)
Low learning curve               ✔    Maybe
No dependency                    ✔    Yes
Fliptop’s Architecture
[Diagram: Scala code in the JVM sends requests through a load balancer to several Bottle servers running Python code]
• 5 Python servers
• ~4,500 requests/sec
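As a rough feel for these numbers, the sketch below spreads one second of traffic across the five servers. It rests on two assumptions the slides do not state: that ~4,500 requests/sec is the aggregate rate, and that the load balancer uses a round-robin policy. The server names are hypothetical.

```python
from collections import Counter
from itertools import cycle

# Hypothetical names for the 5 Python serving nodes.
servers = [f"python-server-{i}" for i in range(1, 6)]


def distribute(num_requests, backends):
    """Assign requests to backends in round-robin order and count them."""
    rotation = cycle(backends)
    return Counter(next(rotation) for _ in range(num_requests))


per_server = distribute(4500, servers)
print(per_server)  # each of the 5 servers receives 900 requests
```

Under these assumptions each node handles about 900 requests per second.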
Summary
• Jython
  • (✓) Tight integration with Scala/Java
  • (✗) Lacks support for C extensions (JyNI might help in the future)
• Jepp
  • (✓) Access to high-quality Python extensions at CPython speed
  • (✗) Two runtime environments
• Thrift, REST
  • (✓) Language-independent development
  • (✗) Bigger communication overhead
Thank You
Other tools
• JyNI (Jython Native Interface)
  • A compatibility layer that enables Jython to use native CPython extensions like NumPy or SciPy
  • Binary compatible with existing builds
• Cython
  • A compiler, itself written mostly in Python, for a superset of Python; translates Python code to C
• JNA (Java Native Access)
  • JNI-based wrapper providing Java programs access to native shared libraries
• JPE (Java-Python Extension)
  • JNI-based wrapper integrating Java and standard Python
  • Last updated: 2013-03-22

[PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models
