1 © Hortonworks Inc. 2011–2018. All rights reserved.
© Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information.
Deep Learning on HDP
Prague 2018
Timothy Spann, Solutions Engineer
Hortonworks @PaaSDev
Disclaimer
• This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be
developed.
• Technical feasibility, market demand, user feedback, and the Apache Software
Foundation community development process can all affect timing and final delivery.
• This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to deliver
these features in any generally available product.
• Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans,
customers should not rely upon it when making a purchase decision.
Agenda
• Data Engineering With Deep Learning
• TensorFlow with Apache NiFi
• TensorFlow on YARN
• Apache MXNet Pre-Built Models
• Apache MXNet Model Server With Apache NiFi
• Apache MXNet in Apache Zeppelin Notebooks
• Apache MXNet On YARN
• Demos
• Questions
Deep Learning for Big Data Engineers
Multiple users, frameworks, languages, data sources & clusters
BIG DATA ENGINEER
• Experience in ETL
• Coding skills in Scala,
Python, Java
• Experience with Apache
Hadoop
• Knowledge of database
query languages such as
SQL
• Knowledge of Hadoop tools
such as Hive, or Pig
• Expert in ETL (Eating, Ties
and Laziness)
• Social Media Maven
• Deep SME in Buzzwords
• No Coding Skills
• Interest in Pig and Falcon
CAT AI
• Will Drive your Car
• Will Fix Your Code
• Will Beat You At Q-Bert
• Will Not Be Discussed
Today
• Will Not Finish This Talk For
Me, This Time
http://gluon.mxnet.io/chapter01_crashcourse/preface.html
Use Cases
So Why Am I Orchestrating These Complex Deep Learning Workflows?
Computer Vision
• Object Recognition
• Image Classification
• Object Detection
• Motion Estimation
• Annotation
• Visual Question and Answer
• Autonomous Driving
Speech Recognition
• Speech to Text
• Speech Recognition
• Chat Bot
• Voice UI
Natural Language Processing
• Sentiment Analysis
• Text Classification
• Named Entity Recognition
Recommender Systems
• Content-based Recommendations
https://github.com/zackchase/mxnet-the-straight-dope
Deep Learning Options
• TensorFlow (C++, Python, Java)
• TensorFlow on Spark (Yahoo)
• Caffe on Spark (Yahoo)
• Apache MXNet (Baidu, Amazon, NVIDIA, MS, CMU, NYU, Intel)
• Deeplearning4j (Skymind), JVM
• PyTorch
• H2O Deep Water
• Keras on top of TensorFlow and DL4J
• Apache SINGA
• Caffe2 (Facebook)
Recommendations
• Install CPU Version on CPU YARN Nodes
• Install GPU Version on Nvidia (CUDA)
• Do training on GPU YARN Nodes where possible
• Apply Model on All Nodes and Trigger with Apache NiFi
• What helps Hadoop and Spark will help TensorFlow: more RAM, more and
faster cores, more nodes.
• Today, run either pure TensorFlow with Keras or TensorFlow on Spark.
• Try YARN 3.0 Containerized TensorFlow later in the year.
• Consider Alluxio or Apache Ignite for in-memory optimization
• Download the model zoos
• Evaluate other Deep Learning Frameworks like MXNet and PyTorch
Deep Learning Options
TensorFlow on Hadoop
https://www.tensorflow.org/deploy/hadoop
HDFS files can serve as a distributed source for the input producers used in training, so one fast
cluster can store these massive datasets and share them across its nodes.
This requires setting a few environment variables:
JAVA_HOME
HADOOP_HDFS_HOME
LD_LIBRARY_PATH
CLASSPATH
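A minimal sketch of preparing those four variables from Python before launching a training script. The helper name `hdfs_training_env`, the example paths in the usage note, and the `jre/lib/amd64/server` location of `libjvm.so` are assumptions (a typical Linux JDK 8 layout), not something the slide specifies. The CLASSPATH value is normally the output of `hadoop classpath --glob`; here it is passed in as a plain string so the sketch runs without Hadoop installed.

```python
import os


def hdfs_training_env(java_home, hadoop_hdfs_home, hadoop_classpath, base_env=None):
    """Return a copy of the environment with the four HDFS-related variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    env["HADOOP_HDFS_HOME"] = hadoop_hdfs_home
    # Assumed location of libjvm.so; adjust for your JDK and CPU architecture.
    jvm_dir = os.path.join(java_home, "jre", "lib", "amd64", "server")
    existing = env.get("LD_LIBRARY_PATH", "")
    # Append rather than overwrite so any libraries already on the path stay visible.
    env["LD_LIBRARY_PATH"] = existing + os.pathsep + jvm_dir if existing else jvm_dir
    env["CLASSPATH"] = hadoop_classpath
    return env
```

The returned dictionary could then be handed to `subprocess.run(["python", "train.py"], env=env)`, where `train.py` is a hypothetical training script that reads `hdfs://` paths.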
TensorFlow Serving on YARN 3.0
https://github.com/NVIDIA/nvidia-docker
We use NVIDIA Docker
containers on top of YARN
Run TensorFlow on YARN 3.0
https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
Apache Deep Learning Flow
[Diagram labels: Ingestion; Simple Event Processing Engine; Stream Processing; Destination; Data Bus; Build Predictive Model From Historical Data; Deploy Predictive Model For Real-time Insights; Perishable Insights; Historical Insights]
Streaming Apache Deep Learning
[Diagram labels: Data Acquisition; Edge Processing; Deep Learning; Real Time Stream Analytics; Rapid Application Development; IoT; Analytics; Cloud; Acquire; Move; Routing & Filtering; Deliver; Parse; Analysis; Aggregation; Modeling]
Apache Deep Learning Components
[Diagram labels: Streaming Analytics Manager; Machine Learning; Distributed queue; Buffering; Process decoupling; Streaming and SQL; Orchestration; Queueing; Simple Event Processing; REST API; Secure Spark Execution]
Apache Deep Learning Components
[Diagram labels: Streaming Analytics Manager; Run everywhere; Detect metadata and data; Extract metadata and data; Content Analysis; Deep Learning Framework; Entity Resolution; Natural Language Processing]
http://mxnet.incubator.apache.org/
• Cloud ready
• Experienced team (XGBoost)
• AWS, Microsoft, NVIDIA, Baidu, Intel backing
• Apache Incubator Project
• Run distributed on YARN
• In my early tests, faster than TensorFlow.
• Runs on Raspberry Pi, NVIDIA Jetson TX1
and other constrained devices
• Great documentation
• Gluon
• Great Python Interaction
• Model Server Available
• ONNX Support
• Now in Version 1.1!
• Great Model Zoo
https://mxnet.incubator.apache.org/how_to/cloud.html
https://github.com/apache/incubator-mxnet/tree/1.1.0/example
Deep Learning Architecture
[Diagram labels: HDP Node X and HDP Node Y (Node Manager, Datanode, HBase Region); HDF Node (Apache NiFi, Zookeeper); Apache Spark MLlib; GPU Node (Neural Network, Pipeline); MiNiFi Java Agent; MiNiFi C++ Agent; Apache Livy]
What do we want to do?
• MiNiFi ingests camera images and
sensor data
• MiNiFi executes Apache MXNet at the
edge
• Run Apache MXNet Inception to
recognize objects in images
• Apache NiFi stores images, metadata
and enriched data in Hadoop
• Apache NiFi ingests social data and
REST feeds
• Apache OpenNLP and Apache Tika for
textual data
Collect: Bring Together
Aggregate all data from sensors, drones, logs, geo-location devices,
machines and social feeds.
Conduct: Mediate the Data Flow
Mediate point-to-point and bi-directional data flows, delivering data
reliably to Apache HBase, Apache Hive, HDFS, Slack and Email.
Curate: Gain Insights
Parse, filter, join, transform, fork, query, sort, dissect; enrich with weather,
location, sentiment analysis, image analysis, object detection, image
recognition, voice recognition with Apache Tika, Apache OpenNLP and
Apache MXNet.
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull
models
• Hundreds of processors
• Visual command and
control
• Over fifty sources
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
• Version Control
Apache NiFi Integration with Apache MXNet Options
• Apache MXNet via Execute Process (Python)
• Apache MXNet Running on Edge Nodes (MiNiFi) S2S
• Apache MXNet Model Server Integration (REST API)
Not Covered Today
• *Dockerized Apache MXNet on Hadoop YARN 3 with NVIDIA GPU
• *Apache MXNet on Spark
Other Options
• https://github.com/apache/incubator-mxnet/tree/master/tools/coreml
• https://github.com/Leliana/WhatsThis
• https://github.com/apache/incubator-mxnet/tree/master/amalgamation/jni
• https://hub.docker.com/r/mxnet/
• https://github.com/apache/incubator-mxnet/tree/master/scala-package/spark
Apache MXNet Pre-Built Models
• CaffeNet
• SqueezeNet v1.1
• Inception v3
• Single Shot Detection (SSD)
• VGG19
• ResidualNet 152
• LSTM
http://mxnet.incubator.apache.org/model_zoo/index.html
https://github.com/dmlc/mxnet-model-gallery
Apache MXNet via Python (OSX Local with WebCam)
python3 -W ignore analyze.py
{"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1":
"n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2":
"n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3":
"n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt",
"top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename":
"images/tx1_image_img_20180208204131.jpg", "runtime": "2"}
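The JSON record on this slide has a flat shape (`top1` to `top5` labels with matching `topNpct` percentages, plus the image file name and runtime). The sketch below shows one way such a record could be assembled from a list of class probabilities. It is not the code of `analyze.py`; the function name `top5_record` and its parameters are hypothetical, and the probabilities would come from whatever model produced the prediction.

```python
import json


def top5_record(probabilities, labels, image_filename, runtime_seconds, uuid):
    """Build a flat, JSON-ready dict holding the five highest-scoring labels."""
    # Indices of the classes, best score first, truncated to five entries.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)[:5]
    record = {"uuid": uuid}
    for rank, index in enumerate(ranked, start=1):
        # Percentages are stored as strings, matching the record on the slide.
        record["top%dpct" % rank] = str(probabilities[index] * 100)
        record["top%d" % rank] = labels[index]
    record["imagefilename"] = image_filename
    record["runtime"] = str(runtime_seconds)
    return record


def to_json(record):
    """Serialize the record as one line of JSON for a downstream flow to pick up."""
    return json.dumps(record)
```

Keeping every value a string and every key at the top level makes the record easy to route on or convert with record-oriented processors, at the cost of re-parsing numbers later.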
Apache MXNet Running on an Apache NiFi Node
Apache MXNet Running on Edge Nodes (MiNiFi)
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://github.com/tspannhw/mxnet_rpi
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
Apache MXNet Model Server with Apache NiFi
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
sudo pip3 install mxnet-model-server --upgrade
Apache MXNet Running in Apache Zeppelin
Apache MXNet on Apache YARN
https://github.com/tspannhw/nifi-mxnet-yarn
https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
dmlc-submit --cluster yarn --num-workers 1 --server-cores 2 \
  --server-memory 1G --log-level DEBUG --log-file mxnet.log analyzeyarn.py
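When the same submission has to be triggered from a script or a flow rather than typed by hand, the argument list can be assembled in Python. This is a small sketch that only reuses the flags shown in the command on this slide; the function name `dmlc_submit_command` and its default values are illustrative choices, not part of dmlc-core.

```python
import shlex


def dmlc_submit_command(script, num_workers=1, server_cores=2,
                        server_memory="1G", log_level="DEBUG",
                        log_file="mxnet.log"):
    """Assemble the dmlc-submit argument list for a YARN launch."""
    return [
        "dmlc-submit",
        "--cluster", "yarn",
        "--num-workers", str(num_workers),
        "--server-cores", str(server_cores),
        "--server-memory", server_memory,
        "--log-level", log_level,
        "--log-file", log_file,
        script,
    ]


command = dmlc_submit_command("analyzeyarn.py")
# Print a shell-safe rendering of the command for logging or review.
print(" ".join(shlex.quote(part) for part in command))
```

On a node where `dmlc-submit` is on the PATH, the list could be passed to `subprocess.run(command, check=True)`; keeping it as a list avoids shell quoting problems.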
Apache OpenNLP with Apache NiFi
Apache OpenNLP for Entity Resolution Processor
https://github.com/tspannhw/nifi-nlp-processor
Requires installation of the NAR and the Apache OpenNLP models
(http://opennlp.sourceforge.net/models-1.5/).
This is an unsupported processor that I wrote and contributed to the community. You can write
one too!
https://community.hortonworks.com/articles/80418/open-nlp-example-apache-nifi-processor.html
Why TensorFlow? Also Apache MXNet,
PyTorch and DL4J.
• Google
• Multiple platform
support
• Hadoop integration
• Spark integration
• Keras
• Large Community
• Python and Java APIs
• GPU Support
• Mobile Support
• Inception v3
• Clustering
• Fully functional demos
• Open Source
• Apache Licensed
• Large Model Library
• Buzz
• Extensive Documentation
• Raspberry Pi Support
Streaming Analytics
Manager
Part of MiniFi C++ Agent
Detect metadata and data
Extract metadata and data
Content Analysis
Deep Learning Framework
Complex Event Processing
Joining DataSets for Streaming Analytics
Open Source Image Analytical Components
Enabling Record Processing
Schema Management
TensorFlow via Python or C++ Binary
python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
I tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162
I tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204
I tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005
I tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321
I tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
TensorFlow Python Example – Classify Image
https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
import os
from time import gmtime, strftime

DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
# UTC timestamp and host name attached to every classification row
currenttime = strftime("%Y-%m-%d %H:%M:%S", gmtime())
host = os.uname()[1]
TensorFlow Python Classifier Launcher
https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
#!/bin/bash
DATE=$(date +"%Y-%m-%d_%H%M")
fswebcam -q -r 1280x720 --no-banner /opt/demo/images/$DATE.jpg
python2 -W ignore /opt/demo/classify_image.py /opt/demo/images/$DATE.jpg 2>/dev/null
TensorFlow Python Example – Classify Image
https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
row = []
for node_id in top_k:
    human_string = node_lookup.id_to_string(node_id)
    score = predictions[node_id]
    # int() lets json.dumps accept node_id when it is a NumPy integer
    row.append({'node_id': int(node_id), 'image': image, 'host': host, 'ts': currenttime,
                'human_string': str(human_string), 'score': str(score)})
json_string = json.dumps(row)
print(json_string)
Apache NiFi Integration with TensorFlow Options
• TensorFlow (C++, Python, Java)
via ExecuteStreamCommand
• TensorFlow NiFi Java Custom Processor
• TensorFlow Running on Edge Nodes (MiNiFi)
TensorFlow Java Processor in NiFi
https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
https://github.com/tspannhw/nifi-tensorflow-processor
https://community.hortonworks.com/articles/178498/integrating-tensorflow-16-image-labelling-with-hdf.html
TensorFlow Java Processor in NiFi
Installation On A Single Node of Apache NiFi 1.5+
Download NAR here: https://github.com/tspannhw/nifi-tensorflow-processor/releases/tag/1.6
Install NAR file to /usr/hdf/current/nifi/lib/
Create a model directory (/opt/demo/models)
wget https://raw.githubusercontent.com/tspannhw/nifi-tensorflow-processor/master/nifi-tensorflow-processors/src/test/resources/models/imagenet_comp_graph_label_strings.txt
wget -O tensorflow_inception_graph.pb "https://github.com/tspannhw/nifi-tensorflow-processor/blob/master/nifi-tensorflow-processors/src/test/resources/models/tensorflow_inception_graph.pb?raw=true"
Restart Apache NiFi via Ambari
TensorFlow Java Processor in NiFi
TensorFlow Running on Edge Nodes (MiNiFi)
CREATE EXTERNAL TABLE IF NOT EXISTS tfimage (image
STRING, ts STRING, host STRING, score STRING,
human_string STRING, node_id FLOAT) STORED AS ORC
LOCATION '/tfimage'
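The column names in this table match the keys of the JSON rows printed by the classifier script elsewhere in this deck. As an illustration of that mapping only, the sketch below reorders each JSON object into the column order of the table. How the rows actually reach the ORC files is not shown on the slide, and the function name `rows_for_tfimage` is hypothetical.

```python
import json

# Column order copied from the CREATE EXTERNAL TABLE statement above.
TFIMAGE_COLUMNS = ["image", "ts", "host", "score", "human_string", "node_id"]


def rows_for_tfimage(json_string):
    """Turn the classifier's JSON array into tuples in tfimage column order."""
    rows = []
    for item in json.loads(json_string):
        values = [item[name] for name in TFIMAGE_COLUMNS]
        # node_id is declared FLOAT in the table; the other columns are STRING.
        values[-1] = float(values[-1])
        rows.append(tuple(values))
    return rows
```

A missing key raises `KeyError` here on purpose, so malformed classifier output is noticed before it is written anywhere.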
TensorFlow Installation (Edge)
apt-get install curl wget -y
wget https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel-0.11.1-installer-linux-x86_64.sh
chmod +x bazel-0.11.1-installer-linux-x86_64.sh
./bazel-0.11.1-installer-linux-x86_64.sh
apt-get install libblas-dev liblapack-dev python-dev libatlas-base-dev gfortran python-setuptools python-h5py -y
pip3 install six numpy wheel
pip3 install --user numpy scipy matplotlib pandas sympy nose
pip3 install --upgrade tensorflow
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
wget http://mirror.jax.hugeserver.com/apache/nifi/minifi/0.4.0/minifi-0.4.0-bin.zip
wget https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip
wget http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
Questions?
Contact
https://github.com/tspannhw/ApacheDeepLearning101
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://twitter.com/PaaSDev
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
https://github.com/dmlc/dmlc-core/tree/master/tracker/yarn
https://news.developer.nvidia.com/nvidias-2017-open-source-deep-learning-frameworks-contributions
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
Community Engagement
Participate now at: community.hortonworks.com
4,000+
Registered Users
10,000+
Answers
15,000+
Technical Assets
One Website!

More Related Content

PDF
Apache MXNet for IoT with Apache NiFi
Timothy Spann
 
PDF
Apache Deep Learning 101 - DWS Berlin 2018
Timothy Spann
 
PDF
MiniFi and Apache NiFi : IoT in Berlin Germany 2018
Timothy Spann
 
PDF
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
Timothy Spann
 
PDF
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi Princeton
Timothy Spann
 
PDF
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python
Timothy Spann
 
PDF
Enterprise IIoT Edge Processing with Apache NiFi
Timothy Spann
 
PPTX
Cloud Operations with Streaming Analytics using Apache NiFi and Apache Flink
DataWorks Summit
 
Apache MXNet for IoT with Apache NiFi
Timothy Spann
 
Apache Deep Learning 101 - DWS Berlin 2018
Timothy Spann
 
MiniFi and Apache NiFi : IoT in Berlin Germany 2018
Timothy Spann
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
Timothy Spann
 
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi Princeton
Timothy Spann
 
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python
Timothy Spann
 
Enterprise IIoT Edge Processing with Apache NiFi
Timothy Spann
 
Cloud Operations with Streaming Analytics using Apache NiFi and Apache Flink
DataWorks Summit
 

What's hot (20)

PPTX
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
HortonworksJapan
 
PDF
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac...
DataWorks Summit
 
PDF
Introduction to Apache NiFi 1.11.4
Timothy Spann
 
PDF
BYOP: Custom Processor Development with Apache NiFi
DataWorks Summit
 
PPTX
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
HortonworksJapan
 
PDF
Apache Nifi Crash Course
DataWorks Summit
 
PDF
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
DataWorks Summit
 
PPTX
Apache NiFi Crash Course Intro
DataWorks Summit/Hadoop Summit
 
PPT
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
PDF
Apache Nifi Crash Course
DataWorks Summit
 
PDF
Dataflow Management From Edge to Core with Apache NiFi
DataWorks Summit
 
PPTX
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Aldrin Piri
 
PDF
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
DataWorks Summit
 
PPTX
The Avant-garde of Apache NiFi
DataWorks Summit/Hadoop Summit
 
PPTX
Apache Deep Learning 201
DataWorks Summit
 
PDF
Running Zeppelin in Enterprise
DataWorks Summit
 
PDF
Hadoop Operations - Past, Present, and Future
DataWorks Summit
 
PPTX
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
PDF
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
DataWorks Summit
 
PDF
Apache Nifi Crash Course
DataWorks Summit
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
HortonworksJapan
 
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac...
DataWorks Summit
 
Introduction to Apache NiFi 1.11.4
Timothy Spann
 
BYOP: Custom Processor Development with Apache NiFi
DataWorks Summit
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
HortonworksJapan
 
Apache Nifi Crash Course
DataWorks Summit
 
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
DataWorks Summit
 
Apache NiFi Crash Course Intro
DataWorks Summit/Hadoop Summit
 
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
Apache Nifi Crash Course
DataWorks Summit
 
Dataflow Management From Edge to Core with Apache NiFi
DataWorks Summit
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Aldrin Piri
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
DataWorks Summit
 
The Avant-garde of Apache NiFi
DataWorks Summit/Hadoop Summit
 
Apache Deep Learning 201
DataWorks Summit
 
Running Zeppelin in Enterprise
DataWorks Summit
 
Hadoop Operations - Past, Present, and Future
DataWorks Summit
 
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
DataWorks Summit
 
Apache Nifi Crash Course
DataWorks Summit
 
Ad

Similar to Deep learning on HDP 2018 Prague (20)

PPTX
Apache deep learning 101
DataWorks Summit
 
PPTX
IoT with Apache MXNet and Apache NiFi and MiniFi
DataWorks Summit
 
PDF
Hands-On Deep Dive with MiniFi and Apache MXNet
Timothy Spann
 
PDF
Apache deep learning 202 Washington DC - DWS 2019
Timothy Spann
 
PDF
Apache Deep Learning 201 - Barcelona DWS March 2019
Timothy Spann
 
PDF
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Timothy Spann
 
PDF
HDF 3.1 : An Introduction to New Features
Timothy Spann
 
PPTX
SoCal BigData Day
John Park
 
PDF
Apache Deep Learning 201 - Philly Open Source
Timothy Spann
 
PPTX
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov
 
PPTX
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
PDF
Hadoop Present - Open Enterprise Hadoop
Yifeng Jiang
 
PDF
Storm Demo Talk - Colorado Springs May 2015
Mac Moore
 
PDF
Discover.hdp2.2.ambari.final[1]
Hortonworks
 
PPTX
Classification based security in Hadoop
Madhan Neethiraj
 
PPTX
Is your Enterprise Data lake Metadata Driven AND Secure?
DataWorks Summit/Hadoop Summit
 
PDF
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks
 
PDF
Hadoop Everywhere & Cloudbreak
Sean Roberts
 
PDF
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann
 
PDF
Discover.hdp2.2.storm and kafka.final
Hortonworks
 
Apache deep learning 101
DataWorks Summit
 
IoT with Apache MXNet and Apache NiFi and MiniFi
DataWorks Summit
 
Hands-On Deep Dive with MiniFi and Apache MXNet
Timothy Spann
 
Apache deep learning 202 Washington DC - DWS 2019
Timothy Spann
 
Apache Deep Learning 201 - Barcelona DWS March 2019
Timothy Spann
 
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Timothy Spann
 
HDF 3.1 : An Introduction to New Features
Timothy Spann
 
SoCal BigData Day
John Park
 
Apache Deep Learning 201 - Philly Open Source
Timothy Spann
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov
 
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
Hadoop Present - Open Enterprise Hadoop
Yifeng Jiang
 
Storm Demo Talk - Colorado Springs May 2015
Mac Moore
 
Discover.hdp2.2.ambari.final[1]
Hortonworks
 
Classification based security in Hadoop
Madhan Neethiraj
 
Is your Enterprise Data lake Metadata Driven AND Secure?
DataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks
 
Hadoop Everywhere & Cloudbreak
Sean Roberts
 
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann
 
Discover.hdp2.2.storm and kafka.final
Hortonworks
 
Ad

More from Timothy Spann (20)

PDF
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
Timothy Spann
 
PDF
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Timothy Spann
 
PDF
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
Timothy Spann
 
PDF
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Timothy Spann
 
PDF
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
Timothy Spann
 
PDF
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
Timothy Spann
 
PDF
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
PDF
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
PDF
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
Timothy Spann
 
PDF
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
Timothy Spann
 
PPTX
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
Timothy Spann
 
PDF
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
Timothy Spann
 
PDF
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
Timothy Spann
 
PDF
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
Timothy Spann
 
PDF
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
Timothy Spann
 
PDF
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
Timothy Spann
 
PDF
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
Timothy Spann
 
PDF
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
Timothy Spann
 
PDF
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
Timothy Spann
 
PDF
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
Timothy Spann
 
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
Timothy Spann
 
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
Timothy Spann
 
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
Timothy Spann
 
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
Timothy Spann
 
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
Timothy Spann
 
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
Timothy Spann
 
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
Timothy Spann
 
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
Timothy Spann
 
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
Timothy Spann
 
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
Timothy Spann
 
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
Timothy Spann
 
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
Timothy Spann
 
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
Timothy Spann
 
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
Timothy Spann
 
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
Timothy Spann
 
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
Timothy Spann
 
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData

Deep learning on HDP 2018 Prague

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved. © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Deep Learning on HDP Prague 2018 Timothy Spann, Solutions Engineer Hortonworks @PaaSDev
  • 2. Disclaimer • This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery. • This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  • 3. Agenda • Data Engineering With Deep Learning • TensorFlow with Apache NiFi • TensorFlow on YARN • Apache MXNet Pre-Built Models • Apache MXNet Model Server With Apache NiFi • Apache MXNet in Apache Zeppelin Notebooks • Apache MXNet On YARN • Demos • Questions
  • 4. Deep Learning for Big Data Engineers Multiple users, frameworks, languages, data sources & clusters BIG DATA ENGINEER • Experience in ETL • Coding skills in Scala, Python, Java • Experience with Apache Hadoop • Knowledge of database query languages such as SQL • Knowledge of Hadoop tools such as Hive, or Pig • Expert in ETL (Eating, Ties and Laziness) • Social Media Maven • Deep SME in Buzzwords • No Coding Skills • Interest in Pig and Falcon CAT AI • Will Drive your Car • Will Fix Your Code • Will Beat You At Q-Bert • Will Not Be Discussed Today • Will Not Finish This Talk For Me, This Time http://gluon.mxnet.io/chapter01_crashcourse/preface.html
  • 5. Use Cases So Why Am I Orchestrating These Complex Deep Learning Workflows? Computer Vision • Object Recognition • Image Classification • Object Detection • Motion Estimation • Annotation • Visual Question and Answer • Autonomous Driving Speech Recognition • Speech to Text • Speech Recognition • Chat Bot • Voice UI Natural Language Processing • Sentiment Analysis • Text Classification • Named Entity Recognition Recommender Systems • Content-based Recommendations https://github.com/zackchase/mxnet-the-straight-dope
  • 6. Deep Learning Options • TensorFlow (C++, Python, Java) • TensorFlow on Spark (Yahoo) • Caffe on Spark (Yahoo) • Apache MXNet (Baidu, Amazon, Nvidia, MS, CMU, NYU, Intel) • Deeplearning4j (Skymind) JVM • PyTorch • H2O Deep Water • Keras on top of TensorFlow and DL4J • Apache Singa • Caffe2 (Facebook)
  • 7. Recommendations • Install CPU Version on CPU YARN Nodes • Install GPU Version on Nvidia (CUDA) • Do training on GPU YARN Nodes where possible • Apply Model on All Nodes and Trigger with Apache NiFi • What helps Hadoop and Spark will help TensorFlow: More RAM, More and Faster Cores, More Nodes. • Today, run either pure TensorFlow with Keras or TensorFlow on Spark. • Try YARN 3.0 Containerized TensorFlow later in the year. • Consider Alluxio or Apache Ignite for in-memory optimization • Download the model zoos • Evaluate other Deep Learning Frameworks like MXNet and PyTorch
  • 8. Deep Learning Options
  • 9. TensorFlow on Hadoop https://www.tensorflow.org/deploy/hadoop HDFS files can be used as a distributed input source for training, so a single fast cluster can store these massive datasets and share them among all of your workers. This requires setting a few environment variables: JAVA_HOME, HADOOP_HDFS_HOME, LD_LIBRARY_PATH, CLASSPATH
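
The four variables named on slide 9 can be assembled in one place before a training script is launched. The sketch below is only an illustration under stated assumptions, not the presenter's setup: the helper name `hdfs_training_env`, the example paths, and the `jre/lib/amd64/server` location of `libjvm.so` are all assumptions and will differ by JDK and distribution.

```python
import os


def hdfs_training_env(java_home, hadoop_hdfs_home, hadoop_classpath, base_env=None):
    """Return a copy of the environment with the four HDFS-related variables set.

    All values are supplied by the caller; nothing is probed from the system.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    env["HADOOP_HDFS_HOME"] = hadoop_hdfs_home

    # Assumption: libjvm.so lives under jre/lib/amd64/server (a common layout
    # for a Java 8 JDK on x86-64 Linux); adjust for other layouts.
    jvm_dir = os.path.join(java_home, "jre", "lib", "amd64", "server")
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = (existing + os.pathsep + jvm_dir) if existing else jvm_dir

    # Usually the output of `hadoop classpath --glob`, passed in as a string.
    env["CLASSPATH"] = hadoop_classpath
    return env


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    env = hdfs_training_env(
        java_home="/usr/lib/jvm/java-8-openjdk-amd64",
        hadoop_hdfs_home="/usr/hdp/current/hadoop-hdfs-client",
        hadoop_classpath="/etc/hadoop/conf",
        base_env={},
    )
    for name in ("JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH"):
        print(name, "=", env[name])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the training script, which keeps the settings out of the parent shell.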
  • 10. TensorFlow Serving on YARN 3.0 https://github.com/NVIDIA/nvidia-docker We use NVIDIA Docker containers on top of YARN
  • 11. Run TensorFlow on YARN 3.0 https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
  • 12. Apache Deep Learning Flow Ingestion Simple Event Processing Engine Stream Processing Destination Data Bus Build Predictive Model From Historical Data Deploy Predictive Model For Real-time Insights Perishable Insights Historical Insights
  • 13. Streaming Apache Deep Learning Data Acquisition Edge Processing Deep Learning Real Time Stream Analytics Rapid Application Development IoT ANALYTICS CLOUD Acquire Move Routing & Filtering Deliver Parse Analysis Aggregation Modeling
  • 14. Apache Deep Learning Components Streaming Analytics Manager Machine Learning Distributed queue Buffering Process decoupling Streaming and SQL Orchestration Queueing Simple Event Processing REST API Secure Spark Execution
  • 15. Apache Deep Learning Components Streaming Analytics Manager Run everywhere Detect metadata and data Extract metadata and data Content Analysis Deep Learning Framework Entity Resolution Natural Language Processing
  • 16. http://mxnet.incubator.apache.org/ • Cloud ready • Experienced team (XGBoost) • AWS, Microsoft, NVIDIA, Baidu, Intel backing • Apache Incubator Project • Run distributed on YARN • In my early tests, faster than TensorFlow. • Runs on Raspberry Pi, Nvidia Jetson TX1 and other constrained devices • Great documentation • Gluon • Great Python Interaction • Model Server Available • ONNX Support • Now in Version 1.1! • Great Model Zoo https://mxnet.incubator.apache.org/how_to/cloud.html https://github.com/apache/incubator-mxnet/tree/1.1.0/example
  • 17. Deep Learning Architecture HDP Node X Node Manager Datanode HBase Region HDP Node Y Node Manager Datanode HBase Region HDF Node Apache NiFi Zookeeper Apache Spark MLlib Apache Spark MLlib GPU Node Neural Network Apache Spark MLlib Apache Spark MLlib Pipeline GPU Node Neural Network Pipeline MiNiFi Java Agent MiNiFi C++ Agent HDF Node Apache NiFi Zookeeper Apache Livy
  • 18. What do we want to do? • MiNiFi ingests camera images and sensor data • MiNiFi executes Apache MXNet at the edge • Run Apache MXNet Inception to recognize objects in images • Apache NiFi stores images, metadata and enriched data in Hadoop • Apache NiFi ingests social data and REST feeds • Apache OpenNLP and Apache Tika for textual data
  • 19. Collect: Bring Together. Aggregate all data from sensors, drones, logs, geo-location devices, machines and social feeds. Conduct: Mediate the Data Flow. Mediate point-to-point and bi-directional data flows, delivering data reliably to Apache HBase, Apache Hive, HDFS, Slack and Email. Curate: Gain Insights. Parse, filter, join, transform, fork, query, sort, dissect; enrich with weather, location, sentiment analysis, image analysis, object detection, image recognition, voice recognition with Apache Tika, Apache OpenNLP and Apache MXNet.
  • 20. Why Apache NiFi? • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over fifty sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control
  • 21. Apache NiFi Integration with Apache MXNet Options • Apache MXNet via Execute Process (Python) • Apache MXNet Running on Edge Nodes (MiNiFi) S2S • Apache MXNet Model Server Integration (REST API) Not Covered Today • *Dockerized Apache MXNet on Hadoop YARN 3 with NVidia GPU • *Apache MXNet on Spark
  • 22. Other Options • https://github.com/apache/incubator-mxnet/tree/master/tools/coreml • https://github.com/Leliana/WhatsThis • https://github.com/apache/incubator-mxnet/tree/master/amalgamation/jni • https://hub.docker.com/r/mxnet/ • https://github.com/apache/incubator-mxnet/tree/master/scala-package/spark
  • 23. Apache MXNet Pre-Built Models • CaffeNet • SqueezeNet v1.1 • Inception v3 • Single Shot Detection (SSD) • VGG19 • ResidualNet 152 • LSTM http://mxnet.incubator.apache.org/model_zoo/index.html https://github.com/dmlc/mxnet-model-gallery
  • 24. Apache MXNet via Python (OSX Local with WebCam)
      python3 -W ignore analyze.py
      {"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3": "n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt", "top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}
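
The JSON record printed on slide 24 stores each rank as a pair of string fields (`topN` and `topNpct`), so a downstream consumer usually needs to reshape it first. The following is a minimal sketch of one way to do that; the function name `ranked_predictions` and the output field names are illustrative choices, not part of `analyze.py`. The sample record is copied from the slide.

```python
import json

# Record copied from the slide output.
SAMPLE = (
    '{"uuid": "mxnet_uuid_img_20180208204131", '
    '"top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", '
    '"top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", '
    '"top3pct": "4.80000004172", "top3": "n03141823 crutch", '
    '"top4pct": "2.89999991655", "top4": "n04370456 sweatshirt", '
    '"top5pct": "2.80000008643", "top5": "n02834397 bib", '
    '"imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}'
)


def ranked_predictions(record, max_rank=5):
    """Turn topN/topNpct string pairs into a list of dicts ordered by rank."""
    ranked = []
    for rank in range(1, max_rank + 1):
        label = record.get("top%d" % rank)
        pct = record.get("top%dpct" % rank)
        if label is None or pct is None:
            break  # stop at the first missing rank
        # The label starts with an ImageNet synset id, then the class names.
        synset_id, _, names = label.partition(" ")
        ranked.append(
            {"rank": rank, "synset": synset_id, "names": names, "pct": float(pct)}
        )
    return ranked


if __name__ == "__main__":
    for entry in ranked_predictions(json.loads(SAMPLE)):
        print(entry["rank"], entry["synset"], entry["names"], entry["pct"])
```

Converting the percentages to floats up front makes it straightforward to filter out low-confidence results before routing the record onward.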
  • 25. Apache MXNet Running on an Apache NiFi Node
  • 26. Apache MXNet Running on Edge Nodes (MiNiFi) https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html https://github.com/tspannhw/mxnet_rpi https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
  • 27. Apache MXNet Model Server with Apache NiFi https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html sudo pip3 install mxnet-model-server --upgrade
  • 28. Apache MXNet Running in Apache Zeppelin
  • 29. Apache MXNet on Apache YARN https://github.com/tspannhw/nifi-mxnet-yarn https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
      dmlc-submit --cluster yarn --num-workers 1 --server-cores 2 --server-memory 1G --log-level DEBUG --log-file mxnet.log analyzeyarn.py
  • 30. Apache OpenNLP for Entity Resolution Processor https://github.com/tspannhw/nifi-nlp-processor Requires installation of NAR and Apache OpenNLP Models (http://opennlp.sourceforge.net/models-1.5/). This is a non-supported processor that I wrote and put into the community. You can write one too! Apache OpenNLP with Apache NiFi https://community.hortonworks.com/articles/80418/open-nlp-example-apache-nifi-processor.html
  • 31. Why TensorFlow? Also Apache MXNet, PyTorch and DL4J. • Google • Multiple platform support • Hadoop integration • Spark integration • Keras • Large Community • Python and Java APIs • GPU Support • Mobile Support • Inception v3 • Clustering • Fully functional demos • Open Source • Apache Licensed • Large Model Library • Buzz • Extensive Documentation • Raspberry Pi Support
  • 32. Open Source Image Analytical Components Streaming Analytics Manager Part of MiNiFi C++ Agent Detect metadata and data Extract metadata and data Content Analysis Deep Learning Framework Complex Event Processing Joining DataSets for Streaming Analytics Enabling Record Processing Schema Management
  • 33. TensorFlow via Python or C++ Binary
      python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
      solar dish, solar collector, solar furnace (score = 0.98316)
      window screen (score = 0.00196)
      manhole cover (score = 0.00070)
      radiator (score = 0.00041)
      doormat, welcome mat (score = 0.00041)
      bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
      tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162
      I tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204
      I tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005
      I tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321
      I tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
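
When the stock `classify_image.py` is called from a flow, its plain-text output (the `label (score = x)` lines on slide 33) has to be turned into structured records. Below is a small sketch of such a parser, assuming the output format stays exactly as shown on the slide; the names `SCORE_LINE` and `parse_scores` are hypothetical and exist only for this example.

```python
import re

# Matches lines such as: "window screen (score = 0.00196)"
SCORE_LINE = re.compile(r"^(?P<label>.+?) \(score = (?P<score>[0-9.]+)\)$")

# Output lines copied from the slide.
SAMPLE_OUTPUT = """\
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
"""


def parse_scores(text):
    """Return (label, score) tuples for every line that matches the format."""
    results = []
    for line in text.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            results.append((match.group("label"), float(match.group("score"))))
    return results


if __name__ == "__main__":
    for label, score in parse_scores(SAMPLE_OUTPUT):
        print("%-45s %.5f" % (label, score))
```

Lines that do not match the pattern, such as warnings or log messages, are skipped, which is the main reason to parse with a regular expression instead of splitting on fixed positions.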
  • 34. TensorFlow Python Example – Classify Image https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
      DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
      currenttime = strftime("%Y-%m-%d %H:%M:%S", gmtime())
      host = os.uname()[1]
  • 35. TensorFlow Python Classifier Launcher https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
      #!/bin/bash
      DATE=$(date +"%Y-%m-%d_%H%M")
      fswebcam -q -r 1280x720 --no-banner /opt/demo/images/$DATE.jpg
      python2 -W ignore /opt/demo/classify_image.py /opt/demo/images/$DATE.jpg 2>/dev/null
  • 36. TensorFlow Python Example – Classify Image https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
      row = []
      for node_id in top_k:
          human_string = node_lookup.id_to_string(node_id)
          score = predictions[node_id]
          row.append({
              'node_id': node_id,
              'image': image,
              'host': host,
              'ts': currenttime,
              'human_string': str(human_string),
              'score': str(score)})
      json_string = json.dumps(row)
      print(json_string)
  • 37. Apache NiFi Integration with TensorFlow Options • TensorFlow (C++, Python, Java) via ExecuteStreamCommand • TensorFlow NiFi Java Custom Processor • TensorFlow Running on Edge Nodes (MiNiFi)
  • 38. TensorFlow Java Processor in NiFi https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html https://github.com/tspannhw/nifi-tensorflow-processor https://community.hortonworks.com/articles/178498/integrating-tensorflow-16-image-labelling-with-hdf.html
  • 39. TensorFlow Java Processor in NiFi: Installation on a Single Node of Apache NiFi 1.5+
      Download NAR here: https://github.com/tspannhw/nifi-tensorflow-processor/releases/tag/1.6
      Install NAR file to /usr/hdf/current/nifi/lib/
      Create a model directory (/opt/demo/models)
      wget https://raw.githubusercontent.com/tspannhw/nifi-tensorflow-processor/master/nifi-tensorflow-processors/src/test/resources/models/imagenet_comp_graph_label_strings.txt
      wget https://github.com/tspannhw/nifi-tensorflow-processor/blob/master/nifi-tensorflow-processors/src/test/resources/models/tensorflow_inception_graph.pb?raw=true
      Restart Apache NiFi via Ambari
  • 40. TensorFlow Java Processor in NiFi
  • 41. TensorFlow Running on Edge Nodes (MiNiFi)
      CREATE EXTERNAL TABLE IF NOT EXISTS tfimage (image STRING, ts STRING, host STRING, score STRING, human_string STRING, node_id FLOAT) STORED AS ORC LOCATION '/tfimage'
  • 42. TensorFlow Installation (Edge)
      apt-get install curl wget -y
      wget https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel-0.11.1-installer-linux-x86_64.sh
      ./bazel-0.11.1-installer-linux-x86_64.sh
      apt-get install libblas-dev liblapack-dev python-dev libatlas-base-dev gfortran python-setuptools python-h5py -y
      pip3 install six numpy wheel
      pip3 install --user numpy scipy matplotlib pandas sympy nose
      pip3 install --upgrade tensorflow
      git clone --recurse-submodules https://github.com/tensorflow/tensorflow
      wget http://mirror.jax.hugeserver.com/apache/nifi/minifi/0.4.0/minifi-0.4.0-bin.zip
      wget https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip
      wget http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
  • 43. Questions?
  • 44. Contact https://github.com/tspannhw/ApacheDeepLearning101 https://community.hortonworks.com/users/9304/tspann.html https://dzone.com/users/297029/bunkertor.html https://www.meetup.com/futureofdata-princeton/ https://twitter.com/PaaSDev https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html https://github.com/dmlc/dmlc-core/tree/master/tracker/yarn https://news.developer.nvidia.com/nvidias-2017-open-source-deep-learning-frameworks-contributions
  • 45. Hortonworks Community Connection Read access for everyone, join to participate and be recognized • Full Q&A Platform (like StackOverflow) • Knowledge Base Articles • Code Samples and Repositories
  • 46. Community Engagement Participate now at: community.hortonworks.com 4,000+ Registered Users 10,000+ Answers 15,000+ Technical Assets One Website!