Deep Learning Applications Design, Development and
Deployment in IoT Edge
Jayakumar. S PhD (IIT Bombay)
Object Automation, 04/02/2020
Contents
1. Introduction
2. Power 9 based DL training
2.1 HW and SW Requirements
2.1.1 Talos™ II Entry-Level Developer System
2.1.2 Software
2.2 “DLtrain” resources in github
2.3 Build Tool Set
2.4 Build Process to Create “DLtrain”
2.5 Use DLtrain to train ANN model
2.6 Use DLtrain to Inferencing
3. Working with CUDA core
3.1 Hardware Used in CUDA computing
3.2 Driver Software Installation in Power AC922
3.3 Single Project using Power AC922 and GPU
3.3.1 Build Application
3.3.2 Sample Code in cu
4. Inference app in Android phone
4.1 Install NDK in Android Studio
4.2 Build Inference Engine
5. Update Inference Engine with Trained Model
Question: How to build toPhone/SndModel.jar ?
Question: How to use toPhone/SndModel.jar ?
6. Working with ML in Watson Studio
7. Visual Recognition in Watson Studio
8. Deploy Visual Recognition
9. Visual Recognition Client in Android Phone
1. Introduction
The intelligent IoT edge plays a critical role in services that require real-time
inferencing. Historically, such systems have carried a high degree of engineering
complexity in both deployment and operation. SCADA is one example; it has long
been used in the power generation, oil and gas, and cement industries, among
others. SCADA keeps humans in the loop, which is what makes it Supervisory
Control and Data Acquisition. With the advent of deep learning and its success in
the digital domain, there is strong interest among researchers in carrying deep
learning models into these industrial verticals and moving towards intelligent
control and data acquisition. In place of the human supervisor, an intelligent IoT
edge is emerging to perform the tasks that supervisors have traditionally handled.
There is therefore immense interest in turning the IoT edge into an intelligent
system in these core engineering verticals, in addition to consumer applications.
This lecture series is designed to cover NN models, training an NN model with
training data (mostly MNIST), and validating trained NN models before deploying
them to the IoT edge. For deployment, there is strong interest in using smartphones
as the IoT edge, so that each learner can use a device they already own without
much additional investment during the learning period. Industrial deployments,
however, are expected to target devices such as the Jetson Nano, Ultra96-V2 and
the mmWave radar IWR 6843. An advanced lecture series is planned to cover these
IoT edge devices.
2. Power 9 based DL training
Along with compute capability, it is important to have ultra-high-speed I/O to share
training data with the GPUs and with other Power AC922 systems. An artificial neural
network (ANN) model is created to classify a given image. The MNIST data set is used to
train ANN models on the AC922.
2.1 HW and SW Requirements
2.1.1 Talos™ II Entry-Level Developer System
TLSDS3
Talos™ II Entry-Level Developer System including:
● EATX chassis with 500W ATX power supply
● A single Talos™ II Lite EATX mainboard
● One 4-core IBM POWER9 CPU
○ 4 cores per package
○ SMT4 capable
○ POWER IOMMU
○ Hardware virtualization extensions
○ 90W TDP
● One 3U POWER9 heatsink / fan (HSF) assembly
● 8GB DDR4 ECC registered memory
● 2 front panel USB 2.0 ports
● 128GB internal NVMe storage
● Recovery DVD
https://www.raptorcs.com/content/TLSDS3/intro.html
2.1.2 Software
1. Ubuntu 18.04
2. cmake 3.10.2
3. gcc-9, g++-9
2.2 “DLtrain” resources in github
https://github.com/DLinIoTedge/DLtrain provides ready-to-use applications to train
ANN-based deep learning models.
The deep learning training application is coded in C and C++.
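The network definition itself lives in the DLtrain repository above. Purely for orientation, the following is a minimal C++ sketch (not DLtrain code) of the kind of fully connected ANN that classifies a 28x28 MNIST image; the 784-128-10 layer sizes are hypothetical and random weights stand in for a trained model.

// Minimal sketch (not DLtrain's code): forward pass of a fully connected
// 784-128-10 ANN of the kind trained on MNIST. Layer sizes are hypothetical.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>

// One dense layer: out = activation(W * in + b), W stored row-major (outDim x inDim).
static std::vector<float> dense(const std::vector<float>& in,
                                const std::vector<float>& W,
                                const std::vector<float>& b,
                                bool relu)
{
    size_t outDim = b.size();
    size_t inDim  = in.size();
    std::vector<float> out(outDim, 0.0f);
    for (size_t o = 0; o < outDim; ++o) {
        float acc = b[o];
        for (size_t i = 0; i < inDim; ++i)
            acc += W[o * inDim + i] * in[i];
        out[o] = relu ? std::max(0.0f, acc) : acc;
    }
    return out;
}

int main()
{
    const size_t inDim = 28 * 28, hidDim = 128, outDim = 10;
    // Random weights stand in for a trained model; biases start at zero.
    std::vector<float> W1(hidDim * inDim), b1(hidDim), W2(outDim * hidDim), b2(outDim);
    for (auto& w : W1) w = (rand() / (float)RAND_MAX - 0.5f) * 0.1f;
    for (auto& w : W2) w = (rand() / (float)RAND_MAX - 0.5f) * 0.1f;

    std::vector<float> pixels(inDim, 0.5f);          // a dummy 28x28 image
    std::vector<float> hidden = dense(pixels, W1, b1, true);
    std::vector<float> logits = dense(hidden, W2, b2, false);

    // The predicted class is the index of the largest output.
    size_t best = 0;
    for (size_t k = 1; k < outDim; ++k)
        if (logits[k] > logits[best]) best = k;
    printf("predicted digit: %zu\n", best);
    return 0;
}

Compile with g++ -O2 and run; with random weights the prediction is meaningless, but the structure mirrors what training fills in.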
2.3 Build Tool Set
The gcc/g++ tool set on Ubuntu 18.04 is used to build the DL application that runs on the
Power AC922; cmake is used to generate the Makefile.
2.4 Build Process to Create “DLtrain”
Build the “DLtrain” application on an Ubuntu 18.04 (Power AC922) machine with
gcc-9 and g++-9:
cd C++NNFast
rm -rf build
mkdir build
cd build
cmake -D CMAKE_C_COMPILER=gcc-9 -D CMAKE_CXX_COMPILER=g++-9 ..
make
2.5 Use DLtrain to train ANN model
Use the “DLtrain” application to train ANN models on the MNIST data set.
bin/DLtrain conf train      // train the DL model
bin/DLtrain conf train o    // overwrite a previously trained model
2.6 Use DLtrain to Inferencing
Use the “DLtrain” application to infer handwritten digits stored as 28x28 pixel images.
bin/DLtrain conf infer 3            // infer sample 3 from the data set
bin/DLtrain conf infer <filename>   // raw binary image (28x28 = 784 bytes)
bin/DLtrain conf infer img.raw      // sample file containing the digit 4
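To try the <filename> form you need a raw 28x28 image file. The following minimal C++ sketch (not part of DLtrain) pulls one image out of the standard MNIST t10k-images-idx3-ubyte file and writes it as img.raw; the one-byte-per-pixel, 784-byte layout is an assumption based on the 28x28 description above.

// Minimal sketch: extract one 28x28 image from the MNIST idx3-ubyte test file
// and write it as a raw 784-byte file for DLtrain's "conf infer <filename>" mode.
// Assumes the standard MNIST file format (16-byte header, then 784 bytes/image).
#include <cstdio>
#include <cstdlib>
#include <vector>

int main(int argc, char** argv)
{
    const char* mnist = (argc > 1) ? argv[1] : "t10k-images-idx3-ubyte";
    int index = (argc > 2) ? atoi(argv[2]) : 0;    // which image to extract

    FILE* in = fopen(mnist, "rb");
    if (!in) { perror("open MNIST file"); return 1; }

    const long header = 16;                         // magic, count, rows, cols
    const long imageBytes = 28 * 28;                // 784 bytes, one per pixel
    if (fseek(in, header + (long)index * imageBytes, SEEK_SET) != 0) {
        perror("seek"); fclose(in); return 1;
    }

    std::vector<unsigned char> pixels(imageBytes);
    if (fread(pixels.data(), 1, imageBytes, in) != (size_t)imageBytes) {
        fprintf(stderr, "short read\n"); fclose(in); return 1;
    }
    fclose(in);

    FILE* out = fopen("img.raw", "wb");
    if (!out) { perror("open img.raw"); return 1; }
    fwrite(pixels.data(), 1, imageBytes, out);
    fclose(out);

    printf("wrote img.raw (image %d, %ld bytes)\n", index, imageBytes);
    return 0;
}

Build it with g++ -O2 -o rawdump rawdump.cpp and run ./rawdump t10k-images-idx3-ubyte 7 to dump the eighth test image.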
3. Working with CUDA core
3.1 Hardware Used in CUDA computing
Hardware from NVIDIA: GeForce RTX 2070
● GPU Architecture: Turing
● NVIDIA CUDA® Cores: 2304
● RTX-OPS: 42T
● Boost Clock: 1620 MHz
● Frame Buffer: 8 GB GDDR6
● Memory Speed: 14 Gbps
3.2 Driver Software Installation in Power AC922
Get CUDA 10.2 for ubuntu 18.04 on ppc64le from the following link. Use wget to obtain
“cuda_10.2.89_440.33.01_linux_ppc64le.run”
http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.
2.89_440.33.01_linux_ppc64le.run
Install driver by using following command.
sudo sh cuda_10.2.89_440.33.01_linux_ppc64le.run
After above command, use nvidia-smi to get details on installed driver.
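Besides nvidia-smi, a small CUDA runtime program can confirm that the GPU is visible to applications. This is not part of the installation procedure above, just an optional sketch of such a check.

// Sketch: verify the CUDA installation by querying the visible devices.
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("device %d: %s, %d SMs, %.1f GB\n", d, prop.name,
               prop.multiProcessorCount, prop.totalGlobalMem / 1073741824.0);
    }
    return 0;
}

Compile it with nvcc (for example, nvcc -o devinfo devinfo.cu) and run ./devinfo.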
3.3 Single Project using Power AC922 and GPU
3.3.1 Build Application
A file with the *.cu extension is used to build an application that runs partly on the AC922
host and partly on the CUDA cores of the GPU.
For the GPU
nvcc is the NVIDIA (R) CUDA compiler driver; its version is “Cuda compilation tools,
release 10.1, V10.1.243”.
For the host CPU
g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Use the make tool (GNU Make 4.1) to build the applications.
jk@jkDL:~/NVIDIA_CUDA-10.1_Samples/0_Simple/vectorAdd$​ make
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52
-gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61
-gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75
-gencode arch=compute_75,code=compute_75 -o vectorAdd vectorAdd.o
mkdir -p ../../bin/ppc64le/linux/release
cp vectorAdd ../../bin/ppc64le/linux/release
The above make process creates “vectorAdd”, which is then executed as shown below.
jk@jkDL:~/NVIDIA_CUDA-10.1_Samples/0_Simple/vectorAdd$​ ​./vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
3.3.2 Sample Code in cu
The sample source code is given below.
#include <stdio.h>
// For the CUDA runtime routines (prefixed with "cuda_")
#include <cuda_runtime.h>
#include <helper_cuda.h>
/**
* CUDA Kernel Device code
*
* Computes the vector addition of A and B into C. The 3 vectors have the same
* number of elements numElements.
*/
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < numElements)
{
C[i] = A[i] + B[i];
}
}
/**
* Host main routine
*/
int
main(void)
{
// Error code to check return values for CUDA calls
cudaError_t err = cudaSuccess;
// Print the vector length to be used, and compute its size
int numElements = 50000;
size_t size = numElements * sizeof(float);
printf("[Vector addition of %d elements]n", numElements);
// Allocate the host input vector A
float *h_A = (float *)malloc(size);
// Allocate the host input vector B
float *h_B = (float *)malloc(size);
// Allocate the host output vector C
float *h_C = (float *)malloc(size);
// Verify that allocations succeeded
if (h_A == NULL || h_B == NULL || h_C == NULL)
{
fprintf(stderr, "Failed to allocate host vectors!n");
exit(EXIT_FAILURE);
}
// Initialize the host input vectors
for (int i = 0; i < numElements; ++i)
{
h_A[i] = rand()/(float)RAND_MAX;
h_B[i] = rand()/(float)RAND_MAX;
}
// Allocate the device input vector A
float *d_A = NULL;
err = cudaMalloc((void **)&d_A, size);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to allocate device vector A (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Allocate the device input vector B
float *d_B = NULL;
err = cudaMalloc((void **)&d_B, size);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to allocate device vector B (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Allocate the device output vector C
float *d_C = NULL;
err = cudaMalloc((void **)&d_C, size);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to allocate device vector C (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Copy the host input vectors A and B in host memory to the device input vectors in
// device memory
printf("Copy input data from the host memory to the CUDA devicen");
err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to copy vector A from host to device (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to copy vector B from host to device (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Launch the Vector Add CUDA Kernel
int threadsPerBlock = 256;
int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock;
printf("CUDA kernel launch with %d blocks of %d threadsn", blocksPerGrid,
threadsPerBlock);
vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
err = cudaGetLastError();
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Copy the device result vector in device memory to the host result vector
// in host memory.
printf("Copy output data from the CUDA device to the host memoryn");
err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to copy vector C from device to host (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Verify that the result vector is correct
for (int i = 0; i < numElements; ++i)
{
if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5)
{
fprintf(stderr, "Result verification failed at element %d!n", i);
exit(EXIT_FAILURE);
}
}
printf("Test PASSEDn");
// Free device global memory
err = cudaFree(d_A);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to free device vector A (error code %s)!n", cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
err = cudaFree(d_B);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to free device vector B (error code %s)!n", cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
err = cudaFree(d_C);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to free device vector C (error code %s)!n", cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Free host memory
free(h_A);
free(h_B);
free(h_C);
printf("Donen");
return 0;
}
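The sample repeats the same if (err != cudaSuccess) { ... exit(EXIT_FAILURE); } block after every CUDA call. A common way to tighten this is a small error-checking macro; the sketch below is not part of the sample above (the CUDA samples ship a similar checkCudaErrors helper in helper_cuda.h), it only illustrates the idea and compiles with the same nvcc toolchain.

// Sketch of a CUDA error-checking macro that replaces the repeated
// "if (err != cudaSuccess) ... exit(EXIT_FAILURE);" blocks in the sample.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s:%d: %s failed: %s\n",                      \
                    __FILE__, __LINE__, #call, cudaGetErrorString(err_));  \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

int main(void)
{
    float *d_A = NULL;
    size_t size = 50000 * sizeof(float);

    // Each CUDA call is wrapped once; failures report file, line and call.
    CUDA_CHECK(cudaMalloc((void **)&d_A, size));
    CUDA_CHECK(cudaMemset(d_A, 0, size));
    CUDA_CHECK(cudaFree(d_A));

    printf("CUDA_CHECK sketch ran without errors\n");
    return 0;
}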
4. Inference app in Android phone
4.1 Install NDK in Android Studio
The ANN model based inference engine is coded in C++ and C. To build a library from the
inference engine code, the NDK must be installed in Android Studio as well. The
installation of Android Studio and of the NDK is described below; the steps are carried
out on an Ubuntu 18.04 x86 machine.
Installation of dependencies
Android Studio requires OpenJDK version 8 or above to be installed on your system.
sudo apt update
sudo apt install openjdk-8-jdk
java -version
Install Android Studio
sudo snap install android-studio --classic
Start Android Studio either by typing android-studio in your terminal or by clicking on
the Android Studio icon (Activities -> Android Studio).
Required SDK version
Use SDK 22 or above. The present build of the J722 version of the app uses SDK 29.
Install NDK
Use the SDK Manager to install the following components; they are needed to build the
JNI part of the inference engine.
Packages to install:
- LLDB 3.1 (lldb;3.1)
- CMake 3.10.2.4988404 (cmake;3.10.2.4988404)
- NDK (Side by side) 20.1.5948944 (ndk;20.1.5948944)
Reference:
https://linuxize.com/post/how-to-install-android-studio-on-ubuntu-18-04/
4.2 Build Inference Engine
Use Android Studio to build the inference engine as described in the following workflow,
and copy the resulting APK onto the Android phone.
Windows 10 or Ubuntu 18.04 with Android Studio (the latest stable version at the time of
writing is 3.3.1.0) is used. NDK (Side by side) 20.1.5948944 (ndk;20.1.5948944) handles
the creation of the JNI-side library for the inference engine.
The full inference engine source code is available at https://github.com/DLinIoTedge/NN
The workflow creates the J722 application in the form of an APK that can run on an
Android phone.
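The real JNI bindings are in the DLinIoTedge/NN repository. As an orientation only, here is a minimal, hypothetical sketch of what a C++ JNI entry point for an inference call can look like; the package, class and method names are invented for illustration and are not taken from the J722 sources.

// Hypothetical JNI glue for an inference engine (names are illustrative only).
// Built by the NDK toolchain into a shared library that the Android app loads.
#include <jni.h>
#include <vector>

// Placeholder for the real C++ inference routine: takes 784 pixel values,
// returns the predicted digit (0-9).
static int runInference(const std::vector<float>& pixels)
{
    // ... the forward pass of the trained ANN would go here ...
    return pixels.empty() ? -1 : 0;
}

extern "C" JNIEXPORT jint JNICALL
Java_com_example_j722_InferenceEngine_classify(JNIEnv* env, jobject /*thiz*/,
                                               jfloatArray jpixels)
{
    // Copy the Java float[] into a C++ vector.
    jsize len = env->GetArrayLength(jpixels);
    std::vector<float> pixels(len);
    env->GetFloatArrayRegion(jpixels, 0, len, pixels.data());

    // Run the native inference and return the class index to Java.
    return static_cast<jint>(runInference(pixels));
}

On the Java side this would pair with a declaration like public native int classify(float[] pixels); plus a System.loadLibrary(...) call.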
5. Update Inference Engine with Trained Model
The source code of the model update application is available at
https://github.com/DLinIoTedge/Send2Phone
The recommended IP network configuration is shown in the original diagram.
Send2Phone is written in Java and runs on a Power machine or on an x86 machine.
Question: How to build toPhone/SndModel.jar ?
javac main.java
Question: How to use toPhone/SndModel.jar ?
java -jar toPhone/SndModel.jar
Also follow the workflow given above on the host CPU and on the Android phone. Successful
completion of these steps results in the deployment of the trained model into the inference
engine running on the Android smartphone.
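Send2Phone itself is written in Java and defines its own wire protocol in the repository above. Purely to illustrate the push-over-the-network idea, the following C++ sketch streams a model file over TCP to a listener on the phone; the address, port and the raw "send bytes then close" framing are assumptions, not Send2Phone's actual protocol.

// Illustrative only: stream a model file over TCP to a listener on the phone.
// Send2Phone (Java) defines the real protocol; the address, port and framing
// used here are assumptions for the sketch.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv)
{
    const char* file = (argc > 1) ? argv[1] : "model.bin";     // trained model
    const char* ip   = (argc > 2) ? argv[2] : "192.168.1.50";  // phone address (example)
    int port         = (argc > 3) ? atoi(argv[3]) : 5000;      // example port

    FILE* in = fopen(file, "rb");
    if (!in) { perror("open model"); return 1; }

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); fclose(in); return 1; }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);
    if (connect(sock, (sockaddr*)&addr, sizeof(addr)) != 0) {
        perror("connect"); fclose(in); return 1;
    }

    // Stream the file in chunks until EOF, then close the connection.
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        send(sock, buf, n, 0);

    fclose(in);
    close(sock);
    printf("sent %s to %s:%d\n", file, ip, port);
    return 0;
}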
6. Working with ML in Watson Studio
Details are given in the following link
ai-ml-watson.pdf
7. Visual Recognition in Watson Studio
Details are given in the following link
A beginner's guide to setting up a visual recognition service
8. Deploy Visual Recognition
// The following command worked:
curl -X POST -u "apikey:put your API key here" \
  -F "images_file=@sasi.jpeg" -F "threshold=0.6" \
  -F "classifier_ids=DefaultCustomModel_1738357304" \
  "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?version=2018-03-19"
{
"images": [
{
"classifiers": [
{
"classifier_id": "DefaultCustomModel_1738357304",
"name": "Default Custom Model",
"classes": [
{
"class": "lb1compress.zip",
"score": 0.855
}
]
}
],
"image": "sasi.jpeg"
}
],
"images_processed": 1,
"custom_classes": 3
}
Sample code for Client Application
…………………………. Python code ……………
pip
pip install --upgrade "watson-developer-cloud>=2.1.1"
Authentication
from watson_developer_cloud import VisualRecognitionV3

visual_recognition = VisualRecognitionV3(
    version='{version}',
    iam_apikey='{iam_api_key}'
)
Authentication (for instances created before May 23, 2018)
from watson_developer_cloud import VisualRecognitionV3

visual_recognition = VisualRecognitionV3(
    version='{version}',
    api_key='{api_key}'
)
Classify an image
import json
from watson_developer_cloud import VisualRecognitionV3

visual_recognition = VisualRecognitionV3(
    '2018-03-19',
    iam_apikey='{iam_api_key}')

with open('./fruitbowl.jpg', 'rb') as images_file:
    classes = visual_recognition.classify(
        images_file,
        threshold='0.6',
        classifier_ids='DefaultCustomModel_967878440').get_result()
print(json.dumps(classes, indent=2))
…………………………………………………………
9. Visual Recognition Client in Android Phone
Details are given in the following link
VR
///////////////////////////////////////////// document ends //////////////////////////////////////////////////////