Designing a production-grade Realtime ML Inference Endpoint
-> Scope of our application
-> Exploring the functionalities [Demo]
-> Project structure and components
-> Discussing Common Project Essentials
-> Project and Credentials Configurations Format
-> Deep dive into the workflow
-> I/O Format of the Inference Endpoint
-> Packaging and Running the application
Scope of our application
• A Python Flask server that serves predictions from trained, serialized machine learning
models, following a microservice architecture (a minimal sketch follows this list).
• Incoming requests contain the feature data to be scored.
• A cloud-native application built with Docker and compatible with orchestration
systems such as Docker Swarm, AWS ECS, and Kubernetes.
Model Selection: Linear Regression, delay = m * invoice_amt + c
Model Training on the sample data below yields delay = 0.01 * invoice_amt + 0

    invoice_amt    delay
    100            1
    200            2
    300            3
    400            4

Save Model (Serialization): y = m*x + c with m = 0.01, c = 0
Load Model (Deserialization): y = 0.01*x
Predict: y = delay for a given x = invoice_amt
Fig 1: An Example of an ML Pipeline
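The pipeline in Fig 1 reduces to fitting, serializing, and later deserializing a model. A minimal sketch with scikit-learn and pickle on the toy invoice_amt/delay data above; the artifact filename is an assumption:

# Fit the toy linear regression from Fig 1, serialize it, and reload it to predict.
import pickle

from sklearn.linear_model import LinearRegression

X = [[100], [200], [300], [400]]      # invoice_amt
y = [1, 2, 3, 4]                      # delay

model = LinearRegression().fit(X, y)  # learns delay = 0.01 * invoice_amt + 0

# Save Model (serialization)
with open("linear_regression.pickle", "wb") as fh:
    pickle.dump(model, fh)

# Load Model (deserialization) and predict
with open("linear_regression.pickle", "rb") as fh:
    restored = pickle.load(fh)
print(restored.predict([[250]]))      # ~[2.5]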
Exploring the functionalities
• It can handle parallel prediction requests in real time.
• It supports, and can be extended with, storage backends such as AWS S3, Azure Blob
Storage, and NAS for saving and loading the model artifacts.
• Any algorithm implemented in Python can be plugged in, as long as it can serve its
predictions within the synchronous lifetime of an HTTP connection (see the dispatch
sketch after Fig 2).
Fig 2: ML Inference Endpoint
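One way to plug in "any algorithm in Python", hinted at by the implementation field of the request, is a registry mapping the implementation name to a predictor class. The class and registry names below are assumptions, not the project's actual modules:

# Hypothetical registry mapping the request's 'implementation' field to a predictor.
import pickle

class SklearnPredictor:
    """Wraps any scikit-learn style estimator loaded from a serialized artifact."""
    def __init__(self, pipeline_filename):
        with open(pipeline_filename, "rb") as fh:
            self.model = pickle.load(fh)

    def predict(self, features):
        return self.model.predict(features).tolist()

# New algorithms are supported by registering another predictor class.
IMPLEMENTATIONS = {
    "Logistic Regression": SklearnPredictor,
    "Linear Regression": SklearnPredictor,
}

def get_predictor(implementation, pipeline_filename):
    return IMPLEMENTATIONS[implementation](pipeline_filename)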
Project structure and components
• The project source folder contains the application code itself.
• The configs folder contains project-level configurations such as logging levels.
• The docs folder contains documentation created with the Sphinx tool as shown:
sphinx-build <docs-source-path> <docs-html-path>
• The tests folder contains all the test cases for the project. The test cases are
written with the pytest module. The following command runs the test cases:
python -m pytest -v -s
• The dist folder contains the binary distribution of the project, i.e. the wheel file,
which other projects can use to resolve the dependency. The following command
uploads a wheel file to a package repository using twine: twine upload
--repository-url <repo-url> -u <user> -p <pass> <whl file>
• setup.py builds the source and wheel distributions with the command
python setup.py sdist bdist_wheel (a minimal sketch of such a setup.py follows this
list). Dependent packages are listed in requirements.txt and installed via pip with an
additional index URL pointing to the package repository:
pip install -r requirements.txt --index-url <repo-url>
• configurations.py initializes the credentials configuration and the project
configurations according to the SDLC environment the service runs in.
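A minimal setup.py along the lines described above; the package name, version, and dependency handling are placeholders rather than the project's actual values:

# Minimal setup.py sketch; name, version, and dependency handling are placeholders.
from setuptools import find_packages, setup

setup(
    name="real_time_prediction_engine",
    version="0.1.0",
    packages=find_packages(exclude=["tests"]),
    install_requires=open("requirements.txt").read().splitlines(),
)

Running python setup.py sdist bdist_wheel then produces the wheel file in the dist folder.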
Fig 3: Project Structure
Fig 4: Project Dependencies (real_time_prediction_engine depends on common_utils)
Discussing common_utils Project Essentials
Fig 5: Components for Configuration Setup of Project and Credentials - the project
configurations are resolved for the SDLC environment (Local, Develop, Testing,
Production), the credentials configurations load the secrets, and MLLogging is a
singleton logger.
Fig 6: File Handling abstraction in common_utils - a Service Helper Factory with common
helper functions returns a storage utility (S3Util, AzureUtil, or NASUtil), each exposing
get/put/check file operations against S3, Azure Storage, or NAS.
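A sketch of the file-handling abstraction in Fig 6: a factory returning a storage utility with a common get/put/check interface. The class names follow the figure labels; the method bodies are placeholders, not the real S3/Azure/NAS calls:

# Sketch of the Service Helper Factory from Fig 6; method bodies are placeholders.
from abc import ABC, abstractmethod

class StorageUtil(ABC):
    @abstractmethod
    def get_file(self, remote_path, local_path): ...
    @abstractmethod
    def put_file(self, local_path, remote_path): ...
    @abstractmethod
    def check_file(self, remote_path): ...

class S3Util(StorageUtil):
    def get_file(self, remote_path, local_path): ...   # boto3 download would go here
    def put_file(self, local_path, remote_path): ...
    def check_file(self, remote_path): ...

class AzureUtil(StorageUtil):
    def get_file(self, remote_path, local_path): ...   # Azure Blob download would go here
    def put_file(self, local_path, remote_path): ...
    def check_file(self, remote_path): ...

class NASUtil(StorageUtil):
    def get_file(self, remote_path, local_path): ...   # shared-filesystem copy would go here
    def put_file(self, local_path, remote_path): ...
    def check_file(self, remote_path): ...

def service_helper_factory(storage_type):
    """Return the storage utility for the configured backend."""
    return {"S3": S3Util, "Azure": AzureUtil, "NAS": NASUtil}[storage_type]()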
Project and Credentials Configurations Format
[S3]
S3_local_folder_path = /ml_webservice/storage/

[Azure]
Azure_local_folder_path = /ml_webservice/storage/
storage_account_name = asc

[NAS]
NAS_local_folder_path = /ml_webservice/storage/

[Logs]
log_file_path = /ml_webservice/logs/
log_file_prefix = ml_webservice_log
log_level_file = DEBUG
log_level_console = DEBUG
logger_default_name = DataScienceLogging
logger_type = FileHandler
log_time_rotator = d
log_time_interval = 1
logging_channel = File

Fig 7: Project Configurations

[AWS Credentials]
access_key = ABCDEFGH
secret_key = XXXXXXXXX

[Azure Credentials]
account_key = xxxxxxxxxxxxxxx

Fig 8: Credentials Configurations
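configurations.py (Fig 5) can resolve these files with Python's configparser according to the SDLC environment. DEPLOY_ENV and credentials.properties appear in the docker-compose file; the project configuration filename and directory below are assumptions for illustration:

# Sketch of loading the project and credentials configurations with configparser.
import configparser
import os

def load_configs(config_dir="/ml_webservice/prediction_engine/configs"):
    # DEPLOY_ENV is set in the docker-compose file; defaults to a development setup.
    env = os.environ.get("DEPLOY_ENV", "development")

    project = configparser.ConfigParser()
    project.read(os.path.join(config_dir, "project.properties"))          # Fig 7 contents

    credentials = configparser.ConfigParser()
    credentials.read(os.path.join(config_dir, "credentials.properties"))  # Fig 8 contents

    log_level = project.get("Logs", "log_level_file", fallback="DEBUG")
    access_key = credentials.get("AWS Credentials", "access_key", fallback=None)
    return {"env": env, "log_level": log_level, "aws_access_key": access_key,
            "project": project, "credentials": credentials}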
Deep dive into the workflow
Fig 9: Workflow for serving predictions
Fig 10: Modules involved in serving predictions
I/O Format of the Inference Endpoint
Input Format:
{'payload': [{'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2}, {'sepal_length': 2.1, 'sepal_width': 5.5, 'petal_length': 6.4, 'petal_width': 0.2}], 'pipelineFilename': '/s3-file-storage/logistic_regression/iris.pickle', 'implementation': 'Logistic Regression'}
Output Format:
{"predictions": [0, 2], "predictions_prob": [[0.8796816489561705, 0.1203075379066039,
1.0813137225507556e-05], [0.004065518685680241, 0.031801728404804656, 0.9641327529095151]],
"classes": [0, 1, 2]}
Fig 11: Input Format
Fig 12: Output Format
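A sketch of how the endpoint might turn the Fig 11 request into the Fig 12 response; only the request and response keys come from the figures, the helper itself is hypothetical:

# Sketch: map the Fig 11 request body to the Fig 12 response body.
import pickle

import pandas as pd

def infer(body):
    # 'payload', 'pipelineFilename', and 'implementation' are the Fig 11 request keys.
    features = pd.DataFrame(body["payload"])
    with open(body["pipelineFilename"], "rb") as fh:
        model = pickle.load(fh)
    return {
        "predictions": model.predict(features).tolist(),
        "predictions_prob": model.predict_proba(features).tolist(),
        "classes": model.classes_.tolist(),
    }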
Packaging and Running the application
version: '3'
services:
  inference-engine:
    build: ./
    image: python-prediction-engine
    container_name: predictionServer
    environment:
      - DEPLOY_ENV=development
      - HOST=0.0.0.0
      - PORT=8080
    ports:
      - 8080:8080
    volumes:
      - E:/ml_resources/storage/:/ml_webservice/storage/
      - E:/ml_resources/logs/:/ml_webservice/logs/
      - E:/ml_resources/credentials.properties:/ml_webservice/prediction_engine/configs/credentials.properties
FROM python:3.6
WORKDIR /ml_webservice/prediction_engine/
COPY requirements.txt /ml_webservice/prediction_engine/
RUN pip install -r requirements.txt --index-url https://pypi.org/simple/ --extra-index-url http://192.168.0.104:8081/repository/$env/simple --trusted-host 192.168.0.104
COPY prediction_engine /ml_webservice/prediction_engine/
EXPOSE 8080
ENTRYPOINT python3.6 ./app.py
Inside the root of the project:
#Build the Image
> docker-compose build
#Run the Image
> docker-compose up
Fig 13: Dockerfile for building the image of real_time_prediction_engine
Fig 15: Docker-compose for building the image of real_time_prediction_engine
Fig 14: Running the application
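Once the container is up (Fig 14), the endpoint can be exercised with a small client. The /predict route is an assumption; the port comes from the compose file and the request body from Fig 11:

# Hypothetical client call against the running container; the /predict route is an assumption.
import requests

body = {
    "payload": [{"sepal_length": 5.1, "sepal_width": 3.5,
                 "petal_length": 1.4, "petal_width": 0.2}],
    "pipelineFilename": "/s3-file-storage/logistic_regression/iris.pickle",
    "implementation": "Logistic Regression",
}
response = requests.post("http://localhost:8080/predict", json=body, timeout=10)
print(response.json())  # {"predictions": [...], "predictions_prob": [...], "classes": [...]}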
Questions?
The Realtime ML Inference Endpoint is a synchronous mechanism for serving predictions.
Asynchronous pipelines that process computationally intensive data must instead adopt
worker-based execution.
An MLaaS can implement both synchronous and asynchronous pipelines on top of the same
architecture.
Code can be found on:
https://github.com/chandimsett/MLEngine/tree/master/real_time_prediction_engine
A compiled version with more in-depth coverage can be found in the book Data Science for
Enterprises: Deployment and Beyond
https://www.amazon.in/Data-Science-Enterprises-Deployment-beyond/dp/9352673352