QuTrack: A Blockchain-based approach to Model
Lifecycle Management
2019 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy
sri@quantuniversity.com
QuantUniversity
2
• Model Life-cycle Management challenges
• Approaches
• QuTrack: A Blockchain-based approach to Model Lifecycle
Management
• Demo
Agenda
3
Machine Learning Workflow
Data Scraping/
Ingestion
Data
Exploration
Data Cleansing
and Processing
Feature
Engineering
Model
Evaluation
& Tuning
Model
Selection
Model
Deployment/
Inference
Supervised
Unsupervised
Modeling
Data Engineer, Dev Ops Engineer
Data Scientist/Quants
Software/Web Engineer
• AutoML
• Model Validation
• Interpretability
Robotic Process Automation (RPA) (Microservices, Pipelines )
• SW: Web/ Rest API
• HW: GPU, Cloud
• Monitoring
• Regression
• KNN
• Decision Trees
• Naive Bayes
• Neural Networks
• Ensembles
• Clustering
• PCA
• Autoencoder
• RMS
• MAPE
• MAE
• Confusion Matrix
• Precision/Recall
• ROC
• Hyper-parameter
tuning
• Parameter Grids
Risk Management/ Compliance(All stages)
Analysts & Decision Makers
5
• Developing ML applications involves:
▫ Heuristics
▫ Best practices (templates)
▫ Lots of experimentation
▫ Many moving pieces
▫ No “right” answer
▫ Error creep: Even small untracked errors can throw off results
AI/ML application development => Design of Experiments
6
Source: Sculley et al., 2015 "Hidden Technical Debt in Machine Learning Systems"
Challenges
7
The reproducibility challenge
https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
8
• Repeatability (Same team, same experimental setup)
▫ The measurement can be obtained with stated precision by the same team using the
same measurement procedure, the same measuring system, under the same operating
conditions, in the same location on multiple trials. For computational experiments, this
means that a researcher can reliably repeat her own computation.
• Replicability (Different team, same experimental setup)
▫ The measurement can be obtained with stated precision by a different team using the
same measurement procedure, the same measuring system, under the same operating
conditions, in the same or a different location on multiple trials. For computational
experiments, this means that an independent group can obtain the same result using the
author’s own artifacts.
• Reproducibility (Different team, different experimental setup)
▫ The measurement can be obtained with stated precision by a different team, a different
measuring system, in a different location on multiple trials. For computational
experiments, this means that an independent group can obtain the same result using
artifacts which they develop completely independently.
Repeatable or Reproducible or Replicable
https://www.acm.org/publications/policies/artifact-review-badging
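For a computational experiment, the repeatability case above can be checked mechanically. The following is a minimal sketch, not something taken from the deck: the toy computation, the seed, and the names run_experiment, fingerprint and is_repeatable are all illustrative assumptions. It runs the same seeded computation twice and compares a hash of the results.

```python
import hashlib
import json
import random


def run_experiment(seed: int) -> dict:
    """Toy stand-in for a model run: draws from a seeded, private RNG."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return {"seed": seed, "mean": sum(samples) / len(samples)}


def fingerprint(result: dict) -> str:
    """Hash a canonical JSON encoding so two runs can be compared exactly."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_repeatable(seed: int) -> bool:
    """Same code, same seed, same machine: the fingerprints should match."""
    return fingerprint(run_experiment(seed)) == fingerprint(run_experiment(seed))
```

Replicability and reproducibility cannot be checked this way from a single machine; they would need the fingerprint to be compared across teams and setups, usually with a numeric tolerance rather than an exact hash.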
9
Many choices
Languages
Frameworks
Platforms
10
Elements of Model Risk Management
11
AI Governance is gaining focus
https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449
12
AI Governance is gaining focus
https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449
13
NLP pipeline
Data Ingestion
from Edgar
Pre-Processing
Invoking APIs to
label data
Compare APIs
Build a new
model for
sentiment
Analysis
Stage 1 Stage 2 Stage 3 Stage 4 Stage 5
• Amazon Comprehend API
• Google API
• Watson API
• Azure API
14
• Programming environment
• Execution environment
• Hardware specs
• Cloud
• GPU
• Dependencies
• Lineage/Provenance of
individual components
• Model params
• Hyper parameters
• Pipeline specifications
• Model specific
• Tests
• Data versions
Data Model
Environment Process
Components that need to be tracked
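Some of the Environment items listed above (programming environment, hardware specs, dependencies) can be captured automatically. The sketch below uses only the Python standard library. The function name capture_environment and the shape of the returned dictionary are assumptions made for illustration, not the format QuTrack uses.

```python
import platform
import sys
from importlib import metadata


def capture_environment(packages: list[str]) -> dict:
    """Collect a small, JSON-serializable snapshot of the execution environment."""
    deps = {}
    for name in packages:
        try:
            deps[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            deps[name] = None  # not installed; recorded explicitly
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
        "dependencies": deps,
    }
```

A call such as capture_environment(["numpy", "scikit-learn"]) would record the installed versions of those packages, or None for any that are missing. GPU and cloud details are not covered here and would need vendor-specific tooling.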
15
Source: T. van der Weide, O. Smirnov, M. Zielinski, D. Papadopoulos, and T. van Kasteren. Versioned machine learning pipelines for batch experimentation. In ML
Systems Workshop, NIPS 2016, 2016.
Provenance and Lineage of pipelines
16
Schemas proposed
Sebastian Schelter, Joos-Hendrik Boese, Johannes Kirschnick, Thoralf Klein, and Stephan Seufert. Automatically
Tracking Metadata and Provenance of Machine Learning Experiments. NIPS Workshop on Machine Learning
Systems, 2017.
17
Schemas proposed
G. C. Publio, D. Esteves, and H. Zafar, “ML-Schema : Exposing the Semantics of Machine Learning with Schemas
and Ontologies,” in Reproducibility in ML Workshop, ICML’18, 2018.
18
MLFlow
19
DVC
20
GoCD
21
22
I. Altintas, O. Barney, and E. Jaeger-Frank. Provenance collection support in the Kepler scientific workflow
system. In Provenance and annotation of data, pages 118–132.
Current approaches
23
Miao, Hui & Chavan, Amit & Deshpande, Amol. (2016). ProvDB: A System for Lifecycle Management of
Collaborative Analysis Workflows.
Current approaches
24
Related work
Xueping Liang, Sachin Shetty, Deepak Tosh, Charles Kamhoua, Kevin Kwiat, and Laurent Njilla. 2017. ProvChain: A Blockchain-based Data Provenance Architecture in Cloud
Environment with Enhanced Privacy and Availability. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid '17). IEEE Press,
Piscataway, NJ, USA, 468-477. DOI: https://doi.org/10.1109/CCGRID.2017.8
Focus on Cloud data
provenance using
Blockchain
25
Related work
Ramachandran, Aravind & Kantarcioglu, Dr. (2017). Using Blockchain and smart contracts for secure data provenance management.
DataProv: Built on top of
Ethereum, the platform
utilizes smart contracts
and open provenance
model (OPM) to record
immutable data trails.
26
Related work
Sarpatwar, Kanthi & Vaculín, Roman & Min, Hong & Su, Gong & Heath, Terry & Ganapavarapu, Giridhar & Dillenberger, Donna. (2019). Towards Enabling Trusted Artificial
Intelligence via Blockchain. 10.1007/978-3-030-17277-0_8.
Trusted AI and
provenance of AI models
27
28
QuSandbox research suite
Model Analytics
Studio
QuSandbox
QuTrack
QuResearchHub
Prototype, Iterate and tune
Standardize workflows
Productionize and share
Track experiments
29
The four components that need to be encapsulated for
reproducible pipelines
Code Data
Environment Process
30
QuSandbox
31
32
Model Management Studio
33
• JDF: Job Definition File; A DSL for representing Model Pipelines
• Stage
• Entity
▫ Model
▫ Data
▫ Environment
• Version format
▫ M:m:p -> Major Version: Minor Version: Patch
Terms
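The M:m:p version format above can be handled with a small value type. This is a sketch under stated assumptions: the class name Version, the bump method, and the rule that bumping a part resets the lower parts to zero are borrowed from common semantic-versioning practice and are not stated in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    """Entity version in M:m:p form (major, minor, patch)."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        parts = text.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected M:m:p, got {text!r}")
        major, minor, patch = (int(p) for p in parts)
        return cls(major, minor, patch)

    def bump(self, part: str) -> "Version":
        # Assumption: lower-order parts reset to zero when a higher part changes.
        if part == "major":
            return Version(self.major + 1, 0, 0)
        if part == "minor":
            return Version(self.major, self.minor + 1, 0)
        if part == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown part: {part!r}")

    def __str__(self) -> str:
        return f"{self.major}:{self.minor}:{self.patch}"
```

Because the fields are compared as integers in order, Version.parse("1:10:0") sorts after Version.parse("1:2:9"), which a plain string comparison would get wrong.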
34
JDF- DSL
35
QuResearchHub
36
• Ability to track the evolution of experiments
• Snapshot and store the lineage of pipelines
• Ensure auditability and secure access to archived pipelines
• Track experiments and their associated parameters
• Track all aspects of experiments to ensure reproducibility
• For high-impact (regulated, critical) applications, ensure experiment
traces are immutable and verifiable
QuTrack Design requirements
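The last requirement, immutable and verifiable experiment traces, is what motivates a ledger-style store. The general idea can be sketched as a hash-chained, append-only log in which each entry commits to the one before it. This illustrates the technique only. The names ExperimentLog, append and verify are hypothetical and this is not QuTrack's implementation. An in-memory list is also only tamper-evident, not truly immutable.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ExperimentLog:
    """Append-only log in which every entry commits to the one before it."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, payload: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _digest(prev_hash, payload)
        self._entries.append(
            {"prev": prev_hash, "payload": payload, "hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if _digest(prev_hash, entry["payload"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Changing any recorded parameter after the fact makes verify() return False, which is the property an auditor would rely on.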
37
Metadata
• Data about the information to be tracked
• Includes version number, timestamps, user information, MD5 of the artifacts
and high-level notes
Data
• Pipelines, custom DSL, standard formats for representing models
• Events (updates, rollbacks)
• JSON, Amazon Ion, YAML
Artifacts
• Model pickle files, ONNX, Core ML, model params
• Data, blobs etc.
Architecture : What’s tracked ?
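A metadata record with the fields listed above (version number, timestamp, user information, MD5 of the artifact, notes) might look like the following sketch. The field names and the build_metadata helper are assumptions made for illustration. Only the list of tracked items comes from the slide.

```python
import hashlib
from datetime import datetime, timezone


def md5_of_bytes(data: bytes) -> str:
    """Hex MD5 digest of an artifact's raw bytes."""
    return hashlib.md5(data).hexdigest()


def build_metadata(artifact: bytes, version: str, user: str, notes: str = "") -> dict:
    """Assemble a JSON-serializable metadata record for one tracked artifact."""
    return {
        "version": version,  # M:m:p string
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "artifact_md5": md5_of_bytes(artifact),
        "notes": notes,
    }
```

MD5 is adequate for spotting accidental changes to an artifact. Where deliberate tampering is a concern, a collision-resistant hash such as SHA-256 would be the usual substitute.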
38
Blockchain-based:
• QLDB
• Ethereum
Non-Blockchain-based:
• MongoDB
Architectures supported
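Supporting several stores usually implies a thin backend interface that the tracking code targets. The sketch below shows one way to express that in Python. TrackingBackend, InMemoryBackend and track are hypothetical names, and the in-memory class stands in for a real QLDB, Ethereum or MongoDB adapter, none of whose client APIs are shown here.

```python
from typing import Protocol


class TrackingBackend(Protocol):
    """Minimal interface a storage adapter would need to provide."""

    def put(self, key: str, record: dict) -> None: ...

    def history(self, key: str) -> list[dict]: ...


class InMemoryBackend:
    """Stand-in backend that keeps every revision of a record."""

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def put(self, key: str, record: dict) -> None:
        # Copy the record so later caller-side edits do not alter history.
        self._store.setdefault(key, []).append(dict(record))

    def history(self, key: str) -> list[dict]:
        return list(self._store.get(key, []))


def track(backend: TrackingBackend, key: str, record: dict) -> int:
    """Store a record and return how many revisions the key now has."""
    backend.put(key, record)
    return len(backend.history(key))
```

With this shape, swapping a ledger store for a document store is a matter of passing a different adapter to track().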
39
Amazon Quantum Ledger Database (QLDB)
• Fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log.
• SQL-like API
• Cost-effective
Amazon QLDB
40
Amazon QLDB
41
Amazon QLDB
https://aws.amazon.com/qldb/faqs/
42
Demo: Reference App
43
• Support for ONNX, Core ML
• Integration with:
▫ MLflow, DVC, GoCD
• Integration with SCM systems
▫ GitHub, SVN
• Tracking Back tests
• Push Architecture -> Event-Driven Architecture
• Enriched Analytics
• Roles and Authorization
Future work
Thank You!
If you are interested in trying
out QuSandbox,
Please sign up for updates at:
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
44
