NVFLARE - NVIDIA FEDERATED LEARNING APPLICATION RUNTIME ENVIRONMENT
Holger Roth (hroth@nvidia.com) | March 2022
BUILDING ROBUST, GENERALIZABLE AI MODELS IS HARD
DATA PRIVACY
Patient Privacy | Data Governance
DATA PREP
Expert Knowledge | Time Consuming
DATA DIVERSITY
Rare Diseases | Quantity 10¹-10³
BUILDING AI FOR REAL-WORLD CLINICAL PERFORMANCE
Taking Algorithms Beyond Proof-of-Concept
REAL-WORLD AI DESIGN
Model to Data | Generalize Model
External Validation, Multiple Institutions, Prospective Data
FEDERATED LEARNING PARADIGM
Global Model (w)
Only 6% of published AI studies have external validation
Few included multiple institutions
Kim DW, Jang HY, Kim KW, Shin Y, Park SH. Design Characteristics of Studies Reporting the Performance of Artificial Intelligence Algorithms for Diagnostic Analysis of Medical Images: Results from Recently Published Papers. Korean J Radiol. 2019 Mar;20(3):405-410. doi: 10.3348/kjr.2019.0025. PMID: 30799571; PMCID: PMC6389801.
Transfer Learning
“Adapt”
Federated Learning
“Generalize”
FEDERATED LEARNING MOMENTUM
MELLODDY
Multi-task Learning Chemical Assays
ERASMUS GENNET
Genome Wide Association Study
EDRN
Early Detection of Pancreatic Cancer
U MINNESOTA, FAIRVIEW
X-RAY Covid-19 Classification
EXAM
COVID-19 Oxygen Requirement Prediction
NVIDIA FLARE
Open-Source SDK for Federated Learning
§ Apache License 2.0 to catalyze FL research & development
§ Enables Distributed, Multi-Party Collaborative Learning
§ Adapt existing ML/DL workflows to a Federated paradigm
§ Privacy Preserving Algorithms
• Homomorphic Encryption & Differential Privacy
§ Secure Provisioning, Orchestration & Monitoring
§ Programmable APIs for Extensibility
§ Available on GitHub: https://github.com/nvidia/nvFlare
[Figure: the NVIDIA FLARE stack. A Federated Specification (training flows, evaluation flows, learning algorithms, privacy-preserving algorithms, management tools, and a learner configuration covering authenticate, train, evaluate, and model updates) runs on the NVIDIA FLARE Runtime (provisioning, orchestration, monitoring), on CPU, GPU, and multi-GPU hardware.]
NVFLARE
Key Design Principles
§ Research friendly
• Ease of experimentation (including multi-site)
• Flexible for innovation and extension
• Support popular ML/DL frameworks
• Application domain agnostic
§ Applicable to real-world scenarios
• Security and privacy
• System failures and unresponsive sites
• Imperfect datasets
NVIDIA FLARE Key Features
https://developer.nvidia.com/flare
LEARNING ALGORITHMS
§ Federated Averaging (FedAvg), McMahan et al.
• Weighted average to update the global model
§ Federated Proximal (FedProx), Li et al.
• Clients add a loss term to stay close to the global model
• Avoids models drifting away from the global model on heterogeneous datasets
§ Adaptive Federated Optimization (FedOpt), Reddi et al.
• The global model is updated using an optimizer (SGD w/ momentum, Adam, Yogi, Adagrad, etc.)
§ Cyclic Weight Transfer, Chang et al.
• Models are continuously fine-tuned and circulated around institutions
More to come...
§ SCAFFOLD, Karimireddy et al.
• Adds correction terms during training to deal with non-IID data
§ Ditto, Li et al.
• Fairness through personalization
Algorithms can be extended with:
• Differential privacy
• Homomorphic Encryption
https://developer.nvidia.com/blog/creating-robust-and-generalizable-ai-models-with-nvidia-flare/
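The update rules of FedAvg, FedProx, and FedOpt can be sketched in a few lines. This is a minimal illustration, assuming each model is a flat list of floats and that clients report their local sample counts as averaging weights. The names `fedavg`, `fedprox_penalty`, and `FedOptSGDM` are hypothetical and not part of the NVIDIA FLARE API; FedOpt is shown only with server-side SGD with momentum, one of the optimizers listed above.

```python
from typing import List, Sequence

Vector = List[float]


def fedavg(client_weights: Sequence[Vector], num_samples: Sequence[int]) -> Vector:
    """FedAvg: average the client models, weighted by local sample counts."""
    total = float(sum(num_samples))
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, num_samples)) / total
        for i in range(dim)
    ]


def fedprox_penalty(local: Vector, global_: Vector, mu: float) -> float:
    """FedProx: proximal term (mu / 2) * ||w - w_global||^2 added to the client loss."""
    return 0.5 * mu * sum((a - b) ** 2 for a, b in zip(local, global_))


def fedprox_penalty_grad(local: Vector, global_: Vector, mu: float) -> Vector:
    """Gradient of the proximal term w.r.t. the local weights: mu * (w - w_global)."""
    return [mu * (a - b) for a, b in zip(local, global_)]


class FedOptSGDM:
    """FedOpt sketch: the server treats (global - FedAvg result) as a pseudo-gradient
    and applies SGD with momentum to it instead of simply replacing the model."""

    def __init__(self, lr: float = 1.0, momentum: float = 0.9) -> None:
        self.lr = lr
        self.momentum = momentum
        self.velocity: Vector = []

    def step(self, global_: Vector, averaged: Vector) -> Vector:
        delta = [g - a for g, a in zip(global_, averaged)]  # pseudo-gradient
        if not self.velocity:
            self.velocity = [0.0] * len(delta)
        self.velocity = [self.momentum * v + d for v, d in zip(self.velocity, delta)]
        return [g - self.lr * v for g, v in zip(global_, self.velocity)]
```

For example, `fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` returns `[2.5, 3.5]`. With `lr=1.0` and `momentum=0.0`, `FedOptSGDM.step` reduces to plain FedAvg, which makes the relationship between the two methods easy to check.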
NVIDIA FLARE ADDRESSES FEDERATED LEARNING PHASES
§ DATA PREPARATION: Privacy-preserving federated data preparation/curation workflows
§ PROVISION & AUTHENTICATE: Provisioning startup kits, SSL authentication, authorization policies to control access
§ FEDERATED PROGRAM RECIPE: Federated workflows (Scatter-Gather, Cyclic, Eval, etc.)
§ EDGE COLLABORATOR CONFIGURATIONS: Support for any framework (TensorFlow, PyTorch, RAPIDS, NeMo, etc.)
§ MONITOR & MANAGEMENT: Auxiliary APIs for monitoring and visualization
NVIDIA FLARE v2.0
High-level Architecture
[Figure: the Provision Tool provisions the Server, the Clients, and the Admin. Clients communicate with the Server over gRPC, the Admin connects to the Server over TCP, and each component exposes an API.]
NVIDIA FLARE APIs
Componentized Architecture
§ Open Provision API
• Defines the overall project configuration and generates mutually trusted configuration packages for the server and clients using Provisioner and Builder modules.
§ Server Controller API
• The Controller is a Python object that defines the global federated learning control flow via Tasks and Events.
§ Client Worker API
• The Worker API is used to define Executors that perform the Tasks orchestrated by the Server Controller API.
§ Admin API
• The Admin API provides a means to control the federated learning system and allows application developers to manage operation via external interfaces (e.g., a web UI).
Controller and Worker API
Federated Workflows
[Figure: the FL Server Controller assigns Tasks to the FL Client Worker, which executes each Task and submits the Task Result; Task Data and Task Results pass through filters on both the server and the client side.]
The Controller and Worker APIs define the overall control flow via Events, Tasks, and Executors.
§ Inspired by HPC (OpenMPI)
§ The Controller defines the series of Tasks to be executed by Workers and determines how these Tasks are distributed (broadcast, cyclic, send).
§ The Worker implements Executors that execute specific named Tasks as defined and distributed by the Controller.
§ The Controller aggregates the Workers' Task Results as defined in the Controller workflow.
Filters can be applied to Task Data and Task Results on both the Controller and the Executor side.
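The Task/Executor/filter pattern described above can be modeled in-process to make the control flow concrete. This is a toy sketch under stated assumptions: `Task`, `Worker`, and `Controller` here are hypothetical classes that only mimic the concepts on the slide (named Tasks, Executors, filters, and the broadcast/send/cyclic distribution modes). They are not NVIDIA FLARE's actual `Controller` or `Executor` classes, and there is no networking.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# A filter transforms a payload (Task Data or Task Result) before it moves on.
Filter = Callable[[dict], dict]


@dataclass
class Task:
    name: str
    data: dict


def _apply(filters: Sequence[Filter], payload: dict) -> dict:
    for f in filters:
        payload = f(payload)
    return payload


class Worker:
    """Client side (hypothetical toy): runs the executor registered for a task name."""

    def __init__(self, site: str, executors: Dict[str, Callable[[dict], dict]],
                 data_filters: Sequence[Filter] = (), result_filters: Sequence[Filter] = ()):
        self.site = site
        self.executors = executors
        self.data_filters = list(data_filters)
        self.result_filters = list(result_filters)

    def run(self, task: Task) -> dict:
        data = _apply(self.data_filters, task.data)    # filter incoming Task Data
        result = self.executors[task.name](data)       # execute the named Task
        return _apply(self.result_filters, result)     # filter outgoing Task Result


class Controller:
    """Server side (hypothetical toy): decides how a task is distributed, collects results."""

    def __init__(self, workers: List[Worker], result_filters: Sequence[Filter] = ()):
        self.workers = workers
        self.result_filters = list(result_filters)

    def broadcast(self, task: Task) -> Dict[str, dict]:
        # Every worker executes the same task; results are gathered per site.
        return {w.site: _apply(self.result_filters, w.run(task)) for w in self.workers}

    def send(self, task: Task, site: str) -> dict:
        # Only the named worker executes the task.
        worker = next(w for w in self.workers if w.site == site)
        return _apply(self.result_filters, worker.run(task))

    def cyclic(self, task: Task) -> dict:
        # Workers run in turn; each worker's result becomes the next worker's Task Data.
        data = task.data
        for w in self.workers:
            data = _apply(self.result_filters, w.run(Task(task.name, data)))
        return data
```

Usage: with `inc = {"train": lambda d: {"x": d["x"] + 1}}` registered on two workers, `broadcast(Task("train", {"x": 0}))` returns `{"site-1": {"x": 1}, "site-2": {"x": 1}}`, while `cyclic` passes the payload along and returns `{"x": 2}`. Keeping filters as plain functions on both sides mirrors the slide's point that privacy or security transforms can be inserted without changing the Executor itself.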
SCATTER-GATHER CONTROLLER FOR MODEL TRAINING
Typical workflow for FedAvg, FedOpt, FedProx, etc.
[Figure: Scatter-Gather workflow around a global model w]
1. Server initializes the model.
2. For each round:
   1. Server broadcasts the global model to the workers.
   2. Workers validate the global model and train on their local data.
   3. Workers keep track of their locally best model (personalization).
   4. Workers send back the updated model or model updates.
   5. Server gathers (aggregates) the updates and updates the global model.
Source: https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/scatter_and_gather.py
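The round structure above can be simulated end to end with a scalar "model". This is a toy sketch, not the linked `scatter_and_gather.py`: it assumes each site's data is a list of numbers, local training is a few gradient-descent steps on mean squared error, validation uses the same loss, and aggregation is a sample-weighted average. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def local_loss(w: float, data: List[float]) -> float:
    """Validation metric on a site's data: mean squared error of the scalar model w."""
    return sum((w - x) ** 2 for x in data) / len(data)


def local_train(w: float, data: List[float], lr: float = 0.1, epochs: int = 5) -> float:
    """Toy local training: a few gradient-descent steps on the local MSE."""
    for _ in range(epochs):
        grad = sum(2.0 * (w - x) for x in data) / len(data)
        w -= lr * grad
    return w


def scatter_gather(client_data: Dict[str, List[float]],
                   num_rounds: int = 20) -> Tuple[float, Dict[str, float]]:
    global_w = 0.0                                        # 1. server initializes the model
    best = {site: (float("inf"), global_w) for site in client_data}
    for _ in range(num_rounds):                           # 2. for each round
        updates, counts = [], []
        for site, data in client_data.items():            # 2.1 scatter the global model
            val = local_loss(global_w, data)              # 2.2 validate it locally
            if val < best[site][0]:                       # 2.3 track the locally best model
                best[site] = (val, global_w)
            updates.append(local_train(global_w, data))   # 2.2 train on local data
            counts.append(len(data))                      # 2.4 return the updated model
        total = sum(counts)                               # 2.5 gather: sample-weighted average
        global_w = sum(u * n for u, n in zip(updates, counts)) / total
    return global_w, {site: w for site, (_, w) in best.items()}
```

With `{"site-1": [1.0, 1.0], "site-2": [3.0, 3.0, 3.0, 3.0]}`, the global model converges to the pooled mean 14/6 ≈ 2.333, while each site's "locally best" model differs, which is the personalization effect step 2.3 refers to.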
CONTROLLERS FOR MODEL EVALUATION
Global model evaluation, Cross-site model evaluation
[Figure: FedEval workflow around a global model w; the server sends models to the sites and gathers their metrics]
FedEval (Global Model Validation/Cross-Site Validation)
1. Server sends models (e.g., the global model and the registered best local models) to each worker for evaluation.
2. Server gathers the resulting metrics.
[Figure: cross-site evaluation matrix. Rows are models (Global (Final), Global (Best), and the best local models of Site-1, Site-2, ..., Site-N); columns are evaluation sites Site-1, Site-2, ..., Site-N; each cell holds the resulting metrics.]
Source: https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/global_model_eval.py
https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/cross_site_model_eval.py
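The cross-site matrix described above (models as rows, evaluation sites as columns) can be sketched as a nested dictionary. This is a minimal local illustration, assuming scalar "models" and an MSE metric; in the real workflow the models travel to the sites and only the metrics return. `cross_site_eval`, `mse`, and `format_matrix` are hypothetical helpers, not the linked controllers.

```python
from typing import Callable, Dict, List

Metric = Callable[[float, List[float]], float]


def mse(w: float, data: List[float]) -> float:
    """Toy metric: mean squared error of a scalar model w on a site's data."""
    return sum((w - x) ** 2 for x in data) / len(data)


def cross_site_eval(models: Dict[str, float], site_data: Dict[str, List[float]],
                    metric: Metric = mse) -> Dict[str, Dict[str, float]]:
    """Rows are models, columns are evaluation sites, cells are metrics."""
    return {
        model_name: {site: metric(w, data) for site, data in site_data.items()}
        for model_name, w in models.items()
    }


def format_matrix(matrix: Dict[str, Dict[str, float]]) -> str:
    """Render the matrix as a fixed-width text table."""
    sites = list(next(iter(matrix.values())).keys())
    header = "Model".ljust(16) + "".join(s.rjust(10) for s in sites)
    rows = [name.ljust(16) + "".join(f"{row[s]:10.3f}" for s in sites)
            for name, row in matrix.items()]
    return "\n".join([header] + rows)
```

Usage: `cross_site_eval({"Global (Final)": 2.0, "Site-1": 1.0, "Site-2": 3.0}, {"Site-1": [1.0, 1.0], "Site-2": [3.0, 3.0]})` yields a 3×2 matrix in which each local model scores perfectly on its own site and poorly on the other, while the global model is moderate on both. That contrast is what cross-site validation is meant to expose.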
PYTHON ADMIN CLIENT
Interactive control of federated experiments
Docs: https://nvidia.github.io/NVFlare/user_guide/admin_commands.html
[Screenshots: Server, Client-1, and Client-2 terminals, and the admin client console]
PYTHON ADMIN API
Automate Running FL experiments
Initialization
Initialize the API with the actual values for the FL setup: host, port, and paths to files and directories.
A provisioned admin package should have ca_cert, client_cert, and client_key in the startup folder; the transfer folder can be created at the same level as startup.
Log in with the admin name that corresponds to the provisioned package.
After using FLAdminAPI, call logout() to log out. Both login() and logout() are inherited from AdminAPI.
Usage
Simplest sequence to upload, deploy, and start training with the "hello-pt" example app:
Contents of the returned FLAdminAPIResponse can be accessed:
Example: https://github.com/NVIDIA/NVFlare/blob/main/examples/cifar10/run_fl.py
INTRO EXAMPLES
The GitHub repo has multiple examples:
§ Hello-numpy
§ Hello-numpy-cross-val
§ Hello-tf2
§ Hello-pt
§ Hello-monai
§ CIFAR-10
§ Prostate
§ BraTS
END-TO-END EXAMPLES (CIFAR10, BRATS18, PROSTATE)
§ Comprehensive example for researchers to compare algorithms
1. Set up a virtual environment
2. Create your FL workspace
3. Run automated experiments
   1. Varying the data heterogeneity of the data splits
   2. Centralized training
   3. FedAvg on different data splits
   4. Advanced FL algorithms (FedProx and FedOpt)
   5. Secure aggregation using homomorphic encryption
   6. Differential privacy
4. Results
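Step 3.1 (varying data heterogeneity) can be illustrated with Dirichlet label splits, a common technique in the FL literature for controlling how skewed each site's class distribution is: for each class, site proportions are drawn from a symmetric Dirichlet(alpha), and smaller `alpha` gives more heterogeneous sites. Whether the NVIDIA FLARE examples use exactly this scheme is an assumption here, and `dirichlet_split` and its parameters are hypothetical names.

```python
import random
from typing import Dict, List


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Sample a symmetric Dirichlet(alpha) vector by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_split(labels: List[int], num_sites: int, alpha: float,
                    seed: int = 0) -> Dict[int, List[int]]:
    """Assign sample indices to sites; smaller alpha -> more skewed label mixes per site."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    sites: Dict[int, List[int]] = {s: [] for s in range(num_sites)}
    for _, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        proportions = dirichlet(alpha, num_sites, rng)
        # Turn the proportions into cut points over this class's shuffled indices.
        cuts, acc = [], 0.0
        for p in proportions[:-1]:
            acc += p
            cuts.append(int(round(acc * len(indices))))
        bounds = [0] + cuts + [len(indices)]
        for s in range(num_sites):
            sites[s].extend(indices[bounds[s]:bounds[s + 1]])
    return sites
```

Every index is assigned to exactly one site regardless of `alpha`; only the per-site class mix changes, so runs with different `alpha` values can be compared on the same total dataset.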
CROSS-SITE VALIDATION AND GLOBAL MODEL EVALUATION
Performance of locally best models (selected by best validation score on local data) using (a) local training data alone and (b) after federated learning.
Source: Federated Learning for Breast Density Classification: A Real-World Implementation
FEDERATED ANALYSIS
Gather summary statistics
Example
§ Compute the local intensity histograms of each client's data
• k-anonymity (e.g., at least 10 images)
• Can be enhanced with differential privacy
§ Compute a global histogram
§ The result is accessible to the admin on the server
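The histogram example above can be sketched as a local step plus a server-side sum. This is a minimal illustration, assuming each image is a list of intensity values and that the k-anonymity rule is implemented as "share nothing if the site has fewer than `min_images` images". The optional Laplace noise only shows where differential privacy would plug in; its scale is not calibrated to any formal privacy guarantee. All names are hypothetical.

```python
import bisect
import random
from typing import List, Optional, Sequence


def local_histogram(images: Sequence[Sequence[float]], edges: Sequence[float],
                    min_images: int = 10, noise_scale: float = 0.0,
                    rng: Optional[random.Random] = None) -> Optional[List[float]]:
    """Histogram of all pixel intensities at one site, or None if too few images."""
    if len(images) < min_images:
        return None  # k-anonymity-style threshold: do not share statistics of tiny cohorts
    counts = [0.0] * (len(edges) - 1)
    for image in images:
        for v in image:
            i = bisect.bisect_right(edges, v) - 1
            if v == edges[-1]:
                i = len(counts) - 1        # close the last bin on the right
            if 0 <= i < len(counts):       # values outside [edges[0], edges[-1]] are ignored
                counts[i] += 1.0
    if noise_scale > 0.0:
        rng = rng or random.Random()
        # Laplace(0, noise_scale) noise as the difference of two exponentials.
        # Illustrative only: NOT calibrated to a formal differential-privacy guarantee.
        counts = [c + rng.expovariate(1.0 / noise_scale) - rng.expovariate(1.0 / noise_scale)
                  for c in counts]
    return counts


def global_histogram(local_hists: Sequence[Optional[List[float]]]) -> List[float]:
    """Server side: sum the histograms of all sites that shared one."""
    shared = [h for h in local_hists if h is not None]
    return [sum(col) for col in zip(*shared)]
```

Because only per-bin counts leave each site, the server learns the pooled intensity distribution without seeing any image, and sites below the threshold simply drop out of the sum.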
SECURITY & PRIVACY
Homomorphic Encryption & Differential Privacy
Differential Privacy for BraTS18 Segmentation
Example: https://github.com/NVIDIA/NVFlare/tree/main/examples/brats18
[Figure: validation Dice scores of the global model over 600 training epochs]
Federated Learning with Homomorphic Encryption
Blog: https://developer.nvidia.com/blog/federated-learning-with-homomorphic-encryption/
Example: https://github.com/NVIDIA/NVFlare/tree/main/examples/cifar10
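A common way to apply differential privacy in federated learning is a client-side filter that clips each model update to a maximum L2 norm and adds Gaussian noise proportional to that norm before the update leaves the site. The sketch below shows that generic Gaussian-mechanism pattern; it is not necessarily the mechanism used in the BraTS18 example, it omits privacy accounting (tracking the overall epsilon across rounds), and `clip_and_noise` is a hypothetical name.

```python
import math
import random
from typing import List, Optional


def clip_and_noise(update: List[float], clip_norm: float, noise_multiplier: float,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Clip an update to L2 norm <= clip_norm, then add N(0, (noise_multiplier * clip_norm)^2)."""
    rng = rng or random.Random()
    norm = math.sqrt(sum(u * u for u in update))
    # Scale down only if the update is longer than the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [u * scale for u in update]
    sigma = noise_multiplier * clip_norm
    return [c + rng.gauss(0.0, sigma) for c in clipped]
```

For example, `clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=0.0)` rescales the norm-5 update to `[0.6, 0.8]`. Clipping bounds how much any one site can move the global model, which is what lets the added noise mask an individual contribution.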
NVIDIA FLARE at GTC
Please join us Monday 3/21 for FLARE Dev Day at GTC [SE1991]
Thank You!
Contact: hroth@nvidia.com

SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime Environment for Developing Robust AI Models
