Moving Your Machine Learning Models to Production with TensorFlow Extended
Jonathan Mugan
Moving From Our Hut to the Production Floor
Your model is going to live for a long time. Not just for a demo.
You must know when to update it. The world changes.
You must ensure production data matches training data. Data reflects its origins.
You may need to track multiple model versions. E.g., for different states.
You need to batch the input to serving. One-at-a-time is slow.
Interchangeable Parts
and the ML Revolution
Data Ingestion
(ExampleGen)
TensorFlow Data Validation
(StatisticsGen, SchemaGen, Example Validator)
TensorFlow Transform
(Transform)
Estimator or Keras Model
(Trainer)
TensorFlow Model Analysis
(Evaluator, Model Validator)
Validation Outcomes
(Pusher)
TensorFlow Serving
Image by https://www.flickr.com/photos/36224933@N07
https://creativecommons.org/licenses/by-sa/2.0/deed.en
• TensorFlow Extended (TFX)
• TFX is used internally by Google and was recently open sourced
• Represents your pipeline to production as a sequence of components
• Building any one model is more work, but for large endeavors, TFX helps to keep you organized
Outline
• Introduction to TensorFlow Extended (TFX)
• TensorFlow Extended Pipeline Components
• Running the Pipeline
• TensorFlow and TensorFlow Tools
• Alternatives to TensorFlow Extended
• Other Useful Tools
• Conclusion
TensorFlow Extended
ML Metadata (MLMD)
• Individual components talk to the metadata store (MLMD).
• MLMD doesn’t store the data itself. It stores data about the data.
https://www.tensorflow.org/tfx/guide
https://www.tensorflow.org/tfx/guide/mlmd
Each component has three pieces
1. Driver (gets information from MLMD)
2. Executor (does what the component does)
3. Publisher (writes results to MLMD)
The pipeline diagram is organized by library (components in parentheses)
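The driver, executor, and publisher roles can be illustrated with a minimal sketch. This is not TFX code: `MetadataStore`, `run_component`, and the artifact URIs are hypothetical names invented here, and a plain Python list stands in for MLMD.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetadataStore:
    """Hypothetical stand-in for MLMD: holds facts about artifacts, not the artifacts."""
    records: List[Dict[str, str]] = field(default_factory=list)

    def latest_uri(self, artifact_type: str) -> str:
        # Scan newest-first for the most recent artifact of the requested type.
        for record in reversed(self.records):
            if record["type"] == artifact_type:
                return record["uri"]
        raise KeyError(f"no artifact of type {artifact_type!r} recorded")

    def publish(self, artifact_type: str, uri: str, producer: str) -> None:
        self.records.append({"type": artifact_type, "uri": uri, "producer": producer})


def run_component(name: str, input_type: str, output_type: str,
                  executor: Callable[[str], str], store: MetadataStore) -> str:
    input_uri = store.latest_uri(input_type)      # 1. Driver: ask the store what to consume
    output_uri = executor(input_uri)              # 2. Executor: do the component's work
    store.publish(output_type, output_uri, name)  # 3. Publisher: record what was produced
    return output_uri


store = MetadataStore()
store.publish("Examples", "/data/examples/1", producer="ExampleGen")
stats_uri = run_component(
    name="StatisticsGen",
    input_type="Examples",
    output_type="ExampleStatistics",
    executor=lambda uri: uri.replace("examples", "statistics"),  # placeholder work
    store=store,
)
print(stats_uri)
```

Only locations and descriptions pass through the store here, which is the sense in which it holds data about data.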
Data Ingestion: ExampleGen
• Pulls in your data and puts it into a binary format
• Also splits it into train and test
• Uses Protocol Buffers: each record is a tf.Example, written to a TFRecord file
https://www.tensorflow.org/tutorials/load_data/tfrecord
https://www.tensorflow.org/tfx/guide/examplegen
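One way a train/test split can be made repeatable is to hash each record into a bucket. The sketch below assumes that scheme with a 2:1 ratio; `assign_split`, `split_records`, the ratio, and the record keys are illustrative assumptions, not ExampleGen's API.

```python
import hashlib
from typing import Dict, List


def assign_split(record_key: str, train_buckets: int = 2, test_buckets: int = 1) -> str:
    """Deterministically map a record to 'train' or 'test' by hashing its key."""
    digest = hashlib.sha256(record_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % (train_buckets + test_buckets)
    return "train" if bucket < train_buckets else "test"


def split_records(keys: List[str]) -> Dict[str, List[str]]:
    """Group record keys by the split each one hashes into."""
    splits: Dict[str, List[str]] = {"train": [], "test": []}
    for key in keys:
        splits[assign_split(key)].append(key)
    return splits


splits = split_records([f"trip-{i}" for i in range(1000)])  # made-up keys
print({name: len(members) for name, members in splits.items()})
```

Because the assignment depends only on the key, rerunning ingestion puts every record in the same split as before.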
TensorFlow Data Validation: StatisticsGen, SchemaGen, Example Validator
Looks at your data and generates a schema, which you manually update.
It makes sure the data you pass in later during serving is still in the same format and hasn’t drifted.
It also has a great way to visualize data, Facets, which we will see later.
https://www.tensorflow.org/tfx/guide/tfdv
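The infer, update by hand, then validate loop can be sketched in plain Python. This is a conceptual illustration only: `infer_schema`, `validate`, the dict-based schema, and the sample records are assumptions made for this sketch and do not reflect the TFDV API or its schema format.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]
Schema = Dict[str, Dict[str, Any]]


def infer_schema(records: List[Record]) -> Schema:
    """Infer a type and, for strings, the set of observed values for each feature."""
    schema: Schema = {}
    for record in records:
        for name, value in record.items():
            entry = schema.setdefault(name, {"type": type(value).__name__, "domain": set()})
            if isinstance(value, str):
                entry["domain"].add(value)
    return schema


def validate(record: Record, schema: Schema) -> List[str]:
    """Return readable anomalies for one record; an empty list means it conforms."""
    anomalies = []
    for name, entry in schema.items():
        if name not in record:
            anomalies.append(f"{name}: missing")
        elif type(record[name]).__name__ != entry["type"]:
            anomalies.append(f"{name}: expected {entry['type']}")
        elif entry["domain"] and record[name] not in entry["domain"]:
            anomalies.append(f"{name}: unexpected value {record[name]!r}")
    return anomalies


training = [{"payment": "cash", "miles": 1.5}, {"payment": "card", "miles": 3.0}]
schema = infer_schema(training)
schema["payment"]["domain"].add("voucher")  # the manual update step

print(validate({"payment": "voucher", "miles": 2.0}, schema))
print(validate({"payment": "bitcoin", "miles": "far"}, schema))
```

The first record passes because of the hand-added value; the second is flagged for an unseen category and a wrong type.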
Example Schema
TensorFlow Transform: Transform
Converts your data
• E.g., one-hot encoding, categorical with a vocab
• Part of the TensorFlow graph, for better or worse
• Good for transformations that require looking at all values
Example: tft.scale_to_z_score subtracts the mean and divides by the standard deviation
Features come in many types, and TensorFlow Transform converts them into a format that can be ingested by a machine learning model. Nice to have this explicit.
https://www.tensorflow.org/tfx/guide/transform
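A z-score is a transformation that requires looking at all values, because the mean and standard deviation come from the whole dataset. The sketch below mimics that two-phase shape in plain Python rather than calling tft.scale_to_z_score; `analyze_z_score` and the sample values are hypothetical, and the use of the population standard deviation is an assumption of this sketch.

```python
import statistics
from typing import Callable, List


def analyze_z_score(values: List[float]) -> Callable[[float], float]:
    """Full pass over the data: compute mean and std once, return a per-value transform."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an assumption for this sketch

    def transform(x: float) -> float:
        return (x - mean) / std

    return transform


training_miles = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values
scale = analyze_z_score(training_miles)

scaled_training = [scale(x) for x in training_miles]
scaled_serving = scale(3.0)  # the same constants are reused at serving time
print(scaled_training, scaled_serving)
```

Reusing the constants computed on training data is what keeps the serving-time transformation consistent with training.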
Estimator or Keras Model: Trainer
• Trains the model: the part we are all familiar with
• Except that it uses an Estimator
• Can use Keras via tf.keras.estimator.model_to_estimator()
https://www.tensorflow.org/tfx/guide/trainer
TensorFlow Model Analysis: Evaluator, Model Validator
Evaluator Component
• Evaluates the model.
• Uses TensorFlow Model Analysis (TFMA), which we will see shortly.
• https://www.tensorflow.org/tfx/guide/evaluator
Model Validator Component
• You set a baseline (such as the current serving model) and a metric (such as AUC)
• Marks in the metadata if the model passes the baseline.
• https://www.tensorflow.org/tfx/guide/modelval
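The baseline check amounts to a comparison whose result is written to metadata. A minimal sketch, assuming a single metric and a plain dict as the metadata store; `validate_model`, the `blessed` key, the model name, and the AUC numbers are all made up for illustration.

```python
from typing import Dict


def validate_model(candidate: Dict[str, float], baseline: Dict[str, float],
                   metric: str = "auc") -> bool:
    """A candidate passes only if it is at least as good as the baseline on the metric."""
    return candidate[metric] >= baseline[metric]


metadata: Dict[str, Dict[str, object]] = {}  # hypothetical stand-in for the metadata store

baseline_metrics = {"auc": 0.91}   # e.g., the model currently being served (made-up value)
candidate_metrics = {"auc": 0.93}  # the freshly trained model (made-up value)

metadata["model-v2"] = {
    "metric": "auc",
    "blessed": validate_model(candidate_metrics, baseline_metrics),
}
print(metadata)
```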
Validation Outcomes: Pusher
• Pushes the model to Serving if it is validated
• I.e., if your new model is better than the existing model, push it to the model server.
For a deeper understanding, see Ice-T’s 1988 hit song, “I’m Your Pusher”
https://www.tensorflow.org/tfx/guide/pusher
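A push can be as simple as copying the exported model to where the model server looks for it. The sketch below assumes the server watches a directory containing one numbered subdirectory per version; `push_if_validated`, the paths, and the placeholder file are hypothetical and are not the Pusher component's API.

```python
import shutil
import tempfile
from pathlib import Path


def push_if_validated(export_dir: Path, serving_dir: Path, version: int, validated: bool) -> bool:
    """Copy the exported model into serving_dir/<version> only when validation passed."""
    if not validated:
        return False
    shutil.copytree(export_dir, serving_dir / str(version))  # creates missing parents
    return True


root = Path(tempfile.mkdtemp())
export_dir = root / "export"
export_dir.mkdir()
(export_dir / "saved_model.pb").write_bytes(b"placeholder")  # stands in for a real model file
serving_dir = root / "serving" / "taxi"

pushed = push_if_validated(export_dir, serving_dir, version=2, validated=True)
skipped = push_if_validated(export_dir, serving_dir, version=3, validated=False)
print(pushed, skipped, sorted(p.name for p in serving_dir.iterdir()))
```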
TensorFlow Serving
• Uses the model to perform inference
• Called via gRPC APIs or RESTful APIs
• Easy to get running with Docker
• You can call a particular version of a model
• Takes care of batching
https://www.tensorflow.org/tfx/guide/serving
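Calling a particular model version over REST comes down to the URL and a JSON body. The sketch below only builds the request and sends nothing; the host, the model name `taxi`, the feature name, and the helper functions are assumptions, and port 8501 is the conventional REST port rather than something stated in the slides.

```python
import json
from typing import Any, Dict, List, Optional


def predict_url(host: str, model: str, version: Optional[int] = None, port: int = 8501) -> str:
    """Build the REST predict URL, optionally pinned to one model version."""
    version_part = f"/versions/{version}" if version is not None else ""
    return f"http://{host}:{port}/v1/models/{model}{version_part}:predict"


def predict_body(instances: List[Dict[str, Any]]) -> str:
    """Serialize several inputs into one request body."""
    return json.dumps({"instances": instances})


url = predict_url("localhost", "taxi", version=2)
body = predict_body([{"miles": 1.5}, {"miles": 3.0}])
print(url)
print(body)
```

Sending several instances in one body is a client-side choice; it is separate from the server-side batching that Serving takes care of.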
Outline
• Introduction to TensorFlow Extended (TFX)
• TensorFlow Extended Pipeline Components
• Running the Pipeline
• ML Metadata
• Apache Airflow
• TensorFlow and TensorFlow Tools
• Alternatives to TensorFlow Extended
• Other Useful Tools
• Conclusion
Metadata Store
ML Metadata (MLMD)
https://www.tensorflow.org/tfx/guide/mlmd
Looking at the table Artifact using the DB Browser for SQLite
The Types of Artifacts
The type_id field from the previous slide maps here
You can see the properties of the artifacts in the ArtifactProperty table
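The screenshots for these slides are not part of the text, so here is a rough sketch of how the three tables relate, using an in-memory SQLite database. Only the names Artifact, ArtifactProperty, and type_id come from the slides; the ArtifactType table name, every other column, and all rows are simplified assumptions, not the real MLMD schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the tables shown on the slides.
    CREATE TABLE ArtifactType (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Artifact (id INTEGER PRIMARY KEY, type_id INTEGER, uri TEXT);
    CREATE TABLE ArtifactProperty (artifact_id INTEGER, name TEXT, string_value TEXT);

    INSERT INTO ArtifactType VALUES (1, 'Examples'), (2, 'Model');
    INSERT INTO Artifact VALUES (10, 1, '/pipeline/examples/1'), (11, 2, '/pipeline/model/5');
    INSERT INTO ArtifactProperty VALUES (10, 'split', 'train'), (11, 'state', 'published');
""")

# Resolve each artifact's type_id to a type name and attach its properties.
rows = conn.execute("""
    SELECT a.uri, t.name, p.name, p.string_value
    FROM Artifact AS a
    JOIN ArtifactType AS t ON t.id = a.type_id
    JOIN ArtifactProperty AS p ON p.artifact_id = a.id
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```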
Pipeline Management with Apache Airflow
Allows you to trigger and keep track of pipelines.
• You can also use Kubeflow: https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_kubeflow_gcp.py
• And Apache Beam: https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_beam.py
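Whichever orchestrator runs the pipeline, its core job is to run components in an order that respects their dependencies. A minimal sketch using Python's standard graphlib (Python 3.9+), not Airflow's API; the dependency wiring below is an assumption about a typical pipeline, not taken from the slides.

```python
from graphlib import TopologicalSorter

# Each component maps to the components whose outputs it consumes (assumed wiring).
dependencies = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "ModelValidator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "ModelValidator"},
}

# static_order yields every component after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```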
Outline
• Introduction to TensorFlow Extended (TFX)
• TensorFlow Extended Pipeline Components
• Running the Pipeline
• TensorFlow and TensorFlow Tools
• TensorFlow 2.0
• TensorFlow Data and Features
• TensorFlow Estimators
• TensorBoard
• TensorFlow Data Validation (TFDV) [Facets]
• TensorFlow Model Analysis (TFMA)
• What-If Tool
• Alternatives to TensorFlow Extended
• Other Useful Tools
• Conclusion
TensorFlow 2.0
• Don’t have to define the graph separately
• More like PyTorch
• There are two ways you can do computation:
• Eager: like PyTorch, just compute
• tf.function: You decorate a function and call it
TensorFlow 1.x: session, then output
TensorFlow 2.x with tf.function: function, then output. Still get the performance of Session.
TensorFlow 2.x eager: output directly. Debug like a civilized person.
https://www.tensorflow.org/guide/function
https://www.tensorflow.org/guide/eager
Data
• tf.train.Example is a protobuf message that maps feature names to tf.train.Feature values, where each value has a type (tf.train.BytesList, tf.train.FloatList, tf.train.Int64List)
• TFRecord is a format for storing sequences of binary records; each record here is a serialized tf.train.Example
• tf.data.Dataset can take in a TFRecord file and create an iterator for batching
• tf.io.parse_example unpacks tf.Example into standard tensors.
https://www.tensorflow.org/tutorials/load_data/tf_records
Features
• tf.feature_column, where you further specify what each feature is, such as one-hot, vocabulary, or embedding.
https://www.tensorflow.org/guide/feature_columns
tf.train.Example specifies what the data is for storage, and tf.feature_column specifies what it is as input to a model.
To build a model you need
• format of model input
  • tf.feature_column
• model architecture and hyperparameters
  • tf.estimator
  • (or Keras with tf.keras.estimator.model_to_estimator)
• function to deliver training data
  • tf.estimator.TrainSpec from tf.data
• function to deliver eval data
  • tf.estimator.EvalSpec from tf.data
• function to deliver serving data
  • tf.estimator.FinalExporter
TensorFlow Estimator
• Estimator is a wrapper for regular TensorFlow that automatically scales to multiple machines and automatically outputs results to TensorBoard
Shout-out to model explainability with boosted-tree Estimators:
https://www.tensorflow.org/tutorials/estimator/boosted_trees_model_understanding
https://www.tensorflow.org/tutorials/estimator/boosted_trees
TensorBoard
Plot of prescribers: red has more overdoses, green has fewer
3D plots are not that useful, but they look cool
You can even use TensorBoard from PyTorch
https://pytorch.org/docs/stable/tensorboard.html
TensorFlow Data Validation (TFDV)
• We need to understand our data as well as possible.
• TFDV provides tools that make that less difficult.
• Helps to identify bugs in the data by showing you pictures that don’t look right.
• https://www.tensorflow.org/tfx/data_validation/get_started
By sorting by non-uniformity,
we can debug features.
In general, we can make sure the distributions are what we would expect.
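Checking that a distribution is what we would expect can also be done numerically. The sketch below compares a categorical feature between two datasets using the largest difference in category frequency (an L-infinity distance); the function names, the sample data, and the 0.1 threshold are hypothetical and are not TFDV's API.

```python
from collections import Counter
from typing import Dict, List


def distribution(values: List[str]) -> Dict[str, float]:
    """Relative frequency of each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(expected: List[str], observed: List[str]) -> float:
    """Largest absolute difference in frequency over all categories."""
    p, q = distribution(expected), distribution(observed)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


training = ["cash"] * 50 + ["card"] * 50  # made-up data
serving = ["cash"] * 20 + ["card"] * 80   # made-up data

distance = l_infinity_distance(training, serving)
THRESHOLD = 0.1  # hypothetical tolerance; choose per feature
print(distance, distance > THRESHOLD)
```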
TensorFlow Model Analysis (TFMA)
We can see how well our model does by each slice.
We see that this model does much better for females than males.
https://www.tensorflow.org/tfx/guide/tfma
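Evaluating by slice means computing the same metric separately for each value of a feature. A minimal sketch in plain Python rather than TFMA; `accuracy_by_slice` is a hypothetical helper and the rows are made up purely to show the mechanics.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (slice value, true label, predicted label); made-up rows for illustration only.
Row = Tuple[str, int, int]


def accuracy_by_slice(rows: List[Row]) -> Dict[str, float]:
    """Compute accuracy separately for each value of the slicing feature."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for slice_value, label, prediction in rows:
        total[slice_value] += 1
        correct[slice_value] += int(label == prediction)
    return {key: correct[key] / total[key] for key in total}


rows = [
    ("female", 1, 1), ("female", 0, 0), ("female", 1, 1), ("female", 0, 0),
    ("male", 1, 0), ("male", 0, 0), ("male", 1, 1), ("male", 0, 1),
]
per_slice = accuracy_by_slice(rows)
print(per_slice)
```

An overall accuracy of 0.75 on these rows would hide the gap that the per-slice numbers expose.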
What-If Tool
• The What-If Tool applies a model from TensorFlow Serving to any data you give it.
• https://pair-code.github.io/what-if-tool/index.html
• Change a record and see what the model does
• Find the most similar record with a different classification
• Can be used for fairness. Adjust the model so it is equally likely to predict “yes” for each group
• https://www.coursera.org/lecture/machine-learning-business-professionals/activity-applying-fairness-concerns-with-the-what-if-tool-review-0mYda
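Finding the most similar record with a different classification can be sketched as a nearest-neighbor search restricted to records with another predicted class. This is an illustration of the idea, not the What-If Tool's code; the L1 distance, the feature names, and the records are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[Dict[str, float], str]  # (numeric features, predicted class)


def l1_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Sum of absolute per-feature differences."""
    return sum(abs(a[name] - b[name]) for name in a)


def nearest_counterfactual(target: Record, others: List[Record]) -> Optional[Record]:
    """Closest record (by L1 distance) whose predicted class differs from the target's."""
    features, predicted = target
    candidates = [r for r in others if r[1] != predicted]
    if not candidates:
        return None
    return min(candidates, key=lambda r: l1_distance(features, r[0]))


target = ({"age": 40.0, "prescriptions": 12.0}, "yes")  # made-up record
others = [
    ({"age": 41.0, "prescriptions": 12.0}, "yes"),
    ({"age": 39.0, "prescriptions": 9.0}, "no"),
    ({"age": 60.0, "prescriptions": 2.0}, "no"),
]
print(nearest_counterfactual(target, others))
```

In practice features on different scales would need normalizing before distances are comparable; that step is omitted here.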
Looking at the data
by race, age, and
inference score
What-If Tool showing the probability of overdose for individual features.
Alternatives (kind of)
• MLflow https://mlflow.org/docs/latest/index.html
• Netflix Metaflow https://github.com/Netflix/metaflow
• Sacred https://github.com/IDSIA/sacred
• Dataiku DSS https://www.dataiku.com/product/
• Polyaxon https://polyaxon.com/
• Facebook Ax https://www.ax.dev/
They all do something a little different, with pieces straddling different sides of the data science/production divide.
Outline
• Introduction to TensorFlow Extended (TFX)
• TensorFlow Extended Pipeline Components
• Running the Pipeline
• TensorFlow and TensorFlow Tools
• Alternatives to TensorFlow Extended
• Other Useful Tools
• Streamlit Dashboard
• Python Typing, Dataclasses, and Enum
• Conclusion
Streamlit Dashboard
• Writes to the browser
• Works well for artifacts in the ML pipeline.
https://github.com/streamlit/streamlit/
https://streamlit.io/docs/getting_started.html
Typing, Dataclasses,
and Enum
You can build interchangeable
parts right in Python.
Not new of course, but they make
Python a little less wild west.
Output:
[Turtle(size=6, name='Anita'),
Turtle(size=2, name='Anita')]
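The code that produced this output is not in the extracted text. The sketch below is one guess at a program that prints the same list using typing, a dataclass, and an Enum; `TurtleName` and `make_turtles` are hypothetical names and this is not the original slide's code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class TurtleName(Enum):
    """Allowed names; an Enum keeps callers from passing arbitrary strings."""
    ANITA = "Anita"


@dataclass
class Turtle:
    size: int
    name: str


def make_turtles(sizes: List[int], name: TurtleName) -> List[Turtle]:
    """Typed signature: callers see exactly what goes in and what comes out."""
    return [Turtle(size=size, name=name.value) for size in sizes]


turtles = make_turtles([6, 2], TurtleName.ANITA)
print(turtles)
```

The dataclass supplies the readable repr seen in the output, and the type hints let a checker such as mypy catch a mismatched part before it runs.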
TFX Disadvantages
• Steep learning curve
• Changes constantly (but not while you are watching it)
• Somewhat inflexible: you can create your own components, but doing so has a steep learning curve
• No hyperparameter search (yet, https://github.com/tensorflow/tfx/issues/182)
TFX Advantages
• Set up to scale
• Documents your process through artifacts
• Warm-starting: as new data comes in, you don’t have to
start training over. Keeps models fresh
• Tools to see data and debug problems
• Don’t have to rerun what is already run
Where to Start
• Jupyter notebook tutorial
https://www.tensorflow.org/tfx/tutorials/tfx/components
• Airflow tutorial
https://www.tensorflow.org/tfx/tutorials/tfx/airflow_workshop
Happy Hour!
6500 River Place Blvd.
Bldg. 3, Suite 120
Austin, TX. 78730
Jonathan Mugan, Ph. D.
Email: jmugan@deumbra.com
Appendix
• Original TFX paper
  • https://ai.google/research/pubs/pub46484
• Documentation
  • https://www.tensorflow.org/tfx
  • https://www.tensorflow.org/tfx/tutorials
  • https://www.tensorflow.org/tfx/guide
  • https://www.tensorflow.org/tfx/api_docs

More Related Content

What's hot (20)

PDF
Intro to LLMs
Loic Merckel
 
PPTX
An Introduction to Information Retrieval and Applications
sathish sak
 
PPTX
Frequent Itemset Mining(FIM) on BigData
Raju Gupta
 
PPTX
Artificial neural network
GauravPandey319
 
PDF
Using an employee knowledge graph for employee engagement and career mobility
Neo4j
 
PDF
Notes from Coursera Deep Learning courses by Andrew Ng
dataHacker. rs
 
PDF
AI, Machine Learning, and Data Science Concepts
Dan O'Leary
 
PDF
Introduction to Data Science
ANOOP V S
 
PPTX
A Tutorial to AI Ethics - Fairness, Bias & Perception
Dr. Kim (Kyllesbech Larsen)
 
PDF
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
DianaGray10
 
PDF
Generative AI
All Things Open
 
PDF
Responsible Generative AI
CMassociates
 
PPTX
Responsible AI in Industry: Practical Challenges and Lessons Learned
Krishnaram Kenthapadi
 
PDF
Big data-analytics-cpe8035
Neelam Rawat
 
PDF
OLX Group presentation for AWS Redshift meetup in London, 5 July 2017
Dobo Radichkov
 
PPTX
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Weiwei Guo
 
PPTX
Fraud Detection Architecture
Gwen (Chen) Shapira
 
PPTX
Internet of things (IoT)
Prem Singhaniya
 
PDF
Introduction to Data Mining
Si Krishan
 
PPTX
Big Data Analytics
Ghulam Imaduddin
 
Intro to LLMs
Loic Merckel
 
An Introduction to Information Retrieval and Applications
sathish sak
 
Frequent Itemset Mining(FIM) on BigData
Raju Gupta
 
Artificial neural network
GauravPandey319
 
Using an employee knowledge graph for employee engagement and career mobility
Neo4j
 
Notes from Coursera Deep Learning courses by Andrew Ng
dataHacker. rs
 
AI, Machine Learning, and Data Science Concepts
Dan O'Leary
 
Introduction to Data Science
ANOOP V S
 
A Tutorial to AI Ethics - Fairness, Bias & Perception
Dr. Kim (Kyllesbech Larsen)
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
DianaGray10
 
Generative AI
All Things Open
 
Responsible Generative AI
CMassociates
 
Responsible AI in Industry: Practical Challenges and Lessons Learned
Krishnaram Kenthapadi
 
Big data-analytics-cpe8035
Neelam Rawat
 
OLX Group presentation for AWS Redshift meetup in London, 5 July 2017
Dobo Radichkov
 
Deep Natural Language Processing for Search Systems (sigir 2019 tutorial)
Weiwei Guo
 
Fraud Detection Architecture
Gwen (Chen) Shapira
 
Internet of things (IoT)
Prem Singhaniya
 
Introduction to Data Mining
Si Krishan
 
Big Data Analytics
Ghulam Imaduddin
 

Similar to Moving Your Machine Learning Models to Production with TensorFlow Extended (20)

PPTX
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
gdgsurrey
 
PDF
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Chris Fregly
 
PDF
TensorFlow Extension (TFX) and Apache Beam
markgrover
 
PDF
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
Chris Fregly
 
PDF
Flink Forward San Francisco 2019: TensorFlow Extended: An end-to-end machine ...
Flink Forward
 
PDF
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
Databricks
 
PPTX
Tensorflow Ecosystem
Vivek Raja P S
 
PDF
ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...
Fei Chen
 
PDF
running Tensorflow in Production
Matthias Feys
 
PDF
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
Jan Kirenz
 
PDF
Tensorflow 2.0 and Coral Edge TPU
Andrés Leonardo Martinez Ortiz
 
PDF
TensorFlow and Keras: An Overview
Poo Kuan Hoong
 
PDF
Streaming Inference with Apache Beam and TFX
Databricks
 
PDF
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
Stijn Decubber
 
PDF
TF Dev Summit 2019
Ray Hilton
 
PDF
Tensorflow 2 Pocket Reference Building And Deploying Machine Learning Models ...
tmnfxlrqd1983
 
PDF
TFX: A tensor flow-based production-scale machine learning platform
Shunya Ueta
 
PDF
The Flow of TensorFlow
Jeongkyu Shin
 
PDF
Machine learning operations model book mlops
RuyPerez1
 
PDF
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
Jan Kirenz
 
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
gdgsurrey
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Chris Fregly
 
TensorFlow Extension (TFX) and Apache Beam
markgrover
 
KubeFlow + GPU + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTo...
Chris Fregly
 
Flink Forward San Francisco 2019: TensorFlow Extended: An end-to-end machine ...
Flink Forward
 
TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow
Databricks
 
Tensorflow Ecosystem
Vivek Raja P S
 
ML Platform Q1 Meetup: End to-end Feature Analysis, Validation and Transforma...
Fei Chen
 
running Tensorflow in Production
Matthias Feys
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
Jan Kirenz
 
Tensorflow 2.0 and Coral Edge TPU
Andrés Leonardo Martinez Ortiz
 
TensorFlow and Keras: An Overview
Poo Kuan Hoong
 
Streaming Inference with Apache Beam and TFX
Databricks
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
Stijn Decubber
 
TF Dev Summit 2019
Ray Hilton
 
Tensorflow 2 Pocket Reference Building And Deploying Machine Learning Models ...
tmnfxlrqd1983
 
TFX: A tensor flow-based production-scale machine learning platform
Shunya Ueta
 
The Flow of TensorFlow
Jeongkyu Shin
 
Machine learning operations model book mlops
RuyPerez1
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
Jan Kirenz
 
Ad

More from Jonathan Mugan (9)

PPTX
How to build someone we can talk to
Jonathan Mugan
 
PDF
Generating Natural-Language Text with Neural Networks
Jonathan Mugan
 
PPTX
Data Day Seattle, From NLP to AI
Jonathan Mugan
 
PPTX
Data Day Seattle, Chatbots from First Principles
Jonathan Mugan
 
PPTX
Chatbots from first principles
Jonathan Mugan
 
PPTX
From Natural Language Processing to Artificial Intelligence
Jonathan Mugan
 
PPTX
What Deep Learning Means for Artificial Intelligence
Jonathan Mugan
 
PPTX
Deep Learning for Natural Language Processing
Jonathan Mugan
 
PPTX
What Deep Learning Means for Artificial Intelligence
Jonathan Mugan
 
How to build someone we can talk to
Jonathan Mugan
 
Generating Natural-Language Text with Neural Networks
Jonathan Mugan
 
Data Day Seattle, From NLP to AI
Jonathan Mugan
 
Data Day Seattle, Chatbots from First Principles
Jonathan Mugan
 
Chatbots from first principles
Jonathan Mugan
 
From Natural Language Processing to Artificial Intelligence
Jonathan Mugan
 
What Deep Learning Means for Artificial Intelligence
Jonathan Mugan
 
Deep Learning for Natural Language Processing
Jonathan Mugan
 
What Deep Learning Means for Artificial Intelligence
Jonathan Mugan
 
Ad

Recently uploaded (20)

PDF
Advancing WebDriver BiDi support in WebKit
Igalia
 
PDF
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
PDF
"AI Transformation: Directions and Challenges", Pavlo Shaternik
Fwdays
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
PPTX
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
PDF
Building Real-Time Digital Twins with IBM Maximo & ArcGIS Indoors
Safe Software
 
PDF
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
PDF
How Startups Are Growing Faster with App Developers in Australia.pdf
India App Developer
 
PDF
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
PDF
IoT-Powered Industrial Transformation – Smart Manufacturing to Connected Heal...
Rejig Digital
 
PDF
Staying Human in a Machine- Accelerated World
Catalin Jora
 
PDF
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
PDF
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 
PDF
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
PDF
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
PDF
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
PDF
Transcript: New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
PPTX
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
PPTX
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
Advancing WebDriver BiDi support in WebKit
Igalia
 
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
"AI Transformation: Directions and Challenges", Pavlo Shaternik
Fwdays
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
Building Real-Time Digital Twins with IBM Maximo & ArcGIS Indoors
Safe Software
 
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
How Startups Are Growing Faster with App Developers in Australia.pdf
India App Developer
 
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
IoT-Powered Industrial Transformation – Smart Manufacturing to Connected Heal...
Rejig Digital
 
Staying Human in a Machine- Accelerated World
Catalin Jora
 
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 
Reverse Engineering of Security Products: Developing an Advanced Microsoft De...
nwbxhhcyjv
 
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
"Beyond English: Navigating the Challenges of Building a Ukrainian-language R...
Fwdays
 
Transcript: New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 

Moving Your Machine Learning Models to Production with TensorFlow Extended

  • 1. 11 Moving Your Machine Learning Models to Production with TensorFlow Extended Jonathan Mugan
  • 2. 22 Moving From Our Hut to the Production Floor Your model is going to live for a long time. Not just for a demo. You must know when to update it. The world changes. You must ensure production data matches training data. Data reflects its origins. You may need to track multiple model versions. E.g., for different states. You need to batch the input to serving. One-at-a-time is slow.
  • 3. 33 Interchangeable Parts and the ML Revolution Data Ingestion (ExampleGen) TensorFlow Data Validation (StatisticsGen, SchemaGen, Example Validator) TensorFlow Transform (Transform) Estimator or Keras Model (Trainer) TensorFlow Model Analysis (Evaluator, Model Validator) Validation Outcomes (Pusher) TensorFlow ServingImage by https://www.flickr.com/photos/36224933@N07 https://creativecommons.org/licenses/by-sa/2.0/deed.en
  • 4. 44 Interchangeable Parts and the ML Revolution • TensorFlow Extended (TFX) • TFX used internally by Google and recently open sourced • Represents your pipeline to production as a sequence of components • Building any one model is more work, but for large endeavors, TFX helps to keep you organized Data Ingestion (ExampleGen) TensorFlow Data Validation (StatisticsGen, SchemaGen, Example Validator) TensorFlow Transform (Transform) Estimator or Keras Model (Trainer) TensorFlow Model Analysis (Evaluator, Model Validator) Validation Outcomes (Pusher) TensorFlow Serving
  • 5. 55 Outline • Introduction to TensorFlow Extended (TFX) • TensorFlow Extended Pipeline Components • Running the Pipeline • TensorFlow and TensorFlow Tools • Alternatives to TensorFlow Extended • Other Useful Tools • Conclusion
  • 6. 66 TensorFlow ExtendedData Ingestion (ExampleGen) TensorFlow Data Validation (StatisticsGen, SchemaGen, Example Validator) TensorFlow Transform (Transform) Estimator or Keras Model (Trainer) TensorFlow Model Analysis (Evaluator, Model Validator) Validation Outcomes (Pusher) TensorFlow Serving ML Metadata (MLMD) • Individual components talk to the metadata store (MLMD). • MLMD doesn’t store data itself. It stores data about data. https://www.tensorflow.org/tfx/guide https://www.tensorflow.org/tfx/guide/mlmd Each component has three pieces 1. Driver (gets information from MLMD) 2. Executor (does what the component does) 3. Publisher (writes results to MLMD) Organized by library (components in parenthesis)
  • 7. Data Ingestion: ExampleGen • Pulls in your data and puts it into binary format • Also splits it into train and test • Uses Protocol Buffers: tf.Example records written into a TFRecord file https://www.tensorflow.org/tutorials/load_data/tfrecord https://www.tensorflow.org/tfx/guide/examplegen
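One common way to split data into train and test deterministically is to hash each record into a bucket. The sketch below shows that idea in plain Python; it is not the ExampleGen implementation, and the function name, the record keys, and the 2:1 ratio are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def assign_split(key: str, train_buckets: int = 2, eval_buckets: int = 1) -> str:
    """Deterministically map a record key to 'train' or 'eval' (hypothetical helper)."""
    total = train_buckets + eval_buckets
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % total
    return "train" if bucket < train_buckets else "eval"


# Made-up record keys; a real pipeline would hash the record contents or an id.
records = [f"trip-{i}" for i in range(1000)]
splits: Dict[str, List[str]] = {"train": [], "eval": []}
for record in records:
    splits[assign_split(record)].append(record)

print(len(splits["train"]), len(splits["eval"]))
```

Hashing keeps a given record in the same split on every run, which a random shuffle does not guarantee.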
  • 8. TensorFlow Data Validation: StatisticsGen, SchemaGen, Example Validator • Looks at your data and generates a schema, which you manually update. • Makes sure the data you pass in later during serving is still in the same format and hasn’t drifted. • Also has a great way to visualize data, Facets, which we will see later. https://www.tensorflow.org/tfx/guide/tfdv
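The schema idea can be illustrated without TFDV. The sketch below infers a name-to-type schema from training rows and checks a serving row against it; the helper names and the taxi-style features are made up, and real TFDV schemas carry much more (value domains, presence constraints, drift thresholds).

```python
from typing import Any, Dict, List


def infer_schema(rows: List[Dict[str, Any]]) -> Dict[str, type]:
    """Infer a feature-name -> type mapping from training rows."""
    schema: Dict[str, type] = {}
    for row in rows:
        for name, value in row.items():
            schema.setdefault(name, type(value))
    return schema


def find_anomalies(row: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return human-readable problems for one serving row."""
    problems = []
    for name, expected in schema.items():
        if name not in row:
            problems.append(f"missing feature: {name}")
        elif not isinstance(row[name], expected):
            problems.append(f"wrong type for {name}: {type(row[name]).__name__}")
    for name in row:
        if name not in schema:
            problems.append(f"unexpected feature: {name}")
    return problems


train_rows = [{"fare": 12.5, "payment": "cash"}, {"fare": 7.0, "payment": "card"}]
schema = infer_schema(train_rows)
print(find_anomalies({"fare": "12.5", "tip": 1.0}, schema))
# ['wrong type for fare: str', 'missing feature: payment', 'unexpected feature: tip']
```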
  • 9. Example Schema
  • 10. TensorFlow Transform: Transform • Features come in many types, and TensorFlow Transform converts them into a format that can be ingested by a machine learning model. Nice to have this explicit. • Converts your data, e.g., one-hot encoding, categorical with a vocab • Part of the TensorFlow graph, for better or worse • Good for transformations that require looking at all values • Example: tft.scale_to_z_score subtracts the mean and divides by the standard deviation https://www.tensorflow.org/tfx/guide/transform
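A z-score needs the mean and standard deviation of the whole column, which is why it counts as a transformation that requires looking at all values. The plain-Python sketch below shows the two phases (a full pass to fit, then a per-value transform). It is not tft code: `fit_z_score` is a hypothetical helper, the fares are made up, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import Callable, List


def fit_z_score(values: List[float]) -> Callable[[float], float]:
    """Full pass over the training values, then return a per-value transform."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an assumption in this sketch
    return lambda x: (x - mean) / std


train_fares = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5.0, std 2.0
scale = fit_z_score(train_fares)
print([scale(v) for v in train_fares[:2]])  # [-1.5, -0.5]
print(scale(11.0))                          # 3.0, reusing the training statistics
```

The last call is the point: at serving time a new value is scaled with the statistics learned from training data, not recomputed from whatever arrives.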
  • 11. Estimator or Keras Model: Trainer • Trains the model: the part we are all familiar with • Except it uses an Estimator • Can use Keras via tf.keras.estimator.model_to_estimator() https://www.tensorflow.org/tfx/guide/trainer
  • 12. TensorFlow Model Analysis: Evaluator, Model Validator • Evaluator Component • Evaluates the model. • Uses TensorFlow Model Analysis (TFMA), which we will see shortly. • https://www.tensorflow.org/tfx/guide/evaluator • Model Validator Component • You set a baseline (such as the current serving model) and a metric (such as AUC). • Marks in the metadata if the model passes the baseline. • https://www.tensorflow.org/tfx/guide/modelval
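The baseline check can be sketched as follows, assuming AUC as the metric and a strict "better than the serving model" rule. The function names, labels, and scores are invented for illustration; this is not the Model Validator's code.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))


def passes_baseline(candidate_auc: float, baseline_auc: float) -> bool:
    """Hypothetical rule: the candidate must strictly beat the baseline."""
    return candidate_auc > baseline_auc


labels = [1, 0, 1, 0]
serving_scores = [0.6, 0.7, 0.8, 0.2]    # made-up scores from the current serving model
candidate_scores = [0.7, 0.4, 0.9, 0.1]  # made-up scores from the newly trained model
baseline = auc(labels, serving_scores)
candidate = auc(labels, candidate_scores)
print(baseline, candidate, passes_baseline(candidate, baseline))  # 0.75 1.0 True
```

The boolean result is the kind of outcome that would be recorded in the metadata for a downstream step to read.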
  • 13. Validation Outcomes: Pusher • Pushes the model to Serving if it is validated • I.e., if your new model is better than the existing model, push it to the model server. • For a deeper understanding, see Ice-T’s 1988 hit song, “I’m Your Pusher” https://www.tensorflow.org/tfx/guide/pusher
  • 14. TensorFlow Serving • Uses the model to perform inference • Called via gRPC APIs or RESTful APIs • Easy to get running with Docker • You can call a particular version of a model • Takes care of batching https://www.tensorflow.org/tfx/guide/serving
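As a rough illustration of calling a particular model version over REST, the sketch below only builds the request URL and JSON body and does not send anything. The host, model name `taxi`, version number, and feature names are assumptions. Putting several instances in one request is client-side grouping and is separate from the batching the server does itself.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_predict_request(
    host: str,
    model: str,
    instances: List[Dict[str, Any]],
    version: Optional[int] = None,
) -> Tuple[str, bytes]:
    """Build the URL and JSON body for a TensorFlow Serving REST predict call."""
    version_part = f"/versions/{version}" if version is not None else ""
    url = f"http://{host}/v1/models/{model}{version_part}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return url, body


# Hypothetical model name, version, and features.
url, body = build_predict_request(
    "localhost:8501", "taxi", [{"fare": 12.5}, {"fare": 7.0}], version=3
)
print(url)  # http://localhost:8501/v1/models/taxi/versions/3:predict
print(body.decode("utf-8"))
```

The pair could then be sent with any HTTP client as a POST request.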
  • 16. Outline • Introduction to TensorFlow Extended (TFX) • TensorFlow Extended Pipeline Components • Running the Pipeline (ML Metadata; Apache Airflow) • TensorFlow and TensorFlow Tools • Alternatives to TensorFlow Extended • Other Useful Tools • Conclusion
  • 17. Metadata Store: ML Metadata (MLMD) https://www.tensorflow.org/tfx/guide/mlmd
  • 18. Looking at the Artifact table using the DB Browser for SQLite
  • 19. The Types of Artifacts: the type_id field from the previous slide maps here
  • 20. You can see the properties of the artifacts in the ArtifactProperty table
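The same lookup can be done from Python with the standard sqlite3 module. The sketch below builds a tiny in-memory database whose tables are a simplified guess at the ones on the slides (only a few columns, and a table named `Type` for the artifact types), fills it with made-up rows, and joins the three tables. Against a real MLMD file the column set will be larger and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema; the rows are made up for illustration.
conn.executescript(
    """
    CREATE TABLE Type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Artifact (id INTEGER PRIMARY KEY, type_id INTEGER, uri TEXT);
    CREATE TABLE ArtifactProperty (artifact_id INTEGER, name TEXT, string_value TEXT);
    INSERT INTO Type VALUES (1, 'Examples'), (2, 'Model');
    INSERT INTO Artifact VALUES (1, 1, '/pipeline/examples/1'), (2, 2, '/pipeline/model/2');
    INSERT INTO ArtifactProperty VALUES (1, 'split', 'train'), (2, 'state', 'published');
    """
)

# Resolve each artifact's type_id to a type name and attach its properties.
rows = conn.execute(
    """
    SELECT a.id, t.name, a.uri, p.name, p.string_value
    FROM Artifact AS a
    JOIN Type AS t ON a.type_id = t.id
    LEFT JOIN ArtifactProperty AS p ON p.artifact_id = a.id
    ORDER BY a.id
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```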
  • 22. Pipeline Management with Apache Airflow • Allows you to trigger and keep track of pipelines.
  • 24. Pipeline Management with Apache Airflow • You can also use Kubeflow https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_kubeflow_gcp.py • And Apache Beam https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_beam.py
  • 25. Outline • Introduction to TensorFlow Extended (TFX) • TensorFlow Extended Pipeline Components • Running the Pipeline • TensorFlow and TensorFlow Tools (TensorFlow 2.0; TensorFlow Data and Features; TensorFlow Estimators; TensorBoard; TensorFlow Data Validation (TFDV) [Facets]; TensorFlow Model Analysis (TFMA); What-If Tool) • Alternatives to TensorFlow Extended • Other Useful Tools • Conclusion
  • 26. TensorFlow 2.0 • You don’t have to define the graph separately • More like PyTorch • There are two ways you can do computation: • Eager: like PyTorch, just compute • tf.function: you decorate a function and call it
  • 27. Three code panels, each with its output: TensorFlow 1.x session • TensorFlow 2.x function: still get the performance of Session https://www.tensorflow.org/guide/function • TensorFlow 2.x eager: debug like a civilized person https://www.tensorflow.org/guide/eager
  • 29. Data • tf.train.Example is a protobuf message that maps feature names to tf.train.Feature values, where each value has one of three types (tf.train.BytesList, tf.train.FloatList, tf.train.Int64List) • TFRecord is a format for storing sequences of binary records; here each record is a serialized tf.train.Example • tf.data.Dataset can take in TFRecord files and create an iterator for batching • tf.io.parse_example (tf.parse_example in TensorFlow 1.x) unpacks tf.Example into standard tensors https://www.tensorflow.org/tutorials/load_data/tf_records Features • tf.feature_column is where you further specify what each feature is, such as one-hot, vocabulary, or embedding https://www.tensorflow.org/guide/feature_columns • tf.train.Example specifies what the data is for storage, and tf.feature_column specifies it for the input to a model.
  • 31. To build a model you need • format of model input: tf.feature_column • model architecture and hyperparameters: tf.estimator (or Keras with tf.keras.estimator.model_to_estimator) • function to deliver training data: tf.estimator.TrainSpec from tf.data • function to deliver eval data: tf.estimator.EvalSpec from tf.data • function to deliver serving data: tf.estimator.FinalExporter
  • 32. TensorFlow Estimator • Estimator is a wrapper for regular TensorFlow that automatically scales to multiple machines and automatically outputs results to TensorBoard • Shout-out to model explainability with boosted-tree estimators https://www.tensorflow.org/tutorials/estimator/boosted_trees_model_understanding https://www.tensorflow.org/tutorials/estimator/boosted_trees
  • 34. TensorBoard • Plot of prescribers: red has more overdoses, green has fewer • 3D plots are not that useful, but they look cool • You can even use TensorBoard from PyTorch https://pytorch.org/docs/stable/tensorboard.html
  • 36. TensorFlow Data Validation (TFDV) • We need to understand our data as well as possible. • TFDV provides tools that make that less difficult. • Helps to identify bugs in the data by showing you pictures that don’t look right. • https://www.tensorflow.org/tfx/data_validation/get_started
  • 39. By sorting by non-uniformity, we can debug features.
  • 40. In general, we can make sure the distributions are what we would expect.
  • 42. TensorFlow Model Analysis (TFMA) • We can see how well our model does by each slice. • We see that this model does much better for females than males. https://www.tensorflow.org/tfx/guide/tfma
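Slicing a metric amounts to grouping evaluation rows by a feature value and computing the metric per group. The sketch below does this for accuracy in plain Python; the rows are made up so that the numbers mirror the kind of gap described on the slide, and `accuracy_by_slice` is a hypothetical helper, not a TFMA call.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def accuracy_by_slice(rows: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """rows are (slice_value, label, prediction) triples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for slice_value, label, prediction in rows:
        total[slice_value] += 1
        correct[slice_value] += int(label == prediction)
    return {key: correct[key] / total[key] for key in total}


# Made-up evaluation rows, not results from the talk.
rows = [
    ("female", 1, 1), ("female", 0, 0), ("female", 1, 1), ("female", 0, 1),
    ("male", 1, 0), ("male", 0, 0), ("male", 1, 0), ("male", 0, 1),
]
print(accuracy_by_slice(rows))  # {'female': 0.75, 'male': 0.25}
```

An overall accuracy of 0.5 would hide the difference that the per-slice view exposes.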
  • 44. What-If Tool • The What-If Tool applies a model from TensorFlow Serving to any data you give it. • https://pair-code.github.io/what-if-tool/index.html • Change a record and see what the model does • Find the most similar record with a different classification • Can be used for fairness: adjust the model so it is equally likely to predict “yes” for each group • https://www.coursera.org/lecture/machine-learning-business-professionals/activity-applying-fairness-concerns-with-the-what-if-tool-review-0mYda
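Finding the most similar record with a different classification can be sketched as a nearest-neighbor search restricted to records the model classifies differently. The code below uses Euclidean distance and a toy threshold model; both are assumptions for illustration and say nothing about how the What-If Tool itself is implemented.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Record = Tuple[float, ...]


def nearest_counterfactual(
    query: Record,
    records: Sequence[Record],
    predict: Callable[[Record], int],
) -> Optional[Record]:
    """Return the closest record whose predicted class differs from the query's."""
    query_class = predict(query)
    best: Optional[Record] = None
    best_distance = math.inf
    for record in records:
        if predict(record) == query_class:
            continue  # same classification, so not a counterfactual
        distance = math.dist(query, record)
        if distance < best_distance:
            best, best_distance = record, distance
    return best


def toy_model(record: Record) -> int:
    """Hypothetical classifier: positive when the two features sum to more than 10."""
    return int(record[0] + record[1] > 10)


records = [(2.0, 3.0), (6.0, 5.0), (9.0, 9.0), (4.0, 4.0)]
print(nearest_counterfactual((5.0, 4.0), records, toy_model))  # (6.0, 5.0)
```

Comparing the query with the returned record shows which small change in features flips the prediction.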
  • 45. Looking at the data by race, age, and inference score
  • 46. What-If Tool showing the probability of overdose for individual features.
  • 48. Alternatives (kind of) • MLflow https://mlflow.org/docs/latest/index.html • Netflix Metaflow https://github.com/Netflix/metaflow • Sacred https://github.com/IDSIA/sacred • Dataiku DSS https://www.dataiku.com/product/ • Polyaxon https://polyaxon.com/ • Facebook Ax https://www.ax.dev/ • They all do something a little different, with pieces straddling different sides of the data science/production divide.
  • 49. Outline • Introduction to TensorFlow Extended (TFX) • TensorFlow Extended Pipeline Components • Running the Pipeline • TensorFlow and TensorFlow Tools • Alternatives to TensorFlow Extended • Other Useful Tools (Streamlit Dashboard; Python Typing, Dataclasses, and Enum) • Conclusion
  • 50. Streamlit Dashboard • Writes to the browser • Works well for artifacts in the ML pipeline. https://github.com/streamlit/streamlit/ https://streamlit.io/docs/getting_started.html
  • 52. Typing, Dataclasses, and Enum • You can build interchangeable parts right in Python. • Not new of course, but they make Python a little less wild west. • Output: [Turtle(size=6, name='Anita'), Turtle(size=2, name='Anita')]
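The code that produced this output was an image on the slide and is not in the transcript. The sketch below is a guess at something similar: it combines a dataclass, an Enum, and type hints and happens to print the same output. The `Size` enum and `make_turtles` helper are invented here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Size(Enum):
    """Hypothetical enum restricting turtles to known sizes."""
    LARGE = 6
    SMALL = 2


@dataclass
class Turtle:
    size: int
    name: str


def make_turtles(name: str, sizes: List[Size]) -> List[Turtle]:
    """Build one Turtle per requested size, all sharing the same name."""
    return [Turtle(size=s.value, name=name) for s in sizes]


turtles = make_turtles("Anita", [Size.LARGE, Size.SMALL])
print(turtles)
# [Turtle(size=6, name='Anita'), Turtle(size=2, name='Anita')]
```

The dataclass supplies the readable repr and equality for free, and the type hints let a checker such as mypy catch a part that does not fit before the pipeline runs.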
  • 54. TFX Disadvantages • Steep learning curve • Changes constantly (but not while you are watching it) • Somewhat inflexible: you can create your own components, but that comes with its own steep learning curve • No hyperparameter search (yet, https://github.com/tensorflow/tfx/issues/182)
  • 55. TFX Advantages • Set up to scale • Documents your process through artifacts • Warm-starting: as new data comes in, you don’t have to start training over. Keeps models fresh • Tools to see data and debug problems • Don’t have to rerun what is already run
  • 56. Where to Start • Jupyter notebook tutorial https://www.tensorflow.org/tfx/tutorials/tfx/components • Airflow tutorial https://www.tensorflow.org/tfx/tutorials/tfx/airflow_workshop
  • 57. Happy Hour! • 6500 River Place Blvd., Bldg. 3, Suite 120, Austin, TX 78730 • Jonathan Mugan, Ph.D. • Email: [email protected]
  • 58. Appendix • Original TFX paper https://ai.google/research/pubs/pub46484 • Documentation • https://www.tensorflow.org/tfx • https://www.tensorflow.org/tfx/tutorials • https://www.tensorflow.org/tfx/guide • https://www.tensorflow.org/tfx/api_docs