TensorFlow 2 Pocket Reference
Building and Deploying Machine Learning Models

KC Tung

978-1-492-08918-6
TensorFlow 2 Pocket Reference
by KC Tung
Copyright © 2021 Favola Vera, LLC. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Acquisitions Editor: Rebecca Novack
Development Editor: Sarah Grey
Production Editor: Beth Kelly
Copyeditor: Penelope Perkins
Proofreader: Audrey Doyle
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea
August 2021: First Edition
Revision History for the First Edition
2021-07-19: First Release
See https://oreil.ly/tf2pr for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. TensorFlow 2 Pocket Reference, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
To my beloved wife Katy, who always supports me and sees the
best in me. To my father Jerry, who raised me to pursue learning
with a sense of purpose. To my hard-working and passionate
readers, whose aspiration for continuous learning resonates
with me and inspired me to write this book.
Table of Contents

Preface

Chapter 1: Introduction to TensorFlow 2
  Improvements in TensorFlow 2
  Making Commonly Used Operations Easy
  Wrapping Up

Chapter 2: Data Storage and Ingestion
  Streaming Data with Python Generators
  Streaming File Content with a Generator
  JSON Data Structures
  Setting Up a Pattern for Filenames
  Splitting a Single CSV File into Multiple CSV Files
  Creating a File Pattern Object Using tf.io
  Creating a Streaming Dataset Object
  Streaming a CSV Dataset
  Organizing Image Data
  Using TensorFlow Image Generator
  Streaming Cross-Validation Images
  Inspecting Resized Images
  Wrapping Up

Chapter 3: Data Preprocessing
  Preparing Tabular Data for Training
  Preparing Image Data for Processing
  Preparing Text Data for Processing
  Wrapping Up

Chapter 4: Reusable Model Elements
  The Basic TensorFlow Hub Workflow
  Image Classification by Transfer Learning
  Using the tf.keras.applications Module for Pretrained Models
  Wrapping Up

Chapter 5: Data Pipelines for Streaming Ingestion
  Streaming Text Files with the text_dataset_from_directory Function
  Streaming Images with a File List Using the flow_from_dataframe Method
  Streaming a NumPy Array with the from_tensor_slices Method
  Wrapping Up

Chapter 6: Model Creation Styles
  Using the Symbolic API
  Understanding Inheritance
  Using the Imperative API
  Choosing the API
  Using the Built-In Training Loop
  Creating and Using a Custom Training Loop
  Wrapping Up

Chapter 7: Monitoring the Training Process
  Callback Objects
  TensorBoard
  Wrapping Up

Chapter 8: Distributed Training
  Data Parallelism
  Using the Class tf.distribute.MirroredStrategy
  The Horovod API
  Wrapping Up

Chapter 9: Serving TensorFlow Models
  Model Serialization
  TensorFlow Serving
  Wrapping Up

Chapter 10: Improving the Modeling Experience: Fairness Evaluation and Hyperparameter Tuning
  Model Fairness
  Hyperparameter Tuning
  End-to-End Hyperparameter Tuning
  Wrapping Up

Index
Preface
The TensorFlow ecosystem has evolved into many different frameworks to serve a variety of roles and functions. That flexibility is part of the reason for its widespread adoption, but it also complicates the learning curve for data scientists, machine learning (ML) engineers, and other technical stakeholders. There are so many ways to manage TensorFlow models for common tasks—such as data and feature engineering, data ingestion, model selection, training patterns, cross validation against overfitting, and deployment strategies—that the choices can be overwhelming.
This pocket reference will help you make choices about how to do your work with TensorFlow, including how to set up common data science and ML workflows using TensorFlow 2.0 design patterns in Python. Examples describe and demonstrate TensorFlow coding patterns and other tasks you are likely to encounter frequently in the course of your ML project work. You can use it as both a how-to book and a reference.
This book is intended for current and potential ML engineers, data scientists, and enterprise ML solution architects who want to advance their knowledge and experience in reusable patterns and best practices in TensorFlow modeling. Perhaps you’ve already read an introductory TensorFlow book, and you stay up to date with the field of data science generally. This book assumes that you have hands-on experience using Python (and possibly NumPy, pandas, and JSON libraries) for data engineering, feature engineering routines, and building TensorFlow models. Experience with common data structures such as lists, dictionaries, and NumPy arrays will also be very helpful.
Unlike many other TensorFlow books, this book is structured around the tasks you’ll likely need to do, such as:

• When and why should you feed training data as a NumPy array or streaming dataset? (Chapters 2 and 5)
• How can you leverage a pretrained model using transfer learning? (Chapters 3 and 4)
• Should you use a generic fit function to do your training or write a custom training loop? (Chapter 6)
• How should you manage and make use of model checkpoints? (Chapter 7)
• How can you review the training process using TensorBoard? (Chapter 7)
• If you can’t fit all of your data into your runtime’s memory, how can you perform distributed training using multiple accelerators, such as GPUs? (Chapter 8)
• How do you pass data to your model during inferencing, and how do you handle output? (Chapter 9)
• Is your model fair? (Chapter 10)
If you are wrestling with questions like these, this book will be
helpful to you.
Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.
TIP
This element signifies a tip or suggestion.
Using Code Examples

Supplemental material (code examples, exercises, etc.) can be downloaded at https://github.com/shinchan75034/tensorflow-pocket-ref.

If you have a technical question or a problem using the code examples, please send email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “TensorFlow 2 Pocket Reference by KC Tung (O’Reilly). Copyright 2021 Favola Vera, LLC, 978-1-492-08918-6.”
If you feel your use of code examples falls outside of fair use
or the permission given above, feel free to contact us at
permissions@oreilly.com.
O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/tensorflow2pr.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit http://oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://youtube.com/oreillymedia
Acknowledgments

I really appreciate all the thoughtful and professional work by the O’Reilly editors. In addition, I want to express my gratitude to the technical reviewers, Tony Holdroyd, Pablo Marin, Giorgio Saez, and Axel Sirota, for their valuable feedback and suggestions. Finally, a special thanks to Rebecca Novack and Sarah Grey for giving me a chance and working with me to write this book.
CHAPTER 1
Introduction to TensorFlow 2
TensorFlow has long been the most popular open source Python machine learning (ML) library. It was developed by the Google Brain team as an internal tool, but in 2015 it was released under an Apache License. Since then, it has evolved into an ecosystem full of important assets for model development and deployment. Today it supports a wide variety of APIs and modules that are specifically designed to handle tasks such as data ingestion and transformation, feature engineering, and model construction and serving, as well as many more.

TensorFlow has become increasingly complex. The purpose of this book is to help simplify the common tasks that a data scientist or ML engineer will need to perform during an end-to-end model development process. This book does not focus on data science and algorithms; rather, the examples here use prebuilt models as a vehicle to teach relevant concepts.

This book is written for readers with basic experience in and knowledge about building ML models. Some proficiency in Python programming is highly recommended. If you work through the book from beginning to end, you will gain a great deal of knowledge about the end-to-end model development process and the major tasks involved, including data engineering, ingestion, and preparation; model training; and serving the model.
The source code for the examples in the book was developed
and tested with Google Colaboratory (Colab, for short) and a
MacBook Pro running macOS Big Sur, version 11.2.3. The
TensorFlow version used is 2.4.1, and the Python version is 3.7.
Improvements in TensorFlow 2
As TensorFlow grows, so does its complexity. The learning curve for new TensorFlow users is steep because there are so many different aspects to keep in mind. How do I prepare the data for ingestion and training? How do I handle different data types? What do I need to consider for different handling methods? These are just some of the basic questions you may have early in your ML journey.
A particularly difficult concept to get accustomed to is lazy execution, which means that TensorFlow doesn’t actually process your data until you explicitly tell it to execute the entire code. The idea is to speed up performance. You can look at an ML model as a set of nodes and edges (in other words, a graph). When you run computations and transform data through the nodes in the path, it turns out that only the computations in the data path are executed. In other words, you don’t have to calculate every computation, only the ones that lie directly in the path your data takes through the graph from input to output. If the shape and format of the data are not correctly matched between one node and the next, you will get an error when you compile the model, and it is rather difficult to debug: you have to investigate where you made a mistake in passing a data structure or tensor shape from one node to the next.
Through TensorFlow 1.x, lazy execution was the way to build and train an ML model. Starting with TensorFlow 2, however, eager execution is the default way to build and train a model. This change makes it much easier to debug the code and try different model architectures. Eager execution also makes it much easier to learn TensorFlow, in that you will see any mistakes immediately upon executing each line of code. You no longer need to build an entire graph of your model before you can debug and test whether your input data is in the right shape. This is one of several major features and improvements that make TensorFlow 2 easier to use than previous versions.
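As a quick illustration of eager execution (a minimal sketch of my own, not one of the book’s examples), you can run an operation and inspect its result immediately, with no graph or session setup:

import tensorflow as tf

# With eager execution (the default in TensorFlow 2), an operation
# runs as soon as it is called and returns a concrete value.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
c = tf.matmul(a, b)
print(c)  # prints the evaluated tensor: shape=(2, 2), dtype=float32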
Keras API
Keras, created by AI researcher François Chollet, is an open source, high-level, deep-learning API or framework. It is compatible with multiple ML libraries.

High-level implies that at a lower level there is another framework that actually executes the computation—and this is indeed the case. These low-level frameworks include TensorFlow, Theano, and the Microsoft Cognitive Toolkit (CNTK). The purpose of Keras is to provide easier syntax and coding style for users who want to leverage the low-level frameworks to build deep-learning models.
After Chollet joined Google in 2015, Keras gradually became a keystone of TensorFlow adoption. In 2019, as the TensorFlow team launched version 2.0, it formally adopted Keras as TensorFlow’s first-class citizen API, known as tf.keras, for all future releases. Since then, TensorFlow has integrated tf.keras with many other important modules. For example, it works seamlessly with the tf.io API for reading distributed training data. It also works with the tf.data.Dataset class, used for streaming training data too big to fit into a single computer. This book uses these modules throughout all chapters.

Today TensorFlow users primarily rely on the tf.keras API for building deep models quickly and easily. The convenience of getting the training routine working quickly allows more time to experiment with different model architectures and tuning parameters in the model and training routine.
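To give a sense of how concise this is, here is a minimal, illustrative tf.keras model definition (the layer sizes and input shape are arbitrary, not taken from this book’s examples):

import tensorflow as tf

# Define and compile a small binary classifier with the tf.keras API.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()  # prints the layer-by-layer architecture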
Reusable Models in TensorFlow
Academic researchers have built and tested many ML models, all of which tend to be complicated in their architecture. It is not practical for users to learn how to build these models. Enter the idea of transfer learning, where a model developed for one task is reused to solve another task, in this case one defined by the user. This essentially boils down to transforming user data into the proper data structure at model input and output.

Naturally, there has been great interest in these models and their potential uses. Therefore, by popular demand, many models have become available in the open source ecosystem. TensorFlow created a repository, TensorFlow Hub, to offer the public free access to these complicated models. If you’re interested, you can try these models without having to build them yourself. In Chapter 4, you will learn how to download and use models from TensorFlow Hub. Once you do, you’ll just need to be aware of the data structure the model expects at input, and add a final output layer that is suitable for your prediction goal. Every model in TensorFlow Hub contains concise documentation that gives you the necessary information to construct your input data.
Another place to retrieve prebuilt models is the tf.keras.applications module, which is part of the TensorFlow distribution. In Chapter 4, you’ll learn how to use this module to leverage a prebuilt model for your own data.
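As a preview, here is a minimal sketch of loading a pretrained model from tf.keras.applications (Chapter 4 walks through the full workflow; the input shape here is the 224 × 224 RGB convention):

import tensorflow as tf

# Load ResNet50 with ImageNet weights, dropping the original
# classification head so a task-specific output layer can be added.
base_model = tf.keras.applications.ResNet50(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3))
base_model.trainable = False  # freeze the pretrained weights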
Making Commonly Used Operations Easy
All of these improvements in TensorFlow 2 make a lot of important operations easier and more convenient to implement. Even so, building and training an ML model end to end is not a trivial task. This book will show you how to deal with each aspect of the TensorFlow 2 model training process, starting from the beginning. Following are some of these operations.
Open Source Data
A convenient package integrated into TensorFlow 2 is the TensorFlow dataset library. It is a collection of curated open source datasets that are readily available for use. This library contains datasets of images, text, audio, videos, and many other formats. Some are NumPy arrays, while others are in dataset structures. This library also provides documentation for how to use TensorFlow to load these datasets. By distributing a wide variety of open source data with its product, the TensorFlow team really saves users a lot of the trouble of searching for, integrating, and reshaping training data for a TensorFlow workload. Some of the open source datasets we’ll use in this book are the Titanic dataset for structured data classification and the CIFAR-10 dataset for image classification.
Working with Distributed Datasets
First you have to deal with the question of how to work with training data. Many didactic examples teach TensorFlow using prebuilt training data in its native format, such as a small pandas DataFrame or a NumPy array, which will fit nicely in your computer’s memory. In a more realistic situation, however, you’ll likely have to deal with much more training data than your computer memory can handle. The size of a table read from a SQL database can easily reach into the gigabytes. Even if you have enough memory to load it into a pandas DataFrame or a NumPy array, chances are your Python runtime will run out of memory during computation and crash.

Large tables of data are typically saved as multiple files in common formats such as CSV (comma-separated values) or text. Because of this, you should not attempt to load each file into your Python runtime. The correct way to deal with distributed datasets is to create a reference that points to the location of all the files. Chapter 2 will show you how to use the tf.io API, which gives you an object that holds a list of file paths and names. This is the preferred way to deal with training data regardless of its size and file count.
Data Streaming
How do you intend to pass data to your model for training? This is an important skill, but many popular didactic examples approach it by passing the entire NumPy array into the model training routine. Just like with loading large training data, you will encounter memory issues if you try passing a large NumPy array to your model for training.

A better way to deal with this is through data streaming. Instead of passing the entire training data at once, you stream a subset or batch of data for the model to train with. In TensorFlow, this is known as your dataset. In Chapter 2, you are also going to learn how to make a dataset from the tf.io object. Dataset objects can be made from all sorts of native data structures. In Chapter 3, you will see how to make a tf.data.Dataset object from CSV files and images.

With the combination of tf.io and tf.data.Dataset, you’ll set up a data handling workflow for model training without having to read or open a single data file in your Python runtime memory.
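A minimal sketch of the idea, using a small illustrative in-memory array (the real examples in Chapters 2 and 3 stream from files instead):

import numpy as np
import tensorflow as tf

# Wrap an array in a dataset and stream it in batches of four.
data = np.arange(20, dtype=np.float32).reshape(10, 2)
dataset = tf.data.Dataset.from_tensor_slices(data).batch(4)
for batch in dataset:
    print(batch.shape)  # (4, 2), (4, 2), then (2, 2) for the remainder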
Data Engineering
To make meaningful features for your model to learn the pattern of, you need to apply data- or feature-engineering tasks to your training data. Depending on the data type, there are different ways to do this.

If you are working with tabular data, you may have different values or data types in different columns. In Chapter 3, you will see how to use TensorFlow’s feature_column API to standardize your training data. It helps you correctly mark which columns are numeric and which are categorical.

For image data, you will have different tasks. For example, all of the images in your dataset must have the same dimensions. Further, pixel values are typically normalized or scaled to a range of [0, 1]. For these tasks, tf.keras provides the ImageDataGenerator class, which standardizes image sizes and normalizes pixel values for you.
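As a preview of the feature_column API, here is a minimal sketch with hypothetical column names (Chapter 3 works through full examples on the Titanic dataset):

import tensorflow as tf

# Mark one column as numeric and one as categorical; the column
# names and vocabulary here are hypothetical.
age = tf.feature_column.numeric_column('age')
ticket_class = tf.feature_column.categorical_column_with_vocabulary_list(
    'ticket_class', vocabulary_list=['First', 'Second', 'Third'])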
Transfer Learning
TensorFlow Hub makes prebuilt, open source models available to everyone. In Chapter 4, you’ll learn how to use the Keras layers API to access TensorFlow Hub. In addition, tf.keras comes with an inventory of these prebuilt models, which can be called using the tf.keras.applications module. In Chapter 4, you’ll learn how to use this module for transfer learning as well.
Model Styles
There is definitely more than one way you can implement a model using tf.keras. This is because some deep learning model architectures or patterns are more complicated than others. For common use, the symbolic API style, which sets up your model architecture sequentially, is likely to suffice. Another style is the imperative API, where you declare a model as a class, so that each time you call upon a model object, you are creating an instance of that class. This requires you to understand how class inheritance works (I’ll discuss this in Chapter 6). If your programming background stems from an object-oriented programming language such as C++ or Java, then this API may have a more natural feel for you. Another reason for using the imperative API approach is to keep your model architecture code separate from the remaining workflow. In Chapter 6, you will learn how to set up and use both of these API styles.
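Here is a minimal sketch contrasting the two styles (the layers are illustrative; Chapter 6 builds out full examples):

import tensorflow as tf

# Symbolic style: the architecture is declared layer by layer.
symbolic_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Imperative style: the model is a class, and each model object
# is an instance of that class.
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(16, activation='relu')
        self.out = tf.keras.layers.Dense(1)

    def call(self, inputs):
        return self.out(self.hidden(inputs))

imperative_model = MyModel()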
Monitoring the Training Process
Monitoring how your model is trained and validated across each epoch (that is, one pass over a training set) is an important aspect of model training. Having a validation step at the end of each epoch is the easiest thing you can do to guard against model overfitting, a phenomenon in which the model starts to memorize training data patterns rather than learning the features as intended. In Chapter 7, you will learn how to use various callbacks to save model weights and biases at every epoch. I’ll also walk you through how to set up and use TensorBoard to visualize the training process.
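A minimal sketch of what this looks like (the file paths are hypothetical; Chapter 7 covers the callback options in detail):

import tensorflow as tf

# Save weights at every epoch and log metrics for TensorBoard.
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        filepath='checkpoints/weights.{epoch:02d}.h5',
        save_weights_only=True),
    tf.keras.callbacks.TensorBoard(log_dir='logs')
]
# The list is then passed to the training routine:
# model.fit(train_data, validation_data=val_data,
#           epochs=10, callbacks=callbacks)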
Distributed Training
Even though you know how to handle distributed data and files and stream them into your model training routine, what if you find that training takes an unrealistic amount of time? This is where distributed training can help. It requires a cluster of hardware accelerators, such as graphics processing units (GPUs) or Tensor Processing Units (TPUs). These accelerators are available through many public cloud providers. You can also work with one GPU or TPU (not a cluster) for free in Google Colab; you’ll learn how to use this and the tf.distribute.MirroredStrategy class, which simplifies and reduces the hard work of setting up distributed training, to work through the example in the first part of Chapter 8.
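The core MirroredStrategy pattern is short enough to preview here (a sketch with illustrative layers; the full example is in Chapter 8):

import tensorflow as tf

# Build and compile the model inside the strategy scope so that
# its variables are mirrored across all available GPUs.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')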
Released before tf.distribute.MirroredStrategy, the Horovod API from Uber’s engineering team is a considerably more complicated alternative. It’s specifically built to run training routines on a computing cluster. To learn how to use Horovod, you will need to use Databricks, a cloud-based computing platform, to work through the example in the second part of Chapter 8. This will help you learn how to refactor your code to distribute and shard data for the Horovod API.
Serving Your TensorFlow Model
Once you’ve built your model and trained it successfully, it’s
time for you to persist, or store, the model so it can be served
to handle user input. You’ll see how easy it is to use the
tf.saved_model API to save your model.
Typically, the model is hosted by a web service. This is where
TensorFlow Serving comes into the picture: it’s a framework
that wraps your model and exposes it for web service calls via
HTTP. In Chapter 9, you will learn how to use a TensorFlow
Serving Docker image to host your model.
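A minimal sketch of both steps, assuming model is a trained tf.keras model (the paths and model name are hypothetical; Chapter 9 has the details):

import tensorflow as tf

# Persist the trained model in the SavedModel format. The trailing
# '1' is a version number that TensorFlow Serving expects.
tf.saved_model.save(model, 'saved_models/my_model/1')

# A TensorFlow Serving Docker container can then host it, e.g.:
# docker run -p 8501:8501 \
#   --mount type=bind,source=$(pwd)/saved_models/my_model,target=/models/my_model \
#   -e MODEL_NAME=my_model -t tensorflow/serving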
Improving the Training Experience
Finally, Chapter 10 discusses some important aspects of assessing and improving your model training process. You’ll learn how to use the TensorFlow Model Analysis module to look into the issue of model bias. This module provides an interactive dashboard, called Fairness Indicators, designed to reveal model bias. Using a Jupyter Notebook environment and the model you trained on the Titanic dataset from Chapter 3, you’ll see how Fairness Indicators works.

Another improvement brought about by the tf.keras API is that it makes performing hyperparameter tuning more convenient. Hyperparameters are attributes related to model training routines or model architectures. Tuning them is typically a tedious process, as it involves thoroughly searching over the parameter space. In Chapter 10 you’ll see how to use the Keras Tuner library and an advanced search algorithm known as Hyperband to conduct hyperparameter tuning work.
Wrapping Up
TensorFlow 2 is a major overhaul of the previous version. Its most significant improvement is designating the tf.keras API as the recommended way to use TensorFlow. This API works seamlessly with tf.io and tf.data.Dataset for an end-to-end model training process. These improvements speed up model building and debugging so you can experiment with other aspects of model training, such as trying different architectures or conducting more efficient hyperparameter searches. So, let’s get started.
CHAPTER 2
Data Storage and Ingestion
To envision how to set up an ML model to solve a problem, you have to start thinking about data structure patterns. In this chapter, we’ll look at some general patterns in storage, data formats, and data ingestion. Typically, once you understand your business problem and set it up as a data science problem, you have to think about how to get the data into a format or structure that your model training process can use. Data ingestion during the training process is fundamentally a data transformation pipeline. Without this transformation, you won’t be able to deliver and serve the model in an enterprise-driven or use-case-driven setting; it would remain nothing more than an exploration tool and would not be able to scale to handle large amounts of data.

This chapter will show you how to design a data ingestion pipeline for two common data structures: tables and images. You will learn how to make the pipeline scalable by using TensorFlow’s APIs.

Data streaming is the means by which the data is ingested in small batches by the model for training. Data streaming in Python is not a new concept. However, grasping it is fundamental to understanding how the more advanced APIs in TensorFlow work. Thus, this chapter will start with Python generators. Then we’ll look at how tabular data is stored, including how to indicate and track features and labels. We’ll then move to designing your data structure, and finish by discussing how to ingest data to your model for training and how to stream tabular data. The rest of the chapter covers how to organize image data for image classification and how to stream image data.
Streaming Data with Python Generators
There are times when the Python runtime’s memory is not big enough to handle loading the dataset in its entirety. When this happens, the recommended practice is to load the data in small batches. Therefore, the data is streamed into the model during the training process.

Sending data in small batches has many other advantages as well. One is that a gradient descent algorithm is applied to each batch to calculate the error (that is, the difference between the model output and the ground truth) and to gradually update the model’s weights and biases to make this error as small as possible. This lets us parallelize the gradient calculation, since the error calculation (also known as loss calculation) of one batch does not depend on the others. This is known as mini-batch gradient descent. At the end of each epoch, after the full training dataset has gone through the model, gradients from all batches are summed and weights are updated. Then training starts again for the next epoch, with the newly updated weights and biases, and the error is calculated. This process repeats according to a user-defined parameter known as the number of epochs for training.
A Python generator is a function that returns an iterator, which produces a sequence of values on demand. An example of how it works follows. Let’s start with the NumPy library for this simple demonstration of Python generators. I’ve created a function, my_generator, that accepts a NumPy array and iterates two records at a time in the array:
import numpy as np

def my_generator(my_array):
    i = 0
    while True:
        yield my_array[i:i+2, :]  # output two elements at a time
        i += 1
This is the test array I created, which will be passed into
my_generator:
test_array = np.array([[10.0, 2.0],
                       [15, 6.0],
                       [3.2, -1.5],
                       [-3, -2]], np.float32)
This NumPy array has four records, each consisting of two
floating-point values. Then I pass this array to my_generator:
output = my_generator(test_array)
To get output, use:
next(output)
The output should be:

array([[10.,  2.],
       [15.,  6.]], dtype=float32)

If you run the next(output) command again, the output will be different:

array([[15. ,  6. ],
       [ 3.2, -1.5]], dtype=float32)

And if you run it yet again, the output is once again different:

array([[ 3.2, -1.5],
       [-3. , -2. ]], dtype=float32)

And if you run it a fourth time, the output is now:

array([[-3., -2.]], dtype=float32)

Now that the last record has been shown, you have finished streaming this data. If you run it again, it will return an empty array:

array([], shape=(0, 2), dtype=float32)
As you can see, the my_generator function streams two records of the NumPy array each time it is run. The unique aspect of the generator function is the use of the yield statement instead of the return statement. Unlike return, yield produces a sequence of values without storing the entire sequence in the Python runtime memory. yield continues to produce a sequence each time we invoke the next function, until the end of the array is reached.

This example demonstrates how a subset of data can be generated via a generator function. However, in this example, the NumPy array is created on the fly and therefore is held in the Python runtime memory. Let’s take a look at how to iterate over a dataset that is stored as a file.
Streaming File Content with a Generator
To understand how a file in storage can be streamed, you may find it easier to use a CSV file as an example. The file I use here, the Pima Indians Diabetes Dataset, is an open source dataset available for download. Download it and store it on your local machine.

This file does not contain a header, so you will also need to download the column names and descriptions for this dataset. Briefly, the columns in this file are:

['Pregnancies', 'Glucose', 'BloodPressure',
 'SkinThickness', 'Insulin', 'BMI',
 'DiabetesPedigree', 'Age', 'Outcome']
Let’s look at this file with the following lines of code:

import csv
import pandas as pd

file_path = 'working_data/'
file_name = 'pima-indians-diabetes.data.csv'
col_name = ['Pregnancies', 'Glucose', 'BloodPressure',
            'SkinThickness', 'Insulin', 'BMI',
            'DiabetesPedigree', 'Age', 'Outcome']
pd.read_csv(file_path + file_name, names=col_name)
The first few rows of the file are shown in Figure 2-1.
Figure 2-1. Pima Indians Diabetes Dataset
Since we want to stream this dataset, it is more convenient to read it as a CSV file and use the generator to output the rows, just like we did with the NumPy array in the preceding section. The way to do this is through the following code:

import csv

file_path = 'working_data/'
file_name = 'pima-indians-diabetes.data.csv'

with open(file_path + file_name, newline='\n') as csvfile:
    f = csv.reader(csvfile, delimiter=',')
    for row in f:
        print(','.join(row))
Let’s take a closer look at this code. We use the with open command to create a file handle object, csvfile, that knows where the file is stored. The next step is to pass it to the reader function in the CSV library:

f = csv.reader(csvfile, delimiter=',')

f is the entire file in the Python runtime memory. To inspect the file, execute this short for loop:

for row in f:
    print(','.join(row))
The output of the first few rows looks like Figure 2-2.
Streaming File Content with a Generator | 15
Figure 2-2. Pima Indians Diabetes Dataset CSV output
Now that you understand how to use a file handle, let’s refactor the preceding code so that we can use yield in a function, effectively making a generator to stream the content of the file:

def stream_file(file_handle):
    holder = []
    for row in file_handle:
        holder.append(row.rstrip("\n"))
        yield holder
        holder = []

with open(file_path + file_name, newline='\n') as handle:
    for part in stream_file(handle):
        print(part)
Recall that a Python generator is a function that uses yield to iterate through an iterable object. You can use with open to acquire a file handle as usual. Then we pass handle to a generator function, stream_file, which contains a for loop that iterates through the file in handle row by row, removes the newline code \n, and then fills up a holder. Each row is passed back to the main thread’s print function by yield from the generator. The output is shown in Figure 2-3.
Figure 2-3. Pima Indians Diabetes Dataset output by Python generator
Now that you have a clear idea of how a dataset can be streamed, let’s look at how to apply this in TensorFlow. As it turns out, TensorFlow leverages this approach to build a framework for data ingestion. Streaming is usually the best way to ingest large amounts of data (such as hundreds of thousands of rows in one table, or distributed across multiple tables).
JSON Data Structures
Tabular data is a common and convenient format for encoding features and labels for ML model training, and CSV is probably the most common tabular data format. You can think of each field separated by the comma delimiter as a column. Each column is defined with a data type, such as numeric (integer or floating point) or string.

Tabular data is not the only data format that is well structured, by which I mean that every record follows the same convention and the order of fields in every record is the same. Another common data structure is JSON. JSON (JavaScript Object Notation) is a structure built with nested, hierarchical key-value pairs. You can think of keys as column names and values as the actual value of the data in that sample. JSON can be converted to CSV, and vice versa. Sometimes the original data is in JSON format and it is necessary to convert it to CSV, which is easier to display and inspect.
Here’s an example JSON record, showing the key-value pairs:

{
    "id": 1,
    "name": {
        "first": "Dan",
        "last": "Jones"
    },
    "rating": [
        8,
        7,
        9
    ]
}

Notice that the key “rating” is associated with the value of an array, [8, 7, 9].
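As an aside, a nested record like this can be flattened into tabular form with pandas (a small sketch of my own, not part of the book’s examples):

import pandas as pd

# json_normalize expands nested keys into dotted column names.
record = {
    "id": 1,
    "name": {"first": "Dan", "last": "Jones"},
    "rating": [8, 7, 9]
}
df = pd.json_normalize(record)
print(df.columns.tolist())
# e.g. ['id', 'rating', 'name.first', 'name.last']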
There are plenty of examples of using a CSV file or a table as training data and ingesting it into the TensorFlow model training process. Typically, the data is read into a pandas DataFrame. However, this strategy only works if all the data can fit into the Python runtime memory. You can use streaming to handle data without the Python runtime restricting memory allocation. Since you learned how a Python generator works in the preceding section, you’re now ready to take a look at TensorFlow’s API, which operates on the same principle as a Python generator, and learn how to use TensorFlow’s adoption of the Python generator framework.
Setting Up a Pattern for Filenames
When working with a set of files, you will encounter patterns in file-naming conventions. To simulate an enterprise environment where new data is continuously being generated and stored, we will use an open source CSV file, split it into multiple parts by row count, then rename each part with a fixed prefix. This approach is similar to how the Hadoop Distributed File System (HDFS) names the parts of a file.

Feel free to use your own CSV file if you have one handy. If not, you can download the suggested CSV file (a COVID-19 dataset) for this example. (You may clone this repository if you wish.)
For now, all you need is owid-covid-data.csv. Once it is downloaded, inspect the file and determine the number of rows:

wc -l owid-covid-data.csv

The output indicates there are over 32,000 rows:

32788 owid-covid-data.csv

Next, inspect the first three lines of the CSV file to see if there is a header:

head -3 owid-covid-data.csv
iso_code,continent,location,date,total_cases,new_cases,
total_deaths,new_deaths,total_cases_per_million,
18 | Chapter 2: Data Storage and Ingestion
new_cases_per_million,total_deaths_per_million,
new_deaths_per_million,new_tests,total_tests,
total_tests_per_thousand,new_tests_per_thousand,
new_tests_smoothed,new_tests_smoothed_per_thousand,tests_units,
stringency_index,population,population_density,median_age,
aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,
cardiovasc_death_rate,diabetes_prevalence,female_smokers,
male_smokers,handwashing_facilities,hospital_beds_per_thousand,
life_expectancy
AFG,Asia,Afghanistan,2019-12-31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,,,,,,,,,38928341.0,
54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.8
Since this file contains a header, you’ll see the header in each of
the part files. You can also look at a few rows of data to see what
they actually look like.
Splitting a Single CSV File into Multiple CSV Files
Now let’s split this file into multiple CSV files, each with 330 rows. You should end up with 100 CSV files, each of which has the header. If you use Linux or macOS, use the following command:

cat owid-covid-data.csv | parallel --header : --pipe -N330 'cat > owid-covid-data-part00{#}.csv'

For macOS, you may need to first install the parallel command:

brew install parallel
Here are some of the files that are created:

-rw-r--r--  1 mbp16  staff  54026 Jul 26 16:45 owid-covid-data-part0096.csv
-rw-r--r--  1 mbp16  staff  54246 Jul 26 16:45 owid-covid-data-part0097.csv
-rw-r--r--  1 mbp16  staff  51278 Jul 26 16:45 owid-covid-data-part0098.csv
-rw-r--r--  1 mbp16  staff  62622 Jul 26 16:45 owid-covid-data-part0099.csv
-rw-r--r--  1 mbp16  staff  15320 Jul 26 16:45 owid-covid-data-part00100.csv
This pattern represents a standard storage arrangement for multiple CSV files. There is a distinct pattern to the naming convention: either all files have the same header, or none has any header at all.

It’s a good idea to maintain a file-naming pattern, which can come in handy whether you have tens or hundreds of files. And when your naming pattern can be easily represented with wildcard notation, it’s easier to create a reference or file pattern object that points to all the data in storage.

In the next section, we will look at how to use the TensorFlow API to create a file pattern object, which we’ll use to create a streaming object for this dataset.
Creating a File Pattern Object Using tf.io
The TensorFlow tf.io API is used for referencing a distributed dataset that contains files with a common naming pattern. This is not to say that you want to read the distributed dataset: what you want is a list of file paths and names for all the dataset files you want to read. This is not a new idea. For example, in Python, the glob library is a popular choice for retrieving a similar list. The tf.io API simply leverages the glob library to generate a list of filenames that fit the pattern object:

import tensorflow as tf

base_pattern = 'dataset'
file_pattern = 'owid-covid-data-part*'
files = tf.io.gfile.glob(base_pattern + '/' + file_pattern)
files is a list that contains all the CSV filenames that are part
of the original CSV, in no particular order:
['dataset/owid-covid-data-part0091.csv',
'dataset/owid-covid-data-part0085.csv',
'dataset/owid-covid-data-part0052.csv',
'dataset/owid-covid-data-part0046.csv',
'dataset/owid-covid-data-part0047.csv',
…]
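If a deterministic ordering matters to you (for reproducible runs, say), you can sort the list yourself; this is a small addition of mine, not a requirement of the API:

files = sorted(tf.io.gfile.glob(base_pattern + '/' + file_pattern))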
This list will be the input for the next step, which is to create a
streaming dataset object based on Python generators.
Creating a Streaming Dataset Object
Now that you have your file list ready, you can use it as the input to create a streaming dataset object. Note that this code is only meant to demonstrate how to convert a list of CSV files into a TensorFlow dataset object. If you were really going to use this data to train a supervised ML model, you would also perform data cleansing, normalization, and aggregation, all of which we’ll cover in Chapter 8. For the purposes of this example, “new_deaths” is selected as the target column:

csv_dataset = tf.data.experimental.make_csv_dataset(
    files,
    header=True,
    batch_size=5,
    label_name='new_deaths',
    num_epochs=1,
    ignore_errors=True)
The preceding code specifies that each file in files contains a
header. For convenience, as we inspect it, we set a small batch
size of 5. We also designate a target column with label_name, as
if we are going to use this data for training a supervised ML
model. num_epochs is used to specify how many times you want
to stream over the entire dataset.
To look at the actual data, you’ll need to use the csv_dataset object to iterate through the data:

for features, target in csv_dataset.take(1):
    print("'Target': {}".format(target))
    print("'Features:'")
    for k, v in features.items():
        print("  {!r:20s}: {}".format(k, v))
This code uses the first batch of the dataset (take(1)), which
contains five samples.
Since you specified label_name to be the target column, the
other columns are all considered to be features. In the dataset,
contents are formatted as key-value pairs. The output from the
preceding code will be similar to this:
'Target': [ 0.  0. 16.  0.  0.]
'Features:'
'iso_code'          : [b'SWZ' b'ESP' b'ECU' b'ISL' b'FRO']
'continent'         : [b'Africa' b'Europe' b'South America' b'Europe' b'Europe']
'location'          : [b'Swaziland' b'Spain' b'Ecuador' b'Iceland' b'Faeroe Islands']
'date'              : [b'2020-04-04' b'2020-02-07' b'2020-07-13' b'2020-04-01' b'2020-06-11']
'total_cases'       : [9.000e+00 1.000e+00 6.787e+04 1.135e+03 1.870e+02]
'new_cases'         : [  0.   0. 661.  49.   0.]
'total_deaths'      : [0.000e+00 0.000e+00 5.047e+03 2.000e+00 0.000e+00]
'total_cases_per_million': [7.758000e+00 2.100000e-02 3.846838e+03 3.326007e+03 3.826870e+03]
'new_cases_per_million': [  0.      0.     37.465 143.59    0.   ]
'total_deaths_per_million': [  0.      0.    286.061   5.861   0.   ]
'new_deaths_per_million': [0.    0.    0.907 0.    0.   ]
'new_tests'         : [b'' b'' b'1331.0' b'1414.0' b'']
'total_tests'       : [b'' b'' b'140602.0' b'20889.0' b'']
'total_tests_per_thousand': [b'' b'' b'7.969' b'61.213' b'']
'new_tests_per_thousand': [b'' b'' b'0.075' b'4.144' b'']
'new_tests_smoothed': [b'' b'' b'1986.0' b'1188.0' b'']
'new_tests_smoothed_per_thousand': [b'' b'' b'0.113' b'3.481' b'']
'tests_units'       : [b'' b'' b'units unclear' b'tests performed' b'']
'stringency_index'  : [89.81 11.11 82.41 53.7   0.  ]
'population'        : [ 1160164. 46754784. 17643060.   341250.    48865.]
'population_density': [79.492 93.105 66.939  3.404 35.308]
'median_age'        : [21.5 45.5 28.1 37.3  0. ]
'aged_65_older'     : [ 3.163 19.436  7.104 14.431  0.   ]
'aged_70_older'     : [ 1.845 13.799  4.458  9.207  0.   ]
'gdp_per_capita'    : [ 7738.975 34272.36  10581.936 46482.957     0.   ]
'extreme_poverty'   : [b'' b'1.0' b'3.6' b'0.2' b'']
'cardiovasc_death_rate': [333.436  99.403 140.448 117.992   0.   ]
'diabetes_prevalence': [3.94 7.17 5.55 5.31 0.  ]
'female_smokers'    : [b'1.7' b'27.4' b'2.0' b'14.3' b'']
'male_smokers'      : [b'16.5' b'31.4' b'12.3' b'15.2' b'']
'handwashing_facilities': [24.097  0.    80.635  0.     0.   ]
'hospital_beds_per_thousand': [2.1  2.97 1.5  2.91 0.  ]
'life_expectancy'   : [60.19 83.56 77.01 82.99 80.67]
This data is retrieved during runtime (lazy execution). As indicated by the batch size, each column contains five records. Next, let’s discuss how to stream this dataset.
Streaming a CSV Dataset
Now that a CSV dataset object has been created, you can easily
iterate over it in batches with this line of code, which uses the
iter function to make an iterator from the CSV dataset and the
next function to return the next item in the iterator:
features, label = next(iter(csv_dataset))
Remember that in this dataset there are two types of elements:
features and label. These elements are returned as a tuple
(similar to a list of objects, except that the order and the value
of objects cannot be changed or reassigned). You can unpack a
tuple by assigning the tuple elements to variables.
If you examine the label, you’ll see the content of the first
batch:
<tf.Tensor: shape=(5,), dtype=float32,
numpy=array([ 0., 0., 1., 33., 29.], dtype=float32)>
If you execute the same command again, you’ll see the second
batch:
features, label = next(iter(csv_dataset))
Let’s just take a look at label:
<tf.Tensor: shape=(5,), dtype=float32,
numpy=array([ 7., 15., 1., 0., 6.], dtype=float32)>
Indeed, this is the second batch of observations; it contains different values than the first batch. This is how a streaming CSV dataset is produced in a data ingestion pipeline. As each batch is sent to the model for training, the model computes the prediction in the forward pass, which computes the output by multiplying the input value and the current weight and bias in each node of the neural network. Then it compares the prediction with the label and calculates the loss function. Next comes the backward pass, where the model computes the variation with respect to the expected output and goes backward into each node of the network to update the weight and bias. The model then recalculates and updates the gradients. A new batch of data is sent to the model for training, and the process repeats.
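The forward/backward mechanics can be illustrated with a deliberately tiny sketch, using a single weight and tf.GradientTape (my illustration, not code from the pipeline above):

import tensorflow as tf

w = tf.Variable(2.0)                            # one model weight
x, y_true = tf.constant(3.0), tf.constant(9.0)  # one training sample

with tf.GradientTape() as tape:
    y_pred = w * x                       # forward pass
    loss = tf.square(y_pred - y_true)    # loss calculation

grad = tape.gradient(loss, w)            # backward pass
w.assign_sub(0.1 * grad)                 # gradient descent update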
Next we will look at how to organize image data for storage and
stream it like we streamed the structured data.
Organizing Image Data
Image classification tasks require organizing images in certain ways because, unlike CSV or tabular data, attaching a label to an image requires special techniques. A straightforward and common pattern for organizing image files is with the following directory structure:
<PROJECT_NAME>
    train
        class_1
            <FILENAME>.jpg
            <FILENAME>.jpg
            ...
        class_n
            <FILENAME>.jpg
            <FILENAME>.jpg
            ...
    validation
        class_1
            <FILENAME>.jpg
            <FILENAME>.jpg
            ...
        class_n
            <FILENAME>.jpg
            <FILENAME>.jpg
            ...
    test
        class_1
            <FILENAME>.jpg
            <FILENAME>.jpg
            ...
        class_n
            <FILENAME>.jpg
            <FILENAME>.jpg
            ...
<PROJECT_NAME> is the base directory. The first level below it contains training, validation, and test directories. Within each of these directories, there are subdirectories named with the image labels (class_1, class_2, etc., which in the following example are flower types), each of which contains the raw image files. This is shown in Figure 2-4.

This structure is common because it makes it easy to keep track of labels and their respective images, but by no means is it the only way to organize image data. Let’s look at another structure for organizing images. This is very similar to the previous one, except that training, testing, and validation are all separate. Immediately below the <PROJECT_NAME> directory are the directories of different image classes, as shown in Figure 2-5.
Figure 2-4. File organization for image classification and partitioning for training work
Figure 2-5. File organization for images based on labels
Using TensorFlow Image Generator
Now let’s take a look at how to deal with images. Besides the nuances of file organization, working with images also requires certain steps to standardize and normalize the image files. The model architecture requires a fixed shape (fixed dimensions) for all images. At the pixel level, the values are normalized, typically to a range of [0, 1] (dividing the pixel value by 255).

For this example, you’ll use an open source image set of five different types of flowers (or feel free to use your own image set). Let’s assume that images should be 224 × 224 pixels, where the dimensions correspond to height and width. These are the expected dimensions for input images if you want to use a pretrained residual neural network (ResNet) as the image classifier.
First let’s download the images. The following code downloads five types of flowers, all in different dimensions, and puts them in the file structure shown later in Figure 2-6:

import tensorflow as tf

data_dir = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

We will refer to data_dir as the base directory. It should be similar to:

'/Users/XXXXX/.keras/datasets/flower_photos'
If you list the content from the base directory, you’ll see:
-rw-r----- 1 mbp16 staff 418049 Feb 8 2016 LICENSE.txt
drwx------ 801 mbp16 staff 25632 Feb 10 2016 tulips
drwx------ 701 mbp16 staff 22432 Feb 10 2016 sunflowers
drwx------ 643 mbp16 staff 20576 Feb 10 2016 roses
drwx------ 900 mbp16 staff 28800 Feb 10 2016 dandelion
drwx------ 635 mbp16 staff 20320 Feb 10 2016 daisy
There are three steps to streaming the images. Let’s look more
closely:
1. Create an ImageDataGenerator object and specify normal‐
ization parameters. Use the rescale parameter to indicate
the normalization scale and the validation_split param‐
eter to specify that 20% of the data will be set aside for
cross validation:
train_datagen = tf.keras.preprocessing.image.
ImageDataGenerator(
rescale = 1./255,
validation_split = 0.20)
Optionally, you can wrap rescale and validation_split
as a dictionary that consists of key-value pairs:
datagen_kwargs = dict(rescale=1./255,
validation_split=0.20)
Using TensorFlow Image Generator | 27
train_datagen = tf.keras.preprocessing.image.
ImageDataGenerator(**datagen_kwargs)
This is a convenient way to reuse the same parameters and
keep multiple input arguments under wrap. (Passing the
dictionary data structure to a function is a Python
technique known as dictionary unpacking.)
2. Connect the ImageDataGenerator object to the data source
and specify parameters to resize the images to a fixed
dimension:
IMAGE_SIZE = (224, 224) # Image height and width
BATCH_SIZE = 32
dataflow_kwargs = dict(target_size=IMAGE_SIZE,
batch_size=BATCH_SIZE,
interpolation="bilinear")
train_generator = train_datagen.flow_from_directory(
data_dir, subset="training", shuffle=True,
**dataflow_kwargs)
3. Prepare a map for indexing the labels. In this step, you
retrieve the index that the generator has assigned to each
label and create a dictionary that maps it to the actual label
name. The TensorFlow generator internally keeps track of
labels from the directory names below data_dir. They can
be retrieved through train_generator.class_indices,
which returns key-value pairs of labels and indices. You
can take advantage of this and reverse it when you deploy
the model for scoring: the model will output an index, so to
implement the reverse lookup, simply invert the label
dictionary returned by train_generator.class_indices:

labels_idx = train_generator.class_indices
idx_labels = dict((v, k) for k, v in labels_idx.items())

These are the idx_labels:

{0: 'daisy', 1: 'dandelion', 2: 'roses',
 3: 'sunflowers', 4: 'tulips'}
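To see why this reverse lookup matters, consider scoring time. The following is a minimal sketch, assuming a trained classifier named model (hypothetical here; you’ll actually build and train one in later chapters) whose output layer produces one probability per flower class:

import numpy as np

# Pull one batch from the generator and score it with a
# hypothetical trained model (`model` is not defined in this chapter).
images, _ = next(iter(train_generator))
probs = model.predict(images)        # shape: (32, 5)
top_idx = int(np.argmax(probs[0]))   # index of the highest probability
print(idx_labels[top_idx])           # e.g., 'tulips'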
Now you can inspect the shape of the items generated by
train_generator:

for image_batch, labels_batch in train_generator:
    print(image_batch.shape)
    print(labels_batch.shape)
    break

Expect to see the following for the first batch yielded by
the generator as it iterates through the base directory:

(32, 224, 224, 3)
(32, 5)

The first tuple indicates a batch of 32 images, each
with dimensions of 224 × 224 × 3 (height × width ×
depth, where depth represents the three RGB color channels).
The second tuple indicates 32 labels, each corresponding to
one of the five flower types and one-hot encoded per
idx_labels.
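As a quick sanity check on that encoding, you can map the one-hot label batch back to flower names. This short sketch reuses labels_batch from the loop above:

import numpy as np

# argmax recovers each row's class index from the one-hot encoding;
# idx_labels then maps each index back to its flower name.
names = [idx_labels[int(i)] for i in np.argmax(labels_batch, axis=1)]
print(names[:5])  # e.g., ['roses', 'daisy', 'tulips', 'daisy', 'dandelion']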
Streaming Cross-Validation Images
Recall that in creating the generator for streaming training
data, you specified the validation_split parameter with a
value of 0.2. If you don’t do this, validation_split defaults to a
value of 0. If validation_split is set to a nonzero decimal, then
when you invoke the flow_from_directory method, you also
have to specify subset as either training or validation. In
the preceding example, it is subset="training".
You may be wondering how you’ll know which images belong
to the training subset from our previous endeavor of creating
a training generator. Well, you don’t have to know this if you
reassign and reuse the training generator:

valid_datagen = train_datagen
valid_generator = valid_datagen.flow_from_directory(
    data_dir, subset="validation", shuffle=False,
    **dataflow_kwargs)
As you can see, a TensorFlow generator knows about and keeps
track of training and validation subsets, so you can reuse the
same generator to stream over different subsets. The
dataflow_kwargs dictionary is also reused; this is a convenience
feature provided by TensorFlow generators.
Because you reuse train_datagen, you can be sure that image
rescaling is done the same way as it was for the training images.
And in the valid_datagen.flow_from_directory method, you pass in
the same dataflow_kwargs dictionary to set the image size for cross
validation to be the same as it is for the training images.
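With both generators in place, training and cross validation typically come together in a single fit call. Here is a minimal sketch, assuming a compiled Keras model named model (not built in this chapter); samples is the total image count each generator tracks:

# Hypothetical compiled model; the generators stream batches on demand.
model.fit(
    train_generator,
    epochs=5,
    steps_per_epoch=train_generator.samples // BATCH_SIZE,
    validation_data=valid_generator,
    validation_steps=valid_generator.samples // BATCH_SIZE)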
If you prefer to organize the images into training, validation,
and testing directories yourself, what you learned earlier still
applies, with two exceptions. First, your data_dir is at the level
of the training, validation, or testing directory. Second, you
don’t need to specify validation_split in ImageDataGenerator and
subset in flow_from_directory.
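For instance, if your images were already split under hypothetical flower_photos/train, flower_photos/valid, and flower_photos/test directories, the setup might look like this:

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255)

# Point one generator at each split; no validation_split or subset needed.
train_generator = datagen.flow_from_directory(
    'flower_photos/train', shuffle=True, **dataflow_kwargs)
valid_generator = datagen.flow_from_directory(
    'flower_photos/valid', shuffle=False, **dataflow_kwargs)
test_generator = datagen.flow_from_directory(
    'flower_photos/test', shuffle=False, **dataflow_kwargs)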
Inspecting Resized Images
Now let’s inspect the resized images coming off the generator.
Following is the code snippet for iterating through a batch of
data streamed by a generator:

import matplotlib.pyplot as plt
import numpy as np

image_batch, label_batch = next(iter(train_generator))

fig, axes = plt.subplots(8, 4, figsize=(10, 20))
axes = axes.flatten()
for img, lbl, ax in zip(image_batch, label_batch, axes):
    ax.imshow(img)
    label_ = np.argmax(lbl)
    label = idx_labels[label_]
    ax.set_title(label)
    ax.axis('off')
plt.show()

This code will produce 32 images from the first batch coming
off the generator (see Figure 2-6).
Figure 2-6. A batch of reshaped images
Let’s examine the code:

image_batch, label_batch = next(iter(train_generator))

This line iterates over the base directory with the generator. It
applies the iter function to the generator and leverages the
next function to output the image batch and the label batch as
NumPy arrays:

fig, axes = plt.subplots(8, 4, figsize=(10, 20))

This line sets up the number of subplots you expect, which is
32, your batch size:

axes = axes.flatten()
for img, lbl, ax in zip(image_batch, label_batch, axes):
    ax.imshow(img)
    label_ = np.argmax(lbl)
    label = idx_labels[label_]
    ax.set_title(label)
    ax.axis('off')
plt.show()
Then you set up the figure axes, using a for loop to display the
NumPy arrays as images with their labels. As shown in Figure 2-6,
all the images are resized into 224 × 224-pixel squares. Although
the subplot holder is a rectangle with figsize=(10, 20), you
can see that all of the images are squares. This means your code
for resizing and normalizing images in the generator workflow
works as expected.
Wrapping Up
In this chapter, you learned the fundamentals of streaming data
using Python. This is a workhorse technique when working
with large, distributed datasets. You also saw some common file
organization patterns for tabular and image data.
In the section on tabular data, you learned how choosing a
good file-naming convention can make it easier to build a
reference to all the files, regardless of how many there are. This
means you now know how to build a scalable pipeline that can
ingest as much data as needed into a Python runtime for any
use (in this case, for TensorFlow to create a dataset).
You also learned how image files are usually organized in file
storage and how to associate images with labels. In the next
chapter, you will leverage what you’ve learned here about data
organization and streaming to integrate it with the model
training process.
Wrapping Up | 33
Tensorflow 2 Pocket Reference Building And Deploying Machine Learning Models 1st Edition Kc Tung
Discovering Diverse Content Through
Random Scribd Documents
* *
Jaj de szépen harangoznak,
Angyalosi torony alatt!
Azért huzzák olyan szépen,
Sok szép leány keseregjen.
(Háromszék: Angyalos.)
52.
SELYÖM SÁRI.
Selyöm Sári azt álmodta;
Hogy a kedvese megcsalta.
Selyöm Sári jól álmodta,
Mert őt biz a rég megcsalta.
Visszamondja erdő, berek,
Selyöm Sári ugy kesereg.
A mikor csak eszibe jut,
Könyeitől ázik az ut.
Kimegyen a zöld erdőbe,
Zöld erdőből buzaföldre,
El akarja felejteni,
Búbánatát elrejteni.
Búza földön kútat ére,
Betekint a fenekére.
Fenekibe magát látja,
Hajh, de nem ismer magára.
»Fejér Imre, értted lészen
Én halálom, kora vesztem!«
»Én Istenem verd meg értte,
A ki egygyel be nem érte.«
53.
VÁRADI JÓZSEF.
Tüzet fúj a pej paripa,
Úgy vágtat egy tiszt úr rajta.
»Hová megyen hadnagy uram?«
›Lovagolok, édes fiam!‹
»Ez sem igaz, hadnagy uram.
A ki megyen lovagolni,
Egy-két legényt viszen az csak.
Itt pedig van tizenkettő,
Ki kijedet kisérgeti.«
»Egy pajtásom bujdokol itt,
Azt indultam megkeresni.
Nagyot vétett urak ellen,
De én hozom a kegyelmet.
Nem tudod-e, merre leve?
Váradi az igaz neve.«
›Hogy ne tudnám, hogy ne tudnám!
Ott a falu felső végén
Lakik az ő violája,
Leghamarább ott találja.‹
Pásztor-legény uton marad,
Hadnagy uram tovább halad.
A kicsi ház ott fejérlik,
Abba lakik Rofajiné,
A Váradi szeretője,
Most is ott vagyon mellette.
›Szép jó estét, Rofajiné!‹
››Adjon Isten, vitéz uram!‹‹
›Udvarodon van-e vállu?
Ihatnék a kis pej lovam.‹
››Ha ihatnék, megitatom,
Ha ehetnék, abrakolom.‹‹
Megy a hadnagy a szobába,
Váradit ott megtalálja.
›Add meg magad, te gazember!‹
A katonák megrohanják,
Kezét-lábát megvasazzák.
* *
Hát ez a fa miféle fa?
Hogy egy betű sincsen rajta.
Ez a fa biz’ afféle fa:
Váradit akaszszák rajta.
* *
Ifjak, lányok, ügyeljetek,
A németnek ne higyetek!
Mer’ a német csuffá teszen,
Akasztófa alá viszen.
(Udvarhelyszék: Székely-Keresztúr.)
54.
UGYANAZ MÁS VÁLTOZATBAN.
Háromszék be van kerítve,
Váradit keritik benne.
Ha Váradit megfoghatnák,
Hóhérok kezibe adnák.
Háromszéken kiadaték,
Felvidéken béfogaték;
Három zsandár az ajtóba’
Leste, hogy karéjba fogja.
»Ne fogj engemet karéjba,
Elsétálok urfimódra.«
Váradinak szép körhaja,
Gesztenyeszin kondorhaja,
Fel van a vállára csapva.
Azért van az oda csapva,
Hogy a kötél ne surolja.
Váradinak szép körhaja,
Meszes gödör a sirhalma,
Jere Váradi, búj rea.
* *
»Ti legények, ügyeljetek,
A németnek ne higyetek,
Én ha hittem, veszem hasznát,
Viselem a gyalázatját.«
(Háromszék: Karatna.)
55.
UGYANAZ MÁS VÁLTOZATBAN.
Aj de szennyes az úr inge s gagyája,
Megszennyesült a szentgyörgyi stokházba’,
Édes rózsám, mossál engem fejérbe,
Holnap visznek az ágyitor elébe.
Ha felvisznek a törvényszék házába,
Leborulok törvényszék asztalára.
Rablánczomot keservesen zergetem,
Jaj, Istenem, most végzik az életem.
Ha kijövök a törvényszék házából,
Elbúcsuzom kerek magyar hazámtól.
Trombitások három troppot fújjanak,
Jaj Istenem, már holnap felakasztnak!
* * *
»Rofajiné én szeretőm,
Áldjon meg az én teremtőm.
Megnyughatol csókjaimtól,
Ölelgető karjaimtól.«
* *
Hát ez a fa miféle fa,
Hogy egy betü sincsen rajta?
Ez a fa biz’ afféle fa:
Váradit akaszszák rajta.
Váradinak szép kör haja,
Meszes gödör a sirhalma.
Mindez világ megengedett,
S a kapitány nem engedett.
Neki is vagyon gyermeke,
Juttassa Isten ezekre.
56.
PÁL BORISKA.
Kimenék én az utczára,
Arr’a szerencsétlen sánczra.
Hol két zsandár megtalála:
›Jöjjön Boris kontumáczra.‹
»Nem megyek én kontumáczra,
Vizsitáljon meg nagysága.«
Nagysága nem vizsitálja,
Vád alá irást csinálja.
»Arra kérem az urakot,
Ne irjanak hazugságot,
Nézzék az éfijuságot.
Mind sétálnak ők kegyesen,
Szereteikkel ékesen.
É
Én is igen keservesen,
Fejem lehajtom könnyesen.«
Zúg az erdő, törik a fa,
Pál Boriskát verik vasra,
Úgy kisérik Somolyóra;
Somolyóra, törvényházba.
– Valld bé a mit cselekedtél!
»Vallom, a mit cselekedtem:
Gyermekemet elvesztettem!«
– Tegyétek be a tömlöczbe,
Annak is a mélységibe,
Ott haljon meg keservibe.
* *
Pál Samelné édes anyám,
Mért nem jössz el egyszer hozzám?
Néznéd meg a rab leányod,
Néznéd meg a rab leányod.
A raboknak úgy van dolga
Mint a kutyának a kutba’,
Ha kinéz is az utczára,
Ott is a szél taszigálja.
(Háromszék.)
57.
VIRÁG JÁNOS.
Hej, Kolozsvár hires város,
Van kapuja kilencz záros.
Abba’ lakik egy mészáros,
Kinek neve Virág János.
Fényes kordovány csidmája,
Sárig sarkantyús a lába.
Kapum előtt összeveri,
Vig szüvemet kesergeti.
»Kapum előtt ne veregesd,
Vig szüvemet ne kesergesd.«
›Az Isten azt úgy végezte,
Hogy örökre váljak este.‹
Virág János megyen haza,
Sok rossz ember megrohanja.
Kést vernek a szive alá,
Bedobják egy bokor alá.
Madár énekel felette,
Füzfa-lapi beszél vele…
Nem siratják, nincs senkije,
Csak egy árva szereteje.
(Udvarhelyszék.)
58.
UGYANAZ MÁS VÁLTOZATBAN.
Hires város Kolozsváros,
Abba’ lakik egy mészáros,
Kinek neve Vidám János.
Sárig sarkantyús csizmáját
Összeveri, megpengeti,
Vig szivemet kesergeti.
»Kapum előtt ne veregesd,
Vig szivemet ne kesergesd.«
›Kapud előtt veregetem,
Vig szivedet kesergetem.‹
* *
Elindulék haza felé,
Gyilkosaim álltak elé.
Mig magamat észrevettem
A kezökben oda lettem.
Vérem ontották a porba
Testem vették a bokorba
Egész nyolczad napok alatt
Testem nyugodt bokor alatt
Hajlik felettem az ág is
Sirat engem a madár is.
(Csikszék.)
59.
SOLYOM ESZTI.
Ne menj Eszti Fehérvárra,
Fehérvári füstös várba,
Katonáknak van ott helye,
Leányoknak kora-veszte.
Fehérvári füstös várba’
Sok szép leány ment már kárba.
Piros orczád, szőke hajad,
Szemet szúr a tiszturaknak.
»Nem maradok megyek, megyek,
Az én dolgom, ha elveszek.
Hozzám nyuljon csak a polák,
Elfelejti Galicziát!«
Elment Eszti Fehérvárra,
Fehérvári füstös várba;
Strázsamester kerülgeti,
Lopva meg is ölelgeti.
»Strázsamester uram, kérem,
Hagyjon békét immár nékem,
Szolgálatra jöttem ide,
Nem a katonák kedvire.«
* *
Fehérvári füstös várba’
Sétál Eszti egy magába’.
Strázsamester kerülgeti,
Csókolgassa, ölelgeti.
* *
A vár alatt kis korcsoma,
Solyom Eszti lakik abba’.
Piros firhang az ablakon,
Tiszturakkal teli vagyon.
Jót mutat a tiszturaknak,
Kesereg, ha elfordulhat.
»Édes anyám, édes anyám,
Sirass engem, elvesztem már…
Hogy szót nem fogadtam egyszer,
Megbántam már ezeregyszer!«
(Csikszék.)
60.
SZŐCS MÁRIS.
Este van, este van, hatot üt az óra,
Minden szép eladó készül a fonóba.
Szegény Szőcs Máris is elindul magába,
Feje felett az ég homályba borula.
Először megütik, leesett a hidra,
Másodszor megütik, megfogott a szava,
Harmadszor megütik, véres lett a foka,
Kicsi balta foka.
A torjai nagy rét körül van sánczolva,
Annak a végibe’ fekete pántlika,
Az is a Szőcs Máris gyászos pántikája.
»Leánybarátaim, rólam tanuljatok,
Hogy irigy legénynyel ne barátkozzatok.
Ha fonóba mentek, igy lészen dolgotok,
Szeredán estére megfogyik szavatok.«
Zilahi Pistának két szál bokrétája,
Az egyik muszkáta, másik majoránna,
Az is a Szőcs Máris gyászos bokrétája,
Zilahi Pistának nem kell a bokréta,
Szegény Szőcs Márisnak ő volt a gyilkosa.
* *
»Vérem a véreddel egy patakba folyjon,
Testem a testeddel egy sirba’ nyugodjon,
Lelkem a lelkeddel mennybe’ vigadozzon!«
61.
TELEKI ÉVI.
Telekiné Évi lánya,
Mind a legényeket várja.
Hej, de mind hiába várja,
Csak nem akad neki párja.
Várja, ha jő piros hajnal…
– S megérkeznek alkonyattal.
Zsebkendejök messze kéklik,
Úgy mennek az utczán végig.
»Édes kedves szülő anyám,
Szemet vetett egy fiú rám.
Haja szőke, kék a szeme,
Palkó Pista igaz neve.«
›Édes lányom, jól ismerem,
Sokat ette a kenyerem.‹
»Édes anyám, fáj a lelkem,
Palkó Pistát megszerettem.
Ereszsz ki a kapu elé,
Mindjárt jőnek erre felé.«
›Nem eresztlek, kedves lányom,
Mit is látnál a hitványon?
Apja jobbágy, nem külömb ő,
Hitvány paraszt mindakettő.‹
Telekiné lánya, Évi,
Anyját de hijába kéri.
Fájdalmába’, keservibe’
Kimenyen a ház elibe.
Ház elibe, kerek dombra…
Ott zuhog az Olt folyója.
* *
»Édes anyám, jaj én nekem!
Pistát el nem felejthetem.
S mintsább haragodat lássam,
Legyen az Olt hitestársam!«
(Volál.)
62.
KAPITÁNYNÉ LESZEK!
»Megyek anyám, megyek,
Kapitányné leszek.«
Ne menj lányom, ne menj,
Eladlak én téged.
»Kihez anyám, kihez?«
Egy kovács legényhez.
»Nem kell anyám, nem kell,
Az a kovács legény:
Fuvónyomogatni,
Patkószeget verni…
Rip, rip, rip! rop, rop, rop
Kapitányné leszek!«
»Megyek anyám, menyek,
Kapitányné leszek.«
Ne menj lányom, ne menj,
Eladlak én téged.
»Kihez anyám, kihez?«
Egy diák legényhez.
»Nem kell anyám, nem kell,
Az a diáklegény:
Könyvet olvasgatni,
Falut tanitani…
Rip, rip, rip! stb…«
»Megyek anyám, megyek,
Kapitányné leszek.«
Ne menj lányom, ne menj,
Eladlak én téged.
»Kihez anyám, kihez?«
Egy molnár legényhez.
»Nem kell anyám, nem kell,
Az a molnár legény:
Malomkövet vágni,
Lisztet meregetni…
Rip, rip, rip! stb.«
»Megyek anyám, megyek,
Kapitányné leszek.«
Ne menj lányom, ne menj,
Eladlak én téged.
»Kihez anyám, kihez?«
Egy harangozóhoz.
»Nem kell anyám, nem kell,
Harangozó legény:
Egyet-kettőt kongat,
Mindég csak pufolgat…
Rip, rip, rip! rop, rop, rop!
Kapitányné leszek!…«
(Nagy-Baczon.)
63.
ÁRVA MÓZI.
Árva Mózi, mit gondolál,
Mikor hazól elindulál?
Én egyebet nem gondoltam,
Csak az uton felindultam.
Csög Éráni6) bekerültem,
Ott egy kupa bort kikértem.
Oda jöve Fábi fiam,
A mig a bort illogattam.
»Kérd ki apám még a társát,
Úgy fizetem meg az árát.«
›Fábi fiam, nem kérem ki,
Mert nem tudok haza menni.‹
»Haza vezetlek én osztán,
Kérd ki csak a társát, apám!…«
Csög Éránúl kiindulék,
Bükkös mellett béfordulék.
Bükkös pataknak tövibe
Piros vérem elfecscsene.
* *
Három fertály tizenegyre,
Ülj fel Fábi a gőzösre.
El is viszen Segesvárra,
Holtig tartó nagy fogságra.
* *
Szállj le Fábi a tömleczbe,
Könyökölj le a piricsre,
Holtig való testvéredre.
Lement Fábi a tömleczbe,
Lekönyökölt a piricsre.
Ki-kitekint az ablakon,
Sürü könnye hull ki azon.
64.
MOLNÁR GYURI.
Molnár Gyuri, mit gondolál,
Mikor hazól elindulál?
Én egyebet nem gondoltam:
Az halállal kezet fogtam.
Gyilkosaim körül vettek,
Puskaszóval fenyegettek.
Csudálkoztak a csillagok:
Mit csinálnak a gyilkosok.
»Hagyjátok meg életemet,
Nektek adom sok pénzemet.«
Nem kell nekünk a sok pénzed,
Kell nekünk a piros véred.
Elébb kiontjuk véredet,
Osztán elveszszük pénzedet.
* *
Piros vérem folyt a porba,
Testem veték a bokorba.
Vadak elhordák testemet,
Szánjátok bús esetemet!
(N.-Baczon.)
65.
PÉNZES MÁTÉ.
Pénzes Máté mit gondolál,
Mikor hazúl elindulál?
Nem gondoltam én egyebet:
Féltettem az életemet.
Vizaknai bánya mellett
Utólért az éj engemet.
Ellenségim ott várának,
Velem ott találkozának.
Hat katonák megtámadtak,
Kardjaikkal megszabdaltak.
Hetet vágtak a fejemre,
Kilenczet a kebelemre.
Lábom, karom, megszabdalták,
A szivemet épen hagyták.
De eljöve hű szeretőm,
Felemele onnan engem.
Apám, anyám, nyisd kapudat,
Halva hozzák a fiadat.
Szeretője vállán hozza,
Gyilkosait elátkozza.
Édes anyám, sirass engem,
Mért is kellett megszületnem!
Hogy ily átkos véget érjek,
Oh én elkárhozott lélek!
(Vizakna.)
66.
SAJGÓ SÁNDOR NÓTÁJA.
Ezernyolczszáz huszonötbe’,
Kimenék a zöld erdőbe
Ott gyilkosom megtalála,
Jajgatásom meg nem szánta.
Torkon ragadott engemet,
Elvégezé életemet.
Vérem porban elereszté,
Testem bokorba béveté.
Magas felhők induljatok,
A falumban hirt adjatok!
A falumban hirt tevének,
Harangszónál bévivének.
Jó barátim, kik valátok,
– A kik velem vigadátok, –
Mennyben van a ti atyátok,
Viseljen gondot reátok!
Édes apám Sajgó Sándor,
Könnyed hulljon, mint a zápor.
Nem gondolád azt előre,
Hogy halva hoznak elődbe.
Ó borzasztó Cseretetőn
Ártatlan vért kieresztőm!
Jónás Ferencz, a ki valál,
Boszuért boszut állottál!
67.
SZŐKE MIHÁLY.
Szabó Juli, Szőke Mihály,
Úgy szeretik egymást,
A mikor csak találkoznak,
Majd megeszik egymást.
»Hová mégy most gyönge rózsám?«
– Kérdi Mihály tőle.
›Édes kincsem, Szőke Mihály,
Megyek az erdőre.‹
Elment Julis az erdőre,
De nincs gondja fára,
Három zsandár a cziher közt
Várakozik rája.
Egyik teszi, másik veszi…
Jaj, nem veszik észre!
Szőke Sándor megérkezett
Puskával kezébe.
»Hitvány személy! életemet
Megkeseritetted!
Puska-golyó igazságot
Tegyen most feletted!…«
68.
SOROZÁSKOR.
Letörött a bécsi torony gombja,
Ihatnék a Garibáldi lova.
Szőke kis lány adjál néki vizet,
Garibáldi a csatába siet.
Garibáldi csárdás kis kalapja,
Nemzeti szin pántika van rajta.
Nemzeti szin pántika van rajta,
Garibáldi neve ragyog rajta.
Ess az eső, nagy sár van az uton,
Szőke kis lány sirva mos az Olton.
Sirva mondja az édes anyjának:
»Szeretőmet viszik katonának.«
›Ne sirj lányom, ne sirj, a faluba’
Marad legény még a te számodra.‹
»Marad anyám, marad, nem szeretem,
Gyász lesz vele az egész életem.«
Szőke kis lány kiment a kis kertbe,
Feltekintett a csillagos égre:
»Jaj Istenem, megölöm magamat!
Viszik katonának a rózsámat.«
Szőke kis lány ne ölje meg magát,
Kérdezte meg a zsandár kapitányt.
A zsandár kapitány is azt mondja:
Egyes fiú nem lehet katona.
(Kis-Baczon.)
69.
UGYANAZ MÁS VÁLTOZATBAN.
Es az eső az árpa-tarlóra,
»Jere rózsám, üljünk fel a lóra!«
›Gyenge vagyok, nem tudok felülni,
Kis pej lovad nem akar megállni.‹
Es az eső, nagy sár van az uton,
Barna kis lány sirva mos az Olton.
Sirva mondja az édes anyjának:
Szeretőjét viszik katonának.
»Ne sirj lányom, ne sirj, a faluba’
Maradt legény még a te számodra.«
›Maradt anyám, maradt, nem szeretem,
Gyászos vele az egész életem.‹
70.
ÉNEK ŐS APÁNK- ÉS ANYÁNKRÓL.
Halljatok ujságot az életben,
Ádámról s Éváról az édenben.
Egykoron megunván magát az úr,
Egy nagy darab sárból Ádámot gyúr.
Hogy Ádám is meg ne unja magát,
Azért gondolá ki az úr Évát.
Azért is az Isten unalmából
Évát szerzé görbe oldalából.
De mihelyest Ádám felhorkantott,
Éva asszony szemébe vigyorgott.
...........
»Hó, hó! megálljatok, egy fa vagyon,
A melyről ha esztek, ütlek agyon.«
Az alma szépen piroslék a fán,
És a nemi ösztön győze Éván.
Kigyó tanácsára egy párt lelopa,
S nagy kivánsággal beléharapa.
Monda Ádámnak: »Me, kóstold te is,
Igazán édes még a leve is.«
Ádám a szép szóra megkóstolá,
De torkán akadt, hogy befalá.
Ekkor mindjárt megbánák a lopást,
S jónak láták ők az elillanást.
Éva a bűn után bokorba fut,
Ádám is jettében melléje bútt.
De im jő az Úr nagy haragjába’
Tűz seprüjével paradicsomába.
Szóla: »Hol vagy Ádám? hová lettél?
A tiltott fából miért ettél?«
Ádám szepeg, de Éva szaggatja
Füge levelét s bőrére rakja.
Most monda az úr: »fáradság, fájdalom
Lészen számotokra a jutalom.«
Ezzel tüz seprüjét eléveszi,
S parancsolá paradicsomból ki.
Welcome to our website – the perfect destination for book lovers and
knowledge seekers. We believe that every book holds a new world,
offering opportunities for learning, discovery, and personal growth.
That’s why we are dedicated to bringing you a diverse collection of
books, ranging from classic literature and specialized publications to
self-development guides and children's books.
More than just a book-buying platform, we strive to be a bridge
connecting you with timeless cultural and intellectual values. With an
elegant, user-friendly interface and a smart search system, you can
quickly find the books that best suit your interests. Additionally,
our special promotions and home delivery services help you save time
and fully enjoy the joy of reading.
Join us on a journey of knowledge exploration, passion nurturing, and
personal growth every day!
ebookbell.com

More Related Content

Similar to Tensorflow 2 Pocket Reference Building And Deploying Machine Learning Models 1st Edition Kc Tung (20)

PPTX
TensorFlow.pptx
Kavikiran3
 
PPTX
Introduction to Tensor Flow-v1.pptx
Janagi Raman S
 
PDF
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
Stijn Decubber
 
PDF
Bringing Machine Learning to Mobile Apps with TensorFlow
Marianne Harness
 
PDF
1645 goldenberg using our laptop
Rising Media, Inc.
 
PPTX
Tensorflow Ecosystem
Vivek Raja P S
 
PDF
OpenPOWER Workshop in Silicon Valley
Ganesan Narayanasamy
 
PDF
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Edureka!
 
PDF
Tensorflow - Overview, Features And Advantages.pdf
DataSpace Academy
 
PDF
1605.08695.pdf
mohammadA42
 
PDF
TensorFlow example for AI Ukraine2016
Andrii Babii
 
PDF
Neural Networks from Scratch - TensorFlow 101
Gerold Bausch
 
PPTX
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
Simplilearn
 
PDF
running Tensorflow in Production
Matthias Feys
 
PPTX
Tensorflow
marwa Ayad Mohamed
 
PDF
The Flow of TensorFlow
Jeongkyu Shin
 
PDF
Lecture 4: Deep Learning Frameworks
Mohamed Loey
 
PPTX
slide-keras-tf.pptx
RithikRaj25
 
PDF
Overview of TensorFlow For Natural Language Processing
ananth
 
TensorFlow.pptx
Kavikiran3
 
Introduction to Tensor Flow-v1.pptx
Janagi Raman S
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
Stijn Decubber
 
Bringing Machine Learning to Mobile Apps with TensorFlow
Marianne Harness
 
1645 goldenberg using our laptop
Rising Media, Inc.
 
Tensorflow Ecosystem
Vivek Raja P S
 
OpenPOWER Workshop in Silicon Valley
Ganesan Narayanasamy
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Edureka!
 
Tensorflow - Overview, Features And Advantages.pdf
DataSpace Academy
 
1605.08695.pdf
mohammadA42
 
TensorFlow example for AI Ukraine2016
Andrii Babii
 
Neural Networks from Scratch - TensorFlow 101
Gerold Bausch
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
Simplilearn
 
running Tensorflow in Production
Matthias Feys
 
Tensorflow
marwa Ayad Mohamed
 
The Flow of TensorFlow
Jeongkyu Shin
 
Lecture 4: Deep Learning Frameworks
Mohamed Loey
 
slide-keras-tf.pptx
RithikRaj25
 
Overview of TensorFlow For Natural Language Processing
ananth
 

Recently uploaded (20)

PPTX
care of patient with elimination needs.pptx
Rekhanjali Gupta
 
PPTX
Post Dated Cheque(PDC) Management in Odoo 18
Celine George
 
PPTX
ASRB NET 2023 PREVIOUS YEAR QUESTION PAPER GENETICS AND PLANT BREEDING BY SAT...
Krashi Coaching
 
PPTX
PATIENT ASSIGNMENTS AND NURSING CARE RESPONSIBILITIES.pptx
PRADEEP ABOTHU
 
PPTX
HUMAN RESOURCE MANAGEMENT: RECRUITMENT, SELECTION, PLACEMENT, DEPLOYMENT, TRA...
PRADEEP ABOTHU
 
PPTX
grade 5 lesson matatag ENGLISH 5_Q1_PPT_WEEK4.pptx
SireQuinn
 
PPTX
CATEGORIES OF NURSING PERSONNEL: HOSPITAL & COLLEGE
PRADEEP ABOTHU
 
PDF
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 
PDF
The History of Phone Numbers in Stoke Newington by Billy Thomas
History of Stoke Newington
 
PPTX
How to Set Up Tags in Odoo 18 - Odoo Slides
Celine George
 
PDF
Reconstruct, Restore, Reimagine: New Perspectives on Stoke Newington’s Histor...
History of Stoke Newington
 
PPT
Talk on Critical Theory, Part II, Philosophy of Social Sciences
Soraj Hongladarom
 
PPTX
GRADE-3-PPT-EVE-2025-ENG-Q1-LESSON-1.pptx
EveOdrapngimapNarido
 
PDF
Dimensions of Societal Planning in Commonism
StefanMz
 
PDF
Aprendendo Arquitetura Framework Salesforce - Dia 03
Mauricio Alexandre Silva
 
PPTX
Neurodivergent Friendly Schools - Slides from training session
Pooky Knightsmith
 
PDF
Horarios de distribución de agua en julio
pegazohn1978
 
PDF
Stokey: A Jewish Village by Rachel Kolsky
History of Stoke Newington
 
PPTX
Stereochemistry-Optical Isomerism in organic compoundsptx
Tarannum Nadaf-Mansuri
 
PDF
The Constitution Review Committee (CRC) has released an updated schedule for ...
nservice241
 
care of patient with elimination needs.pptx
Rekhanjali Gupta
 
Post Dated Cheque(PDC) Management in Odoo 18
Celine George
 
ASRB NET 2023 PREVIOUS YEAR QUESTION PAPER GENETICS AND PLANT BREEDING BY SAT...
Krashi Coaching
 
PATIENT ASSIGNMENTS AND NURSING CARE RESPONSIBILITIES.pptx
PRADEEP ABOTHU
 
HUMAN RESOURCE MANAGEMENT: RECRUITMENT, SELECTION, PLACEMENT, DEPLOYMENT, TRA...
PRADEEP ABOTHU
 
grade 5 lesson matatag ENGLISH 5_Q1_PPT_WEEK4.pptx
SireQuinn
 
CATEGORIES OF NURSING PERSONNEL: HOSPITAL & COLLEGE
PRADEEP ABOTHU
 
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 
The History of Phone Numbers in Stoke Newington by Billy Thomas
History of Stoke Newington
 
How to Set Up Tags in Odoo 18 - Odoo Slides
Celine George
 
Reconstruct, Restore, Reimagine: New Perspectives on Stoke Newington’s Histor...
History of Stoke Newington
 
Talk on Critical Theory, Part II, Philosophy of Social Sciences
Soraj Hongladarom
 
GRADE-3-PPT-EVE-2025-ENG-Q1-LESSON-1.pptx
EveOdrapngimapNarido
 
Dimensions of Societal Planning in Commonism
StefanMz
 
Aprendendo Arquitetura Framework Salesforce - Dia 03
Mauricio Alexandre Silva
 
Neurodivergent Friendly Schools - Slides from training session
Pooky Knightsmith
 
Horarios de distribución de agua en julio
pegazohn1978
 
Stokey: A Jewish Village by Rachel Kolsky
History of Stoke Newington
 
Stereochemistry-Optical Isomerism in organic compoundsptx
Tarannum Nadaf-Mansuri
 
The Constitution Review Committee (CRC) has released an updated schedule for ...
nservice241
 
Ad

Tensorflow 2 Pocket Reference Building And Deploying Machine Learning Models 1st Edition Kc Tung

  • 1. Tensorflow 2 Pocket Reference Building And Deploying Machine Learning Models 1st Edition Kc Tung download https://ebookbell.com/product/tensorflow-2-pocket-reference- building-and-deploying-machine-learning-models-1st-edition-kc- tung-34833530 Explore and download more ebooks at ebookbell.com
  • 2. Here are some recommended products that we believe you will be interested in. You can click the link to download. Tensorflow 2 Pocket Primer Oswald Campesato https://ebookbell.com/product/tensorflow-2-pocket-primer-oswald- campesato-49431688 Tensorflow 20 Pocket Primer Oswald Campesato https://ebookbell.com/product/tensorflow-20-pocket-primer-oswald- campesato-27559918 Tensorflow Pocket 1 Primer Oswald Campesato https://ebookbell.com/product/tensorflow-pocket-1-primer-oswald- campesato-47523820 Tensorflow Pocket Primer Oswald Campesato https://ebookbell.com/product/tensorflow-pocket-primer-oswald- campesato-49850274
  • 3. Python For Tensorflow Pocket Primer Oswald Campesato https://ebookbell.com/product/python-for-tensorflow-pocket-primer- oswald-campesato-47523828 Python For Tensorflow Pocket Primer Oswald Campesato Campesato https://ebookbell.com/product/python-for-tensorflow-pocket-primer- oswald-campesato-campesato-11483668 Python For Tensorflow Pocket Primer Oswald Campesato https://ebookbell.com/product/python-for-tensorflow-pocket-primer- oswald-campesato-11483670 Tensorflow 20 Computer Vision Cookbook Implement Machine Learning Solutions To Overcome Various Computer Vision Challenges Jess Martinez https://ebookbell.com/product/tensorflow-20-computer-vision-cookbook- implement-machine-learning-solutions-to-overcome-various-computer- vision-challenges-jess-martinez-23459912 Tensorflow 2 Reinforcement Learning Cookbook Over 50 Recipes To Help You Build Train And Deploy Learning Agents For Realworld Applications Praveen Palanisamy https://ebookbell.com/product/tensorflow-2-reinforcement-learning- cookbook-over-50-recipes-to-help-you-build-train-and-deploy-learning- agents-for-realworld-applications-praveen-palanisamy-35470778
  • 7. KC Tung TensorFlow 2 Pocket Reference Building and Deploying Machine Learning Models
  • 8. 978-1-492-08918-6 [LSI] TensorFlow 2 Pocket Reference by KC Tung Copyright © 2021 Favola Vera, LLC. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebasto‐ pol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promo‐ tional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Acquisitions Editor: Rebecca Novack Development Editor: Sarah Grey Production Editor: Beth Kelly Copyeditor: Penelope Perkins Proofreader: Audrey Doyle Indexer: Potomac Indexing, LLC Interior Designer: David Futato Cover Designer: Karen Montgomery Illustrator: Kate Dullea August 2021: First Edition Revision History for the First Edition 2021-07-19: First Release See https://oreil.ly/tf2pr for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. TensorFlow 2 Pocket Reference, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the author, and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages result‐ ing from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
  • 9. To my beloved wife Katy, who always supports me and sees the best in me. To my father Jerry, who raised me to pursue learning with a sense of purpose. To my hard-working and passionate readers, whose aspiration for continuous learning resonates with me and inspired me to write this book.
  • 11. Table of Contents Preface ix Chapter 1: Introduction to TensorFlow 2 1 Improvements in TensorFlow 2 2 Making Commonly Used Operations Easy 4 Wrapping Up 9 Chapter 2: Data Storage and Ingestion 11 Streaming Data with Python Generators 12 Streaming File Content with a Generator 14 JSON Data Structures 17 Setting Up a Pattern for Filenames 18 Splitting a Single CSV File into Multiple CSV Files 19 Creating a File Pattern Object Using tf.io 20 Creating a Streaming Dataset Object 21 Streaming a CSV Dataset 23 Organizing Image Data 24 Using TensorFlow Image Generator 26 v
  • 12. Streaming Cross-Validation Images 29 Inspecting Resized Images 30 Wrapping Up 32 Chapter 3: Data Preprocessing 35 Preparing Tabular Data for Training 35 Preparing Image Data for Processing 47 Preparing Text Data for Processing 56 Wrapping Up 62 Chapter 4: Reusable Model Elements 65 The Basic TensorFlow Hub Workflow 67 Image Classification by Transfer Learning 70 Using the tf.keras.applications Module for Pretrained Models 80 Wrapping Up 83 Chapter 5: Data Pipelines for Streaming Ingestion 85 Streaming Text Files with the text_dataset_from_directory Function 86 Streaming Images with a File List Using the flow_from_dataframe Method 90 Streaming a NumPy Array with the from_tensor_slices Method 97 Wrapping Up 102 Chapter 6: Model Creation Styles 103 Using the Symbolic API 104 Understanding Inheritance 114 Using the Imperative API 117 Choosing the API 120 vi | Table of Contents
  • 13. Using the Built-In Training Loop 122 Creating and Using a Custom Training Loop 123 Wrapping Up 126 Chapter 7: Monitoring the Training Process 129 Callback Objects 130 TensorBoard 140 Wrapping Up 151 Chapter 8: Distributed Training 153 Data Parallelism 154 Using the Class tf.distribute.MirroredStrategy 158 The Horovod API 169 Wrapping Up 182 Chapter 9: Serving TensorFlow Models 183 Model Serialization 184 TensorFlow Serving 193 Wrapping Up 200 Chapter 10: Improving the Modeling Experience: Fairness Evaluation and Hyperparameter Tuning 203 Model Fairness 205 Hyperparameter Tuning 217 End-to-End Hyperparameter Tuning 219 Wrapping Up 227 Index 229 Table of Contents | vii
  • 15. Preface The TensorFlow ecosystem has evolved into many different frameworks to serve a variety of roles and functions. That flexi‐ bility is part of the reason for its widespread adoption, but it also complicates the learning curve for data scientists, machine learning (ML) engineers, and other technical stakeholders. There are so many ways to manage TensorFlow models for common tasks—such as data and feature engineering, data ingestions, model selection, training patterns, cross validation against overfitting, and deployment strategies—that the choices can be overwhelming. This pocket reference will help you make choices about how to do your work with TensorFlow, including how to set up com‐ mon data science and ML workflows using TensorFlow 2.0 design patterns in Python. Examples describe and demonstrate TensorFlow coding patterns and other tasks you are likely to encounter frequently in the course of your ML project work. You can use it as both a how-to book and a reference. This book is intended for current and potential ML engineers, data scientists, and enterprise ML solution architects who want to advance their knowledge and experience in reusable patterns and best practices in TensorFlow modeling. Perhaps you’ve already read an introductory TensorFlow book, and you stay up to date with the field of data science generally. This book ix
  • 16. assumes that you have hands-on experience using Python (and possibly NumPy, pandas, and JSON libraries) for data engi‐ neering, feature engineering routines, and building TensorFlow models. Experience with common data structures such as lists, dictionaries, and NumPy arrays will also be very helpful. Unlike many other TensorFlow books, this book is structured around the tasks you’ll likely need to do, such as: • When and why should you feed training data as a NumPy array or streaming dataset? (Chapters 2 and 5) • How can you leverage a pretrained model using transfer learning? (Chapters 3 and 4) • Should you use a generic fit function to do your training or write a custom training loop? (Chapter 6) • How should you manage and make use of model check‐ points? (Chapter 7) • How can you review the training process using Tensor‐ Board? (Chapter 7) • If you can’t fit all of your data into your runtime’s memory, how can you perform distributed training using multiple accelerators, such as GPUs? (Chapter 8) • How do you pass data to your model during inferencing and how do you handle output? (Chapter 9) • Is your model fair? (Chapter 10) If you are wrestling with questions like these, this book will be helpful to you. Conventions Used in This Book The following typographical conventions are used in this book: Italic Indicates new terms, URLs, email addresses, filenames, and file extensions. x | Preface
  • 17. Constant width Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, state‐ ments, and keywords. Constant width bold Shows commands or other text that should be typed liter‐ ally by the user. Constant width italic Shows text that should be replaced with user-supplied val‐ ues or by values determined by context. TIP This element signifies a tip or suggestion. Using Code Examples Supplemental material (code examples, exercises, etc.) can be downloaded at https://github.com/shinchan75034/tensorflow- pocket-ref. If you have a technical question or a problem using the code examples, please send email to [email protected]. This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant por‐ tion of the code. For example, writing a program that uses sev‐ eral chunks of code from this book does not require permis‐ sion. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this Preface | xi
  • 18. book into your product’s documentation does require permission. We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “TensorFlow 2 Pocket Reference by KC Jung (O’Reilly). Copyright 2021 Favola Vera, LLC, 978-1-492-08918-6.” If you feel your use of code examples falls outside of fair use or the permission given above, feel free to contact us at [email protected]. O’Reilly Online Learning For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast col‐ lection of text and video from O’Reilly and 200+ other publish‐ ers. For more information, visit http://oreilly.com. How to Contact Us Please address comments and questions concerning this book to the publisher: O’Reilly Media, Inc. 1005 Gravenstein Highway North Sebastopol, CA 95472 800-998-9938 (in the United States or Canada) 707-829-0515 (international or local) 707-829-0104 (fax) xii | Preface
  • 19. We have a web page for this book, where we list errata, exam‐ ples, and any additional information. You can access this page at https://oreil.ly/tensorflow2pr. Email [email protected] to comment or ask technical questions about this book. For news and information about our books and courses, visit http://oreilly.com. Find us on Facebook: http://facebook.com/oreilly Follow us on Twitter: http://twitter.com/oreillymedia Watch us on YouTube: http://youtube.com/oreillymedia Acknowledgments I really appreciate all the thoughtful and professional works by O’Reilly editors. In addition, I also want to express my grati‐ tude to technical reviewers: Tony Holdroyd, Pablo Marin, Gior‐ gio Saez, and Axel Sirota for their valuable feedback and sug‐ gestions. Finally, a special thank to Rebecca Novack and Sarah Grey for giving me a chance and working with me to write this book. Preface | xiii
  • 21. CHAPTER 1 Introduction to TensorFlow 2 TensorFlow has long been the most popular open source Python machine learning (ML) library. It was developed by the Google Brain team as an internal tool, but in 2015 it was released under an Apache License. Since then, it has evolved into an ecosystem full of important assets for model develop‐ ment and deployment. Today it supports a wide variety of APIs and modules that are specifically designed to handle tasks such as data ingestion and transformation, feature engineering, and model construction and serving, as well as many more. TensorFlow has become increasingly complex. The purpose of this book is to help simplify the common tasks that a data sci‐ entist or ML engineer will need to perform during an end-to- end model development process. This book does not focus on data science and algorithms; rather, the examples here use pre‐ built models as a vehicle to teach relevant concepts. This book is written for readers with basic experience in and knowledge about building ML models. Some proficiency in Python programming is highly recommended. If you work through the book from beginning to end, you will gain a great deal of knowledge about the end-to-end model development process and the major tasks involved, including data 1
  • 22. engineering, ingestion, and preparation; model training; and serving the model. The source code for the examples in the book was developed and tested with Google Colaboratory (Colab, for short) and a MacBook Pro running macOS Big Sur, version 11.2.3. The TensorFlow version used is 2.4.1, and the Python version is 3.7. Improvements in TensorFlow 2 As TensorFlow grows, so does its complexity. The learning curve for new TensorFlow users is steep because there are so many different aspects to keep in mind. How do I prepare the data for ingestion and training? How do I handle different data types? What do I need to consider for different handling meth‐ ods? These are just some of the basic questions you may have early in your ML journey. A particularly difficult concept to get accustomed to is lazy exe‐ cution, which means that TensorFlow doesn’t actually process your data until you explicitly tell it to execute the entire code. The idea is to speed up performance. You can look at an ML model as a set of nodes and edges (in other words, a graph). When you run computations and transform data through the nodes in the path, it turns out that only the computations in the datapath are executed. In other words, you don’t have to calcu‐ late every computation, only the ones that lie directly in the path your data takes through the graph from input through output. If the shape and format of the data are not correctly matched between one node and the next, when you compile the model you will get an error. It is rather difficult to investi‐ gate where you made a mistake in passing a data structure or tensor shape from one node to the next to debug. Through TensorFlow 1.x, lazy execution was the way to build and train an ML model. Starting with TensorFlow 2, however, eager execution is the default way to build and train a model. This change makes it much easier to debug the code and try different model architectures. Eager execution also makes it 2 | Chapter 1: Introduction to TensorFlow 2
  • 23. much easier to learn TensorFlow, in that you will see any mis‐ takes immediately upon executing each line of code. You no longer need to build an entire graph of your model before you can debug and test whether your input data is in the right shape. This is one of several major features and improvements that make TensorFlow 2 easier to use than previous versions. Keras API Keras, created by AI researcher François Chollet, is an open source, high-level, deep-learning API or framework. It is com‐ patible with multiple ML libraries. High-level implies that at a lower level there is another frame‐ work that actually executes the computation—and this is indeed the case. These low-level frameworks include Tensor‐ Flow, Theano, and the Microsoft Cognitive Toolkit (CNTK). The purpose of Keras is to provide easier syntax and coding style for users who want to leverage the low-level frameworks to build deep-learning models. After Chollet joined Google in 2015, Keras gradually became a keystone of TensorFlow adoption. In 2019, as the TensorFlow team launched version 2.0, it formally adopted Keras as Ten‐ sorFlow’s first-class citizen API, known as tf.keras, for all future releases. Since then, TensorFlow has integrated tf.keras with many other important modules. For example, it works seamlessly with the tf.io API for reading distributed training data. It also works with the tf.data.Dataset class, used for streaming training data too big to fit into a single computer. This book uses these modules throughout all chapters. Today TensorFlow users primarily rely on the tf.keras API for building deep models quickly and easily. The convenience of getting the training routine working quickly allows more time to experiment with different model architectures and tuning parameters in the model and training routine. Improvements in TensorFlow 2 | 3
  • 24. Reusable Models in TensorFlow Academic researchers have built and tested many ML models, all of which tend to be complicated in their architecture. It is not practical for users to learn how to build these models. Enter the idea of transfer learning, where a model developed for one task is reused to solve another task, in this case one defined by the user. This essentially boils down to transforming user data into the proper data structure at model input and output. Naturally, there has been great interest in these models and their potential uses. Therefore, by popular demand, many models have become available in the open source ecosystem. TensorFlow created a repository, TensorFlow Hub, to offer the public free access to these complicated models. If you’re inter‐ ested, you can try these models without having to build them yourself. In Chapter 4, you will learn how to download and use models from TensorFlow Hub. Once you do, you’ll just need to be aware of the data structure the model expects at input, and add a final output layer that is suitable for your prediction goal. Every model in TensorFlow Hub contains concise documenta‐ tion that gives you the necessary information to construct your input data. Another place to retrieve prebuilt models is the tf.keras.applications module, which is part of the Tensor‐ Flow distribution. In Chapter 4, you’ll learn how to use this module to leverage a prebuilt model for your own data. Making Commonly Used Operations Easy All of these improvements in TensorFlow 2 make a lot of important operations easier and more convenient to imple‐ ment. Even so, building and training an ML model end to end is not a trivial task. This book will show you how to deal with each aspect of the TensorFlow 2 model training process, start‐ ing from the beginning. Following are some of these operations. 4 | Chapter 1: Introduction to TensorFlow 2
  • 25. Open Source Data A convenient package integrated into TensorFlow 2 is the TensorFlow dataset library. It is a collection of curated open source datasets that are readily available for use. This library contains datasets of images, text, audio, videos, and many other formats. Some are NumPy arrays, while others are in dataset structures. This library also provides documentation for how to use TensorFlow to load these datasets. By distributing a wide variety of open source data with its product, the TensorFlow team really saves users a lot of the trouble of searching for, inte‐ grating, and reshaping training data for a TensorFlow work‐ load. Some of the open source datasets we’ll use in this book are the Titanic dataset for structured data classification and the CIFAR-10 dataset for image classification. Working with Distributed Datasets First you have to deal with the question of how to work with training data. Many didactic examples teach TensorFlow using prebuilt training data in its native format, such as a small pan‐ das DataFrame or a NumPy array, which will fit nicely in your computer’s memory. In a more realistic situation, however, you’ll likely have to deal with much more training data than your computer memory can handle. The size of a table read from a SQL database can easily reach into the gigabytes. Even if you have enough memory to load it into a pandas DataFrame or a NumPy array, chances are your Python runtime will run out of memory during computation and crash. Large tables of data are typically saved as multiple files in com‐ mon formats such as CSV (comma-separated value) or text. Because of this, you should not attempt to load each file in your Python runtime. The correct way to deal with distributed data‐ sets is to create a reference that points to the location of all the files. Chapter 2 will show you how to use the tf.io API, which gives you an object that holds a list of file paths and names. This is the preferred way to deal with training data regardless of its size and file count. Making Commonly Used Operations Easy | 5
  • 26. Data Streaming How do you intend to pass data to your model for training? This is an important skill, but many popular didactic examples approach it by passing the entire NumPy array into the model training routine. Just like with loading large training data, you will encounter memory issues if you try passing a large NumPy array to your model for training. A better way to deal with this is through data streaming. Instead of passing the entire training data at once, you stream a subset or batch of data for the model to train with. In Tensor‐ Flow, this is known as your dataset. In Chapter 2, you are also going to learn how to make a dataset from the tf.io object. Dataset objects can be made from all sorts of native data struc‐ tures. In Chapter 3, you will see how to make a tf.data.Data set object from CSV files and images. With the combination of tf.io and tf.data.Dataset, you’ll set up a data handling workflow for model training without having to read or open a single data file in your Python runtime memory. Data Engineering To make meaningful features for your model to learn the pat‐ tern of, you need to apply data- or feature-engineering tasks to your training data. Depending on the data type, there are dif‐ ferent ways to do this. If you are working with tabular data, you may have different values or data types in different columns. In Chapter 3, you will see how to use TensorFlow’s feature_column API to standard‐ ize your training data. It helps you correctly mark which col‐ umns are numeric and which are categorical. For image data, you will have different tasks. For example, all of the images in your dataset must have the same dimen‐ sions. Further, pixel values are typically normalized or scaled to a range of [0, 1]. For these tasks, tf.keras provides the 6 | Chapter 1: Introduction to TensorFlow 2
  • 27. ImageDataGenerator class, which standardizes image sizes and normalizes pixel values for you. Transfer Learning TensorFlow Hub makes prebuilt, open source models available to everyone. In Chapter 4, you’ll learn how to use the Keras lay‐ ers API to access TensorFlow Hub. In addition, tf.keras comes with an inventory of these prebuilt models, which can be called using the tf.keras.applications module. In Chapter 4, you’ll learn how to use this module for transfer learning as well. Model Styles There is definitely more than one way you can implement a model using tf.keras. This is because some deep learning model architectures or patterns are more complicated than others. For common use, the symbolic API style, which sets up your model architecture sequentially, is likely to suffice. Another style is imperative API, where you declare a model as a class, so that each time you call upon a model object, you are creating an instance of that class. This requires you to under‐ stand how class inheritance works (I’ll discuss this in Chap‐ ter 6). If your programming background stems from an object- oriented programming language such as C++ or Java, then this API may have a more natural feel for you. Another reason for using the imperative API approach is to keep your model architecture code separate from the remaining workflow. In Chapter 6, you will learn how to set up and use both of these API styles. Making Commonly Used Operations Easy | 7
  • 28. Monitoring the Training Process Monitoring how your model is trained and validated across each epoch (that is, one pass over a training set) is an important aspect of model training. Having a validation step at the end of each epoch is the easiest thing you can do to guard against model overfitting, a phenomenon in which the model starts to memorize training data patterns rather than learning the features as intended. In Chapter 7, you will learn how to use various callbacks to save model weights and biases at every epoch. I’ll also walk you through how to set up and use Tensor‐ Board to visualize the training process. Distributed Training Even though you know how to handle distributed data and files and stream them into your model training routine, what if you find that training takes an unrealistic amount of time? This is where distributed training can help. It requires a cluster of hardware accelerators, such as graphics processing units (GPUs) or Tensor Processing Units (TPUs). These accel‐ erators are available through many public cloud providers. You can also work with one GPU or TPU (not a cluster) for free in Google Colab; you’ll learn how to use this and the tf.distribute.MirroredStrategy class, which simplifies and reduces the hard work of setting up distributed training, to work through the example in the first part of Chapter 8. Released before tf.distribute.MirroredStrategy, the Horo‐ vod API from Uber’s engineering team is a considerably more complicated alternative. It’s specifically built to run training routines on a computing cluster. To learn how to use Horovod, you will need to use Databricks, a cloud-based computing plat‐ form, to work through the example in the second part of Chap‐ ter 8. This will help you learn how to refactor your code to dis‐ tribute and shard data for the Horovod API. 8 | Chapter 1: Introduction to TensorFlow 2
  • 29. Serving Your TensorFlow Model Once you’ve built your model and trained it successfully, it’s time for you to persist, or store, the model so it can be served to handle user input. You’ll see how easy it is to use the tf.saved_model API to save your model. Typically, the model is hosted by a web service. This is where TensorFlow Serving comes into the picture: it’s a framework that wraps your model and exposes it for web service calls via HTTP. In Chapter 9, you will learn how to use a TensorFlow Serving Docker image to host your model. Improving the Training Experience Finally, Chapter 10 discusses some important aspects of assess‐ ing and improving your model training process. You’ll learn how to use the TensorFlow Model Analysis module to look into the issue of model bias. This module provides an interactive dashboard, called Fairness Indicators, designed to reveal model bias. Using a Jupyter Notebook environment and the model you trained on the Titanic dataset from Chapter 3, you’ll see how Fairness Indicators works. Another improvement brought about by the tf.keras API is that it makes performing hyperparameter tuning more conve‐ nient. Hyperparameters are attributes related to model training routines or model architectures. Tuning them is typically a tedious process, as it involves thoroughly searching over the parameter space. In Chapter 10 you’ll see how to use the Keras Tuner library and an advanced search algorithm known as Hyperband to conduct hyperparameter tuning work. Wrapping Up TensorFlow 2 is a major overhaul of the previous version. Its most significant improvement is designating the tf.keras API as the recommended way to use TensorFlow. This API works seamlessly with tf.io and tf.data.Dataset for an end-to-end Wrapping Up | 9
  • 30. model training process. These improvements speed up model building and debugging so you can experiment with other aspects of model training, such as trying different architectures or conducting more efficient hyperparameter searches. So, let’s get started. 10 | Chapter 1: Introduction to TensorFlow 2
  • 31. CHAPTER 2 Data Storage and Ingestion To envision how to set up an ML model to solve a problem, you have to start thinking about data structure patterns. In this chapter, we’ll look at some general patterns in storage, data for‐ mats, and data ingestion. Typically, once you understand your business problem and set it up as a data science problem, you have to think about how to get the data into a format or struc‐ ture that your model training process can use. Data ingestion during the training process is fundamentally a data transforma‐ tion pipeline. Without this transformation, you won’t be able to deliver and serve the model in an enterprise-driven or use- case-driven setting; it would remain nothing more than an exploration tool and would not be able to scale to handle large amounts of data. This chapter will show you how to design a data ingestion pipe‐ line for two common data structures: tables and images. You will learn how to make the pipeline scalable by using Tensor‐ Flow’s APIs. Data streaming is the means by which the data is ingested in small batches by the model for training. Data streaming in Python is not a new concept. However, grasping it is funda‐ mental to understanding how the more advanced APIs in TensorFlow work. Thus, this chapter will start with Python 11
Then we'll look at how tabular data is stored, including how to indicate and track features and labels. We'll then move to designing your data structure, and finish by discussing how to ingest data to your model for training and how to stream tabular data. The rest of the chapter covers how to organize image data for image classification and how to stream image data.

Streaming Data with Python Generators

There are times when the Python runtime's memory is not big enough to handle loading the dataset in its entirety. When this happens, the recommended practice is to load the data in small batches. Therefore, the data is streamed into the model during the training process. Sending data in small batches has many other advantages as well. One is that a gradient descent algorithm is applied to each batch to calculate the error (that is, the difference between the model output and the ground truth) and to gradually update the model's weights and biases to make this error as small as possible. This lets us parallelize the gradient calculation, since the error calculation (also known as loss calculation) of one batch does not depend on the other. This is known as mini-batch gradient descent. At the end of each epoch, after a full training dataset has gone through the model, gradients from all batches are summed and weights are updated. Then, training starts again for the next epoch, with the newly updated weights and biases, and the error is calculated. This process repeats according to a user-defined parameter known as the number of epochs for training.

A Python generator is a function that returns an iterator, yielding values one at a time. An example of how it works follows. Let's start with the NumPy library for this simple demonstration of Python generators. I've created a function, my_generator, that accepts a NumPy array and yields a window of two consecutive records at a time, advancing one record per call:
    import numpy as np

    def my_generator(my_array):
        i = 0
        while True:
            yield my_array[i:i+2, :]  # output two elements at a time
            i += 1

This is the test array I created, which will be passed into my_generator:

    test_array = np.array([[10.0, 2.0],
                           [15, 6.0],
                           [3.2, -1.5],
                           [-3, -2]], np.float32)

This NumPy array has four records, each consisting of two floating-point values. Then I pass this array to my_generator:

    output = my_generator(test_array)

To get output, use:

    next(output)

The output should be:

    array([[10.,  2.],
           [15.,  6.]], dtype=float32)

If you run the next(output) command again, the output will be different:

    array([[15. ,  6. ],
           [ 3.2, -1.5]], dtype=float32)

And if you run it yet again, the output is once again different:

    array([[ 3.2, -1.5],
           [-3. , -2. ]], dtype=float32)

And if you run it a fourth time, the output is now:

    array([[-3., -2.]], dtype=float32)

Now that the last record is shown, you have finished streaming this data. If you run it again, it will return an empty array:

    array([], shape=(0, 2), dtype=float32)
As you can see, the my_generator function streams two records of the NumPy array each time it is invoked. The unique aspect of the generator function is the use of the yield statement instead of the return statement. Unlike return, yield produces a sequence of values without storing the entire sequence in the Python runtime memory. yield continues to produce a sequence each time we invoke the next function until the end of the array is reached.

This example demonstrates how a subset of data can be generated via a generator function. However, in this example, the NumPy array is created on the fly and therefore is held in the Python runtime memory. Let's take a look at how to iterate over a dataset that is stored as a file.

Streaming File Content with a Generator

To understand how a file in storage can be streamed, you may find it easier to use a CSV file as an example. The file I use here, the Pima Indians Diabetes Dataset, is an open source dataset available for download. Download it and store it on your local machine. This file does not contain a header, so you will also need to download the column names and descriptions for this dataset. Briefly, the columns in this file are:

    ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
     'Insulin', 'BMI', 'DiabetesPedigree', 'Age', 'Outcome']

Let's look at this file with the following lines of code:

    import csv
    import pandas as pd

    file_path = 'working_data/'
    file_name = 'pima-indians-diabetes.data.csv'
    col_name = ['Pregnancies', 'Glucose', 'BloodPressure',
                'SkinThickness', 'Insulin', 'BMI',
                'DiabetesPedigree', 'Age', 'Outcome']

    pd.read_csv(file_path + file_name, names=col_name)
The first few rows of the file are shown in Figure 2-1.

Figure 2-1. Pima Indians Diabetes Dataset

Since we want to stream this dataset, it is more convenient to read it as a CSV file and use the generator to output the rows, just like we did with the NumPy array in the preceding section. The way to do this is through the following code:

    import csv

    file_path = 'working_data/'
    file_name = 'pima-indians-diabetes.data.csv'

    with open(file_path + file_name, newline='\n') as csvfile:
        f = csv.reader(csvfile, delimiter=',')
        for row in f:
            print(','.join(row))

Let's take a closer look at this code. We use the with open command to create a file handle object, csvfile, that knows where the file is stored. The next step is to pass it to the reader function in the CSV library:

    f = csv.reader(csvfile, delimiter=',')

f is a reader object that gives you access to the file's rows in the Python runtime. To inspect the file, execute this short for loop:

    for row in f:
        print(','.join(row))

The output of the first few rows looks like Figure 2-2.
Figure 2-2. Pima Indians Diabetes Dataset CSV output

Now that you understand how to use a file handle, let's refactor the preceding code so that we can use yield in a function, effectively making a generator to stream the content of the file:

    def stream_file(file_handle):
        holder = []
        for row in file_handle:
            holder.append(row.rstrip("\n"))
            yield holder
            holder = []

    with open(file_path + file_name, newline='\n') as handle:
        for part in stream_file(handle):
            print(part)

Recall that a Python generator is a function that uses yield to iterate through an iterable object. You can use with open to acquire a file handle as usual. Then we pass handle to the generator function stream_file, which contains a for loop that iterates through the file in handle row by row, removes the newline character \n, then fills up a holder. Each row is passed back to the main thread's print function by yield from the generator. The output is shown in Figure 2-3.

Figure 2-3. Pima Indians Diabetes Dataset output by Python generator

Now that you have a clear idea of how a dataset can be streamed, let's look at how to apply this in TensorFlow.
As it turns out, TensorFlow leverages this approach to build a framework for data ingestion. Streaming is usually the best way to ingest large amounts of data (such as hundreds of thousands of rows in one table, or distributed across multiple tables).

JSON Data Structures

Tabular data is a common and convenient format for encoding features and labels for ML model training, and CSV is probably the most common tabular data format. You can think of each field separated by the comma delimiter as a column. Each column is defined with a data type, such as numeric (integer or floating point) or string.

Tabular data is not the only data format that is well structured, by which I mean that every record follows the same convention and the order of fields in every record is the same. Another common data structure is JSON. JSON (JavaScript Object Notation) is a structure built with nested, hierarchical key-value pairs. You can think of keys as column names and values as the actual value of the data in that sample. JSON can be converted to CSV, and vice versa. Sometimes the original data is in JSON format and it is necessary to convert it to CSV, which is easier to display and inspect. Here's an example JSON record, showing the key-value pairs:

    {
        "id": 1,
        "name": {
            "first": "Dan",
            "last": "Jones"
        },
        "rating": [
            8,
            7,
            9
        ]
    },

Notice that the key "rating" is associated with the value of an array [8, 7, 9].
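To illustrate such a conversion, here is a minimal sketch using pandas (my choice of tool for the illustration; the nested record is the one above):

    import pandas as pd

    record = {
        "id": 1,
        "name": {"first": "Dan", "last": "Jones"},
        "rating": [8, 7, 9],
    }

    # json_normalize flattens nested keys into dotted column names
    # (name.first, name.last); the list value stays in a single column.
    df = pd.json_normalize(record)
    df.to_csv("record.csv", index=False)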
There are plenty of examples of using a CSV file or a table as training data and ingesting it into the TensorFlow model training process. Typically, the data is read into a pandas DataFrame. However, this strategy only works if all the data can fit into the Python runtime memory. You can use streaming to handle data without the Python runtime restricting memory allocation. Since you learned how a Python generator works in the preceding section, you're now ready to take a look at TensorFlow's API, which operates on the same principle as a Python generator, and learn how to use TensorFlow's adoption of the Python generator framework.

Setting Up a Pattern for Filenames

When working with a set of files, you will encounter patterns in file-naming conventions. To simulate an enterprise environment where new data is continuously being generated and stored, we will use an open source CSV file, split it into multiple parts by row count, then rename each part with a fixed prefix. This approach is similar to how the Hadoop Distributed File System (HDFS) names the parts of a file.

Feel free to use your own CSV file if you have one handy. If not, you can download the suggested CSV file (a COVID-19 dataset) for this example. (You may clone this repository if you wish.) For now, all you need is owid-covid-data.csv. Once it is downloaded, inspect the file and determine the number of rows:

    wc -l owid-covid-data.csv

The output indicates there are over 32,000 rows:

    32788 owid-covid-data.csv

Next, inspect the first three lines of the CSV file to see if there is a header:

    head -3 owid-covid-data.csv

    iso_code,continent,location,date,total_cases,new_cases,
    total_deaths,new_deaths,total_cases_per_million,
    new_cases_per_million,total_deaths_per_million,
    new_deaths_per_million,new_tests,total_tests,
    total_tests_per_thousand,new_tests_per_thousand,
    new_tests_smoothed,new_tests_smoothed_per_thousand,tests_units,
    stringency_index,population,population_density,median_age,
    aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,
    cardiovasc_death_rate,diabetes_prevalence,female_smokers,
    male_smokers,handwashing_facilities,hospital_beds_per_thousand,
    life_expectancy
    AFG,Asia,Afghanistan,2019-12-31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
    0.0,,,,,,,,,38928341.0,
    54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.8

Since this file contains a header, you'll see the header in each of the part files. You can also look at a few rows of data to see what they actually look like.

Splitting a Single CSV File into Multiple CSV Files

Now let's split this file into multiple CSV files, each with 330 rows. You should end up with 100 CSV files, each of which has the header. If you use Linux or macOS, use the following command:

    cat owid-covid-data.csv | parallel --header : --pipe -N330 'cat > owid-covid-data-part00{#}.csv'

For macOS, you may need to first install the parallel command:

    brew install parallel

Here are some of the files that are created:

    -rw-r--r--  1 mbp16  staff  54026 Jul 26 16:45 owid-covid-data-part0096.csv
    -rw-r--r--  1 mbp16  staff  54246 Jul 26 16:45 owid-covid-data-part0097.csv
    -rw-r--r--  1 mbp16  staff  51278 Jul 26 16:45 owid-covid-data-part0098.csv
    -rw-r--r--  1 mbp16  staff  62622 Jul 26 16:45 owid-covid-data-part0099.csv
    -rw-r--r--  1 mbp16  staff  15320 Jul 26 16:45 owid-covid-data-part00100.csv
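If you are not on Linux or macOS, or simply prefer to stay in Python, a chunked pandas read can produce equivalent part files. This sketch mirrors the chunk size and naming of the parallel command above:

    import pandas as pd

    # Read 330 rows at a time; each chunk is written with its own header,
    # matching the owid-covid-data-part00{n}.csv naming shown above.
    reader = pd.read_csv('owid-covid-data.csv', chunksize=330)
    for i, chunk in enumerate(reader, start=1):
        chunk.to_csv('owid-covid-data-part00{}.csv'.format(i), index=False)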
This pattern represents a standard storage arrangement for multiple CSV files. There is a distinct pattern to the naming convention, and either all files have the same header, or none has any header at all.

It's a good idea to maintain a file-naming pattern, which can come in handy whether you have tens or hundreds of files. And when your naming pattern can be easily represented with wildcard notation, it's easier to create a reference or file pattern object that points to all the data in storage. In the next section, we will look at how to use the TensorFlow API to create a file pattern object, which we'll use to create a streaming object for this dataset.

Creating a File Pattern Object Using tf.io

The TensorFlow tf.io API is used for referencing a distributed dataset that contains files with a common naming pattern. This is not to say that you want to read the distributed dataset: what you want is a list of file paths and names for all the dataset files you want to read. This is not a new idea. For example, in Python, the glob library is a popular choice for retrieving a similar list. The tf.io API simply leverages the glob library to generate a list of filenames that fit the pattern object:

    import tensorflow as tf

    base_pattern = 'dataset'
    file_pattern = 'owid-covid-data-part*'
    files = tf.io.gfile.glob(base_pattern + '/' + file_pattern)

files is a list that contains all the CSV filenames that are part of the original CSV, in no particular order:

    ['dataset/owid-covid-data-part0091.csv',
     'dataset/owid-covid-data-part0085.csv',
     'dataset/owid-covid-data-part0052.csv',
     'dataset/owid-covid-data-part0046.csv',
     'dataset/owid-covid-data-part0047.csv',
     …]
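For comparison, the built-in glob module mentioned above retrieves a similar list without TensorFlow; sorting is optional, but it makes the order deterministic:

    import glob

    files_py = sorted(glob.glob('dataset/owid-covid-data-part*'))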
This list will be the input for the next step, which is to create a streaming dataset object based on Python generators.

Creating a Streaming Dataset Object

Now that you have your file list ready, you can use it as the input to create a streaming dataset object. Note that this code is only meant to demonstrate how to convert a list of CSV files into a TensorFlow dataset object. If you were really going to use this data to train a supervised ML model, you would also perform data cleansing, normalization, and aggregation, all of which we'll cover in Chapter 8. For the purposes of this example, "new_deaths" is selected as the target column:

    csv_dataset = tf.data.experimental.make_csv_dataset(
        files,
        header=True,
        batch_size=5,
        label_name='new_deaths',
        num_epochs=1,
        ignore_errors=True)

The preceding code specifies that each file in files contains a header. For convenience as we inspect it, we set a small batch size of 5. We also designate a target column with label_name, as if we are going to use this data for training a supervised ML model. num_epochs is used to specify how many times you want to stream over the entire dataset.

To look at actual data, you'll need to use the csv_dataset object to iterate through the data:

    for features, target in csv_dataset.take(1):
        print("'Target': {}".format(target))
        print("'Features:'")
        for k, v in features.items():
            print("  {!r:20s}: {}".format(k, v))

This code uses the first batch of the dataset (take(1)), which contains five samples.
Since you specified label_name to be the target column, the other columns are all considered to be features. In the dataset, contents are formatted as key-value pairs. The output from the preceding code will be similar to this:

    'Target': [ 0.  0. 16.  0.  0.]
    'Features:'
    'iso_code'          : [b'SWZ' b'ESP' b'ECU' b'ISL' b'FRO']
    'continent'         : [b'Africa' b'Europe' b'South America' b'Europe' b'Europe']
    'location'          : [b'Swaziland' b'Spain' b'Ecuador' b'Iceland' b'Faeroe Islands']
    'date'              : [b'2020-04-04' b'2020-02-07' b'2020-07-13' b'2020-04-01' b'2020-06-11']
    'total_cases'       : [9.000e+00 1.000e+00 6.787e+04 1.135e+03 1.870e+02]
    'new_cases'         : [  0.   0. 661.  49.   0.]
    'total_deaths'      : [0.000e+00 0.000e+00 5.047e+03 2.000e+00 0.000e+00]
    'total_cases_per_million': [7.758000e+00 2.100000e-02 3.846838e+03 3.326007e+03 3.826870e+03]
    'new_cases_per_million': [  0.      0.     37.465 143.59    0.   ]
    'total_deaths_per_million': [  0.      0.    286.061   5.861   0.   ]
    'new_deaths_per_million': [0.    0.    0.907 0.    0.   ]
    'new_tests'         : [b'' b'' b'1331.0' b'1414.0' b'']
    'total_tests'       : [b'' b'' b'140602.0' b'20889.0' b'']
    'total_tests_per_thousand': [b'' b'' b'7.969' b'61.213' b'']
    'new_tests_per_thousand': [b'' b'' b'0.075' b'4.144' b'']
    'new_tests_smoothed': [b'' b'' b'1986.0' b'1188.0' b'']
    'new_tests_smoothed_per_thousand': [b'' b'' b'0.113' b'3.481' b'']
    'tests_units'       : [b'' b'' b'units unclear' b'tests performed' b'']
    'stringency_index'  : [89.81 11.11 82.41 53.7   0.  ]
    'population'        : [ 1160164. 46754784. 17643060.   341250.    48865.]
    'population_density': [79.492 93.105 66.939  3.404 35.308]
    'median_age'        : [21.5 45.5 28.1 37.3  0. ]
    'aged_65_older'     : [ 3.163 19.436  7.104 14.431  0.   ]
    'aged_70_older'     : [ 1.845 13.799  4.458  9.207  0.   ]
    'gdp_per_capita'    : [ 7738.975 34272.36  10581.936 46482.957     0.   ]
    'extreme_poverty'   : [b'' b'1.0' b'3.6' b'0.2' b'']
    'cardiovasc_death_rate': [333.436  99.403 140.448 117.992   0.   ]
    'diabetes_prevalence': [3.94 7.17 5.55 5.31 0.  ]
    'female_smokers'    : [b'1.7' b'27.4' b'2.0' b'14.3' b'']
    'male_smokers'      : [b'16.5' b'31.4' b'12.3' b'15.2' b'']
    'handwashing_facilities': [24.097  0.    80.635  0.     0.   ]
    'hospital_beds_per_thousand': [2.1  2.97 1.5  2.91 0.  ]
    'life_expectancy'   : [60.19 83.56 77.01 82.99 80.67]

This data is retrieved during runtime (lazy execution). As indicated by the batch size, each column contains five records. Next, let's discuss how to stream this dataset.

Streaming a CSV Dataset

Now that a CSV dataset object has been created, you can easily iterate over it in batches with this line of code, which uses the iter function to make an iterator from the CSV dataset and the next function to return the next item in the iterator:

    features, label = next(iter(csv_dataset))

Remember that in this dataset there are two types of elements: features and label. These elements are returned as a tuple (similar to a list of objects, except that the order and the value of objects cannot be changed or reassigned). You can unpack a tuple by assigning the tuple elements to variables.
If you examine the label, you'll see the content of the first batch:

    <tf.Tensor: shape=(5,), dtype=float32,
    numpy=array([ 0.,  0.,  1., 33., 29.], dtype=float32)>

If you execute the same command again, you'll see the second batch:

    features, label = next(iter(csv_dataset))

Let's just take a look at label:

    <tf.Tensor: shape=(5,), dtype=float32,
    numpy=array([ 7., 15.,  1.,  0.,  6.], dtype=float32)>

Indeed, this is the second batch of observations; it contains different values than the first batch. This is how a streaming CSV dataset is produced in a data ingestion pipeline. As each batch is sent to the model for training, the model computes the prediction in the forward pass, multiplying the input values by the current weight and adding the bias in each node of the neural network. It then compares the prediction with the label and calculates the loss function. Next comes the backward pass, where the model computes the gradient of the loss with respect to each weight and bias and goes backward through each node of the network to update them. A new batch of data is sent to the model for training, and the process repeats.
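To make the forward and backward passes concrete, here is a minimal sketch of one mini-batch training step written with tf.GradientTape (an illustration only; the loss and optimizer are placeholders, and in practice model.fit performs these steps for you):

    import tensorflow as tf

    loss_fn = tf.keras.losses.MeanSquaredError()
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

    def train_step(model, x_batch, y_batch):
        with tf.GradientTape() as tape:
            predictions = model(x_batch, training=True)  # forward pass
            loss = loss_fn(y_batch, predictions)         # compare with labels
        # Backward pass: gradients of the loss with respect to each
        # trainable weight and bias, then the optimizer update.
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        return loss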
Next we will look at how to organize image data for storage and stream it like we streamed the structured data.

Organizing Image Data

Image classification tasks require organizing images in certain ways because, unlike CSV or tabular data, attaching a label to an image requires special techniques. A straightforward and common pattern for organizing image files is with the following directory structure:

    <PROJECT_NAME>
        train
            class_1
                <FILENAME>.jpg
                <FILENAME>.jpg
                …
            class_n
                <FILENAME>.jpg
                <FILENAME>.jpg
                …
        validation
            class_1
                <FILENAME>.jpg
                <FILENAME>.jpg
                …
            class_n
                <FILENAME>.jpg
                <FILENAME>.jpg
                …
        test
            class_1
                <FILENAME>.jpg
                <FILENAME>.jpg
                …
            class_n
                <FILENAME>.jpg
                <FILENAME>.jpg
                …

<PROJECT_NAME> is the base directory. The first level below it contains training, validation, and test directories. Within each of these directories, there are subdirectories named with the image labels (class_1, class_2, etc., which in the following example are flower types), each of which contains the raw image files. This is shown in Figure 2-4.

This structure is common because it makes it easy to keep track of labels and their respective images, but by no means is it the only way to organize image data. Let's look at another structure for organizing images. This is very similar to the previous one, except that training, testing, and validation are all separate. Immediately below the <PROJECT_NAME> directory are the directories of different image classes, as shown in Figure 2-5.
Figure 2-4. File organization for image classification and partitioning for training work

Figure 2-5. File organization for images based on labels

Using TensorFlow Image Generator

Now let's take a look at how to deal with images. Besides the nuances of file organization, working with images also requires certain steps to standardize and normalize the image files. The model architecture requires a fixed shape (fixed dimensions) for all images. At the pixel level, the values are normalized, typically to a range of [0, 1] (dividing each pixel value by 255).

For this example, you'll use an open source image set of five different types of flowers (or feel free to use your own image set). Let's assume that images should be 224 × 224 pixels, where the dimensions correspond to height and width. These are the expected dimensions for input images if you want to use a pretrained residual neural network (ResNet) as the image classifier.
First let's download the images. The following code downloads five types of flowers, all in different dimensions, and puts them in the file structure shown later in Figure 2-6:

    import tensorflow as tf

    data_dir = tf.keras.utils.get_file(
        'flower_photos',
        'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
        untar=True)

We will refer to data_dir as the base directory. It should be similar to:

    '/Users/XXXXX/.keras/datasets/flower_photos'

If you list the content from the base directory, you'll see:

    -rw-r-----    1 mbp16  staff  418049 Feb  8  2016 LICENSE.txt
    drwx------  801 mbp16  staff   25632 Feb 10  2016 tulips
    drwx------  701 mbp16  staff   22432 Feb 10  2016 sunflowers
    drwx------  643 mbp16  staff   20576 Feb 10  2016 roses
    drwx------  900 mbp16  staff   28800 Feb 10  2016 dandelion
    drwx------  635 mbp16  staff   20320 Feb 10  2016 daisy

There are three steps to streaming the images. Let's look more closely:

1. Create an ImageDataGenerator object and specify normalization parameters. Use the rescale parameter to indicate the normalization scale and the validation_split parameter to specify that 20% of the data will be set aside for cross validation:

       train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
           rescale=1./255,
           validation_split=0.20)

   Optionally, you can wrap rescale and validation_split as a dictionary that consists of key-value pairs:

       datagen_kwargs = dict(rescale=1./255, validation_split=0.20)
       train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
           **datagen_kwargs)

   This is a convenient way to reuse the same parameters and keep multiple input arguments under wrap. (Passing the dictionary data structure to a function is a Python technique known as dictionary unpacking.)

2. Connect the ImageDataGenerator object to the data source and specify parameters to resize the images to a fixed dimension:

       IMAGE_SIZE = (224, 224)  # Image height and width
       BATCH_SIZE = 32
       dataflow_kwargs = dict(target_size=IMAGE_SIZE,
                              batch_size=BATCH_SIZE,
                              interpolation="bilinear")

       train_generator = train_datagen.flow_from_directory(
           data_dir, subset="training", shuffle=True, **dataflow_kwargs)

3. Prepare a map for indexing the labels. In this step, you retrieve the index that the generator has assigned to each label and create a dictionary that maps it to the actual label name. The TensorFlow generator internally keeps track of labels from the directory names below data_dir. They can be retrieved through train_generator.class_indices, which returns a key-value pair of labels and indices. You can take advantage of this and reverse it when you deploy the model for scoring: the model will output an index, and the reverse lookup turns that index back into a label. To implement this reverse lookup, simply reverse the label dictionary generated by train_generator.class_indices:

       labels_idx = train_generator.class_indices
       idx_labels = dict((v, k) for k, v in labels_idx.items())

   These are the idx_labels:

       {0: 'daisy', 1: 'dandelion', 2: 'roses', 3: 'sunflowers', 4: 'tulips'}
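As a quick illustration of that reverse lookup at scoring time (the prediction values here are made up):

    import numpy as np

    # Hypothetical softmax output for one image, in the index order
    # assigned by the generator.
    pred = np.array([0.05, 0.70, 0.10, 0.05, 0.10])
    print(idx_labels[int(np.argmax(pred))])  # prints 'dandelion'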
Now you can inspect the shape of the items generated by train_generator:

    for image_batch, labels_batch in train_generator:
        print(image_batch.shape)
        print(labels_batch.shape)
        break

Expect to see the following for the first batch yielded by the generator iterating through the base directory:

    (32, 224, 224, 3)
    (32, 5)

The first tuple indicates a batch size of 32 images, each with a dimension of 224 × 224 × 3 (height × width × depth, where depth represents the three RGB color channels). The second tuple indicates 32 labels, each corresponding to one of the five flower types and one-hot encoded per idx_labels.

Streaming Cross-Validation Images

Recall that in creating the generator for streaming training data, you specified the validation_split parameter with a value of 0.2. If you don't do this, validation_split defaults to a value of 0. If validation_split is set to a nonzero decimal, when you invoke the flow_from_directory method, you also have to specify subset to be either training or validation. In the preceding example, it is subset="training".

You may be wondering how you'll know which images belong to the training subset from our previous endeavor of creating a training generator. Well, you don't have to know this if you reassign and reuse the training generator:

    valid_datagen = train_datagen
    valid_generator = valid_datagen.flow_from_directory(
        data_dir, subset="validation", shuffle=False, **dataflow_kwargs)
As you can see, a TensorFlow generator knows and keeps track of training and validation subsets, so you can reuse the same generator to stream over different subsets. The dataflow_kwargs dictionary is also reused. This is a convenience feature provided by TensorFlow generators. Because you reuse train_datagen, you can be sure that image rescaling is done the same way as for the training images. And in the valid_datagen.flow_from_directory method, you'll pass in the same dataflow_kwargs dictionary to set the image size for cross validation to be the same as it is for the training images.

If you prefer to organize the images into training, validation, and testing yourself, what you learned earlier still applies, with two exceptions. First, your data_dir is at the level of the training, validation, or testing directory. Second, you don't need to specify validation_split in ImageDataGenerator or subset in flow_from_directory.

Inspecting Resized Images

Now let's inspect the resized images coming off the generator. Following is the code snippet for iterating through a batch of data streamed by a generator:

    import matplotlib.pyplot as plt
    import numpy as np

    image_batch, label_batch = next(iter(train_generator))

    fig, axes = plt.subplots(8, 4, figsize=(10, 20))
    axes = axes.flatten()
    for img, lbl, ax in zip(image_batch, label_batch, axes):
        ax.imshow(img)
        label_ = np.argmax(lbl)
        label = idx_labels[label_]
        ax.set_title(label)
        ax.axis('off')
    plt.show()

This code will produce 32 images from the first batch coming off the generator (see Figure 2-6).
Figure 2-6. A batch of reshaped images
Let's examine the code:

    image_batch, label_batch = next(iter(train_generator))

This iterates over the base directory with the generator. It applies the iter function to the generator and leverages the next function to output the image batch and label batch as NumPy arrays:

    fig, axes = plt.subplots(8, 4, figsize=(10, 20))

This line sets up the number of subplots you expect, which is 32, your batch size:

    axes = axes.flatten()
    for img, lbl, ax in zip(image_batch, label_batch, axes):
        ax.imshow(img)
        label_ = np.argmax(lbl)
        label = idx_labels[label_]
        ax.set_title(label)
        ax.axis('off')
    plt.show()

Then you set the figure axes, using a for loop to display the NumPy arrays as images with their labels. As shown in Figure 2-6, all the images are resized into 224 × 224-pixel squares. Although the subplot holder is a rectangle with figsize=(10, 20), you can see that all of the images are squares. This means your code for resizing and normalizing images in the generator workflow works as expected.

Wrapping Up

In this chapter, you learned the fundamentals of streaming data using Python. This is a workhorse technique when working with large, distributed datasets. You also saw some common file organization patterns for tabular and image data.

In the section on tabular data, you learned how choosing a good file-naming convention can make it easier to build a reference to all the files, regardless of how many there are. This means you now know how to build a scalable pipeline that can ingest as much data as needed into a Python runtime for any use (in this case, for TensorFlow to create a dataset).
You also learned how image files are usually organized in file storage and how to associate images with labels. In the next chapter, you will leverage what you've learned here about data organization and streaming to integrate it with the model training process.