Applying Machine Learning
To Classify Performance Test Results
By Igor Kochetov (@k04a)
Kiev 2017
What dog are you?
.NET developer since 2007
Python developer since 2015
Toolsmith for Unity
Technologies
Religious about good code,
software design, TDD, SOLID
Love to learn new stuff
Fun Microsoft booth at NDC Oslo 2016
In this talk
❏ Applications of machine learning and most common algorithms
❏ Using machine learning to classify performance tests results in Unity
implemented in .NET
❏ How to debug machine learning algorithms
The definition of Machine Learning (ML)
Field of study that gives computers
the ability to learn without being
explicitly programmed - Arthur Samuel (1959)
A computer program is said to learn
from experience E with respect to some
class of tasks T and performance
measure P, if its performance at tasks in
T, as measured by P, improves with
experience E. - Tom Mitchell (1997)
Cat or Dog?
Applications of Machine Learning
❏ Handwriting recognition
❏ Natural language processing (NLP)
❏ Computer vision (self-driving cars)
❏ Self-customizing programs and user activity
monitoring
❏ Medical records
❏ Spam filters
Types of learning algorithms
➢ Supervised learning (labeled data)
○ Regression
○ Classification
○ Neural Networks
➢ Unsupervised learning (unlabeled data)
○ Clustering
○ Dimensionality reduction and PCA
○ Anomaly detection
What type of problem do we have at hand?
Performance Tests - The problem we are solving
In Performance Tests we have:
● Around 120 runtime tests
● Around 500 native tests
● Which run nightly on 8 platforms:
iOS, Android, mac/win
editor/standalone, ps4, xbox
● Also about 25 editor tests for 2 platforms
Totals of ~5000 tests producing historical data points (performance of the measured
component in ms) nightly across a few major branches
Performance Tests - Classify into 1 of 4 categories
❏ Stable
❏ Unstable
❏ Progression
❏ Regression
200 inputs - Chronologically ordered set of samples from performance tests
4 outputs - Regression, progression, unstable, stable
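The talk's classifier is implemented in C# with AForge.NET; purely as an illustration, here is a minimal Python sketch of how one test's history might be encoded as the 200 inputs and its category as a 4-element one-hot output. The class ordering, the padding scheme, and the helper names are assumptions, not the talk's actual encoding:

```python
# Hypothetical encoding sketch (not the deck's C# implementation).
CLASSES = ["stable", "unstable", "progression", "regression"]  # assumed order

def encode_label(name):
    """One-hot vector with a 1 at the class index."""
    vec = [0.0] * len(CLASSES)
    vec[CLASSES.index(name)] = 1.0
    return vec

def encode_samples(samples, size=200):
    """Pad or truncate a chronological list of timings (ms) to `size` inputs."""
    window = samples[-size:]                      # keep the most recent points
    return [0.0] * (size - len(window)) + window  # left-pad short histories

inputs = encode_samples([1.2, 1.3, 1.25])
target = encode_label("regression")
```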
Classifying the MNIST dataset is the “Hello world” of ML
Introducing Neural Networks
Activation unit modeling a neuron
Logistic (sigmoid) function
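The activation unit and the logistic function can be written in a few lines. A hedged Python sketch (the talk's real units come from AForge.NET's sigmoid activation):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^-z); squashes any real to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def activation(inputs, weights, bias):
    """A single neuron: weighted sum of inputs passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)
```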
Classification problem and Decision boundary
Classify input data into
one of two discrete
classes (yes/no, 1/0, etc)
Find the best “line”
separating negative and
positive examples (y = 1,
y = 0)
To better fit the data we need a more complex model
Every node receives its input from the previous layer
(forward propagation)
There could be more layers
And more than one output
How do we build and train an NN?
Structure:
● Define input layer (number of input nodes)
● Define output layer (number of output nodes)
● Define hidden layer (number of nodes and layers)
Training:
● Randomize the weights and apply them to the inputs (forward propagation)
● Adjust the weights guided by output error (back propagation)
Objective:
● Minimize the error between the network’s outputs and the labeled examples
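The structure/training steps above can be sketched end to end. This is an illustrative toy network in Python, not the talk's AForge.NET implementation; it is trained on a trivial AND-gate dataset just to show forward propagation, then weight updates driven by the output error (back propagation):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNet:
    """Toy 1-hidden-layer network; illustrative only, not AForge.NET."""
    def __init__(self, n_in, n_hidden, seed=42):
        rnd = random.Random(seed)
        # Step 1: randomize the weights
        self.w1 = [[rnd.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [rnd.uniform(-1, 1) for _ in range(n_hidden)]

    def forward(self, x):
        # Step 2: apply weights to inputs, layer by layer (forward propagation)
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.w1]
        return sigmoid(sum(w * hi for w, hi in zip(self.w2, self.h)))

    def backward(self, x, target, lr=0.5):
        # Step 3: adjust the weights guided by output error (back propagation)
        out = self.forward(x)
        delta_out = (out - target) * out * (1 - out)
        for j, hj in enumerate(self.h):
            delta_h = delta_out * self.w2[j] * hj * (1 - hj)
            self.w2[j] -= lr * delta_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi

# Train on a toy AND-gate dataset until the error shrinks
data = [([0, 0], 0.0), ([0, 1], 0.0), ([1, 0], 0.0), ([1, 1], 1.0)]
net = TinyNet(n_in=2, n_hidden=3)
before = sum((net.forward(x) - t) ** 2 for x, t in data)
for _ in range(2000):
    for x, t in data:
        net.backward(x, t)
after = sum((net.forward(x) - t) ** 2 for x, t in data)
```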
Demo
.NET Fest 2017. Igor Kochetov. Classifying performance test results with Machine Learning
How do we know we did anything good?
To assess the performance of the algorithm, split your
training data into 3 subsets
● Training set (about 60% of your data)
● Cross validation set (20%)
● Test set (20%)
Use the test set to validate the % of correct answers on unseen data
Use the cross validation (CV) set to fine-tune your algorithm; plot errors as a function
of training set size for both the Training and CV sets
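A minimal sketch of the 60/20/20 split (the percentages come from the slide; the shuffling, seed, and helper name are illustrative assumptions):

```python
import random

def split_dataset(examples, seed=0):
    """Shuffle labeled examples and split them 60/20/20 (train/CV/test)."""
    rnd = random.Random(seed)
    data = list(examples)
    rnd.shuffle(data)               # avoid ordering bias in the subsets
    n_train = int(len(data) * 0.6)
    n_cv = int(len(data) * 0.2)
    return (data[:n_train],                 # fit weights here
            data[n_train:n_train + n_cv],   # tune parameters here
            data[n_train + n_cv:])          # report final accuracy here

train, cv, test = split_dataset(range(100))
```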
Learning curves or ‘do we need more data?’
A smaller sample size usually means less error on the training data but more error on ‘unseen’ data
With more training data the CV error should go down, but watch the gap between Jcv and Jtrain (smaller is better)
More complex models try to fit all training data but
tend to perform worse on ‘real’ data
Plot errors as you tweak parameters
As you increase the model complexity d, both the
training error and the cross validation error
go down as we fit the data better.
But at some point the CV error starts
to go up again, since we are
overfitting the training data and
failing to generalize to new,
unseen data
Is your data distributed evenly?
Precision, recall and FScore
● True positive (we guessed 1, it was 1)
● False positive (we guessed 1, it was 0)
● True negative (we guessed 0, it was 0)
● False negative (we guessed 0, it was 1)
P = TP / (TP + FP)
R = TP / (TP + FN)
FScore = 2 * (P * R) / (P + R)
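The three formulas above translate directly to code; a small self-contained sketch for binary labels (1 = positive):

```python
def f_score(predicted, actual):
    """Return (precision, recall, F-score) from the slide's formulas."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0   # P = TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0      # R = TP / (TP + FN)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

p, r, f = f_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Note how the F-score collapses to 0 whenever either P or R is 0, which is exactly why it is a safer single metric than raw accuracy on skewed data.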
Mean normalization and feature scaling
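A one-function illustration of the idea as described in the speaker notes (subtract the mean, then divide by the range); a sketch, not the talk's actual preprocessing code:

```python
def normalize(values):
    """Mean-normalize and scale a feature: subtract mean, divide by range."""
    mean = sum(values) / len(values)
    spread = (max(values) - min(values)) or 1.0  # avoid division by zero
    return [(v - mean) / spread for v in values]

scaled = normalize([10.0, 20.0, 30.0])
```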
Conclusions
In order to successfully solve a machine learning
problem:
● Identify the task at hand and figure out a suitable algorithm
● Carefully select your training (and validation and testing) data
● Normalize your data
● Validate results
● Debug your model and diagnose problems instead of randomly tweaking
parameters
References
C# version developed based on AForge.NET
https://github.com/IgorKochetov/Machine-Learning-PerfTests-Classifying
http://www.aforgenet.com/framework/docs/
http://accord-framework.net/
Stanford University course on Machine Learning by prof. Andrew Ng
https://www.coursera.org/learn/machine-learning
Book by Tariq Rashid “Make Your Own Neural Network”
https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork
How to reach me
Twitter: @k04a
LinkedIn: Igor Kochetov
Q & A

Editor's Notes

  • #5: Instead of programming some rules we feed training data (learning examples) into the algorithm and assess the results
  • #7: Web data (click-stream or click-through data) Mine it to understand users better Huge segment of Silicon Valley Self-customizing programs Netflix Amazon iTunes Genius Take users’ info Learn based on your behavior Next - types of learning tasks
  • #8: Unsupervised - unlabeled data. Given the data find patterns and structure in the data Anomaly Detection (Fraud detection, Manufacturing, DataCenter monitoring) Anomaly detection vs. supervised learning: very small number of positive examples Content based recommendation and Collaborative filtering (if we have a set of features for movie rating you can learn a user's preferences, and vice versa, If you have your users preferences you can therefore determine a film's features) More examples: cocktail party algorithm More details on Recommender Systems: Recommender systems typically produce a list of recommendations in one of two ways – through collaborative and content-based filtering or the personality-based approach.[7] Collaborative filtering approaches build a model from a user's past behaviour (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in.[8] Content-based filtering approaches utilize a series of discrete characteristics of an item in order to recommend additional items with similar properties.[9] These approaches are often combined (see Hybrid Recommender Systems).
  • #10: Each test run provides us with a decimal value as a result: milliseconds needed to complete. So we have historic data for every measured feature and want to know if it increases, decreases, stays the same or jumps all around.
  • #11: Our problem could be modeled as Handwriting recognition one
  • #12: Every image is just an array of numbers Which we feed into an algorithm (i.e. input) And the output is one of 10 digits Which brings us back to our problem:
  • #13: Brain Does loads of crazy things Hypothesis is that the brain has a single learning algorithm Neuron: Three things to notice Cell body Number of input wires (dendrites) Output wire (axon) Simple level Neuron gets one or more inputs through dendrites Does processing Sends output down axon
  • #14: a neuron is a logistic unit That logistic computation is just like logistic regression hypothesis calculation X vector is our input (X0 is a constant, known as bias) Ɵ vector is our parameters which may also be called the weights of a model (that’s what we want to learn)
  • #15: This is the sigmoid function, or the logistic function It crosses 0.5 at the origin, then flattens out, with asymptotes at 0 and 1 Which gives us a DECISION BOUNDARY When using linear regression we did hθ(x) = θT x For the classification hypothesis representation we do hθ(x) = g(θT x) Where we define g(z) for real z as g(z) = 1 / (1 + e^(-z))
  • #16: It could be more than a line, actually
  • #17: In order to achieve that we can apply higher order polynomial or use NN
  • #18: First layer is the input layer Final layer is the output layer - produces value computed by a hypothesis Middle layer(s) are called the hidden layers ai(j) - activation of unit i in layer j Ɵ(j) - matrix of parameters controlling the function mapping from layer j to layer j + 1 Every input/activation goes to every node in following layer
  • #19: An NN is logistic regression at scale Neural networks learn their own features! ai(j) - activation of unit i in layer j Ɵ(j) - matrix of parameters controlling the function mapping from layer j to layer j + 1 Every input/activation goes to every node in the following layer Next - multiclass
  • #20: Recognizing stable, unstable, regression or progression Build a neural network with four output units Output a vector of four numbers 1 is 0/1 stable 2 is 0/1 unstable 3 is 0/1 regression 4 is 0/1 progression
  • #21: Inputs = features Outputs = number of classification categories Flip back to explain forward and back propagation
  • #23: We will use the AForge.NET library. We have to prepare Inputs and Outputs, choose an Activation function and a Network Structure (number of nodes, layers) And train the network until the error is small enough
  • #24: Having a single value to measure the performance of the algorithm is really important So the first step is to compare labeled inputs with algorithm outputs and calculate the % of correct results
  • #26: Jtrain: error on smaller sample sizes is smaller (less variance to accommodate), so as m grows the error grows. Jcv: error on the cross validation set; with a tiny training set you generalize badly, but as the training set grows your hypothesis generalizes better, so CV error will decrease as m increases. High bias (e.g. fitting a straight line to data): Jtrain is small at first and grows until it becomes close to the cross validation error, so the performance on the cross validation and training sets ends up being similar (but very poor). Jcv: a straight-line fit is similar for a few vs. a lot of data, so it doesn't generalize any better with lots of data because the function just doesn't fit the data. The problem with high bias is that cross validation and training error are both high; it also implies that if a learning algorithm has high bias, the cross validation error doesn't decrease as we get more examples. So if an algorithm is already suffering from high bias, more data does not help. High variance (e.g. a high-order polynomial): Jtrain is small when the set is small, and stays small as the training set size increases, growing slowly (in a near linear fashion) while the error is still low. Jcv: error remains high, even with a moderate number of examples, because the problem with high variance (overfitting) is that your model doesn't generalize. An indicative diagnostic of high variance is a big gap between training error and cross validation error. If a learning algorithm is suffering from high variance, more data is probably going to help.
  • #27: Applying higher order polynomial (or complex NN)
  • #30: Precision How often does our algorithm cause a false alarm? Of all patients we predicted have cancer, what fraction of them actually have cancer = true positives / # predicted positive = true positives / (true positives + false positives) High precision is good (i.e. closer to 1) You want a big number, because you want false positives to be as close to 0 as possible Recall How sensitive is our algorithm? Of all patients in the set that actually have cancer, what fraction did we correctly detect = true positives / # actual positives = true positives / (true positives + false negatives) High recall is good (i.e. closer to 1) You want a big number, because you want false negatives to be as close to 0 as possible F1Score (Fscore) = 2 * (P * R) / (P + R) Fscore is like taking the average of precision and recall, giving a higher weight to the lower value There are many formulas for computing comparable precision/accuracy values If P = 0 or R = 0 then Fscore = 0 If P = 1 and R = 1 then Fscore = 1 The remaining values lie between 0 and 1
  • #31: Find the average value (mean) and subtract it, then divide by the range (standard deviation)
  • #33: Don’t be afraid to try, even small projects could be fun and useful