Machine Learning
Algorithms
WALAA HAMDY ASSY
SOFTWARE DEVELOPER
Machine Learning Definition
Building a model from example inputs to make data-driven
predictions, rather than following strictly static program
instructions.
Machine learning is also often referred to as
predictive analytics or predictive modelling, and is commonly
described as a “computer’s ability to learn without being explicitly
programmed”.
Machine learning logic
 Gather the data
 Format it
 Pass it to an algorithm
 The algorithm analyzes the data
 The algorithm creates a model that solves the problem
[Diagram: Data, Algorithm, Data Analysis, Model]
Types of Machine Learning Algorithms
 Supervised
 Unsupervised
 Semi-supervised
 Reinforcement
SUPERVISED
Value prediction
Needs training data containing the
value being predicted
The trained model predicts the value in
new data
UNSUPERVISED
Identifies clusters of like data
No target values are given; the algorithm
tries to figure out the structure itself
The data does not come with cluster
membership
The model provides access to
data by cluster
Supervised
 In supervised learning, the machine is taught by example.
 The operator provides the machine learning algorithm with a known
dataset that includes inputs and their desired outputs.
 The algorithm must find a method to determine how to arrive at
those outputs from the inputs.
 The operator knows the correct answers to the problem.
 The algorithm identifies patterns in the data.
 The algorithm makes predictions and is corrected by the operator.
 This process continues until the algorithm achieves a high level of
accuracy/performance.
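The predict-and-correct loop described above can be sketched in a few lines of plain Scala. This is a minimal illustration using a perceptron update rule, which is one simple choice among many and is not named in the slides; the object name `SupervisedLoop`, the logical-AND dataset and the `maxEpochs` limit are all assumptions made for the example.

```scala
object SupervisedLoop {
  // Known dataset: inputs paired with the desired output (here: logical AND).
  val examples: Seq[(Array[Int], Int)] = Seq(
    (Array(0, 0), 0), (Array(0, 1), 0), (Array(1, 0), 0), (Array(1, 1), 1))

  final case class Model(weights: Array[Int], bias: Int) {
    // Predict 1 when the weighted sum of the inputs is positive, else 0.
    def predict(x: Array[Int]): Int = {
      val score = weights.zip(x).map { case (w, v) => w * v }.sum + bias
      if (score > 0) 1 else 0
    }
  }

  // Repeat "predict, compare with the known answer, correct" until a full
  // pass over the data produces no mistakes (or maxEpochs is reached).
  def train(data: Seq[(Array[Int], Int)], maxEpochs: Int = 100): Model = {
    var model = Model(Array.fill(data.head._1.length)(0), 0)
    var epoch = 0
    var mistakes = 1
    while (mistakes > 0 && epoch < maxEpochs) {
      mistakes = 0
      for ((x, label) <- data) {
        val error = label - model.predict(x) // 0 when the prediction is right
        if (error != 0) {
          mistakes += 1
          val corrected = model.weights.zip(x).map { case (w, v) => w + error * v }
          model = Model(corrected, model.bias + error)
        }
      }
      epoch += 1
    }
    model
  }
}
```

Calling `SupervisedLoop.train(SupervisedLoop.examples)` returns a model whose `predict` agrees with every labeled example, because this toy dataset is linearly separable.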
Types of supervised learning
 Classification
The machine learning program must draw a conclusion from observed
values and decide which category a new observation belongs to. For example,
when filtering emails as ‘spam’ or ‘not spam’, the
program must look at existing observational data and filter the emails
accordingly.
 Regression
The machine learning program must estimate the relationships
among variables. Regression analysis focuses on one dependent variable
and a series of other changing variables, making it particularly useful for
prediction and forecasting.
 Forecasting
The process of making predictions about the future based on past and
present data; it is commonly used to analyse trends.
Unsupervised learning
There is no answer key or human operator to provide instruction.
Instead, the machine determines the correlations and relationships by
analysing available data.
 Clustering
involves grouping sets of similar data (based on defined criteria). It’s
useful for segmenting data into several groups and performing analysis
on each data set to find patterns.
 Dimensionality reduction
reduces the number of variables being considered to find the exact
information required.
Clusters of data: the algorithm analyzes
input data and identifies groups
that share the same traits.
Workflow Guidelines
50% - 80% of time spent in
preparing data
Tidy datasets are easy to manipulate, model and visualize,
and have a specific structure:
 Each variable is a column
 Each observation is a row
 Each type of observational unit is a table
Hadley Wickham
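As a rough illustration of the tidy structure, the sketch below reshapes a "wide" table (one column per year) into rows where each variable is a field and each observation is one row. The names `TidyData`, `Observation`, `country`, `year` and `cases`, and all the numbers, are hypothetical and only serve the example.

```scala
object TidyData {
  // One observation per row; each variable (country, year, cases) is a column.
  final case class Observation(country: String, year: Int, cases: Int)

  // Hypothetical wide layout: one row per country, one value per year column.
  val years: Seq[Int] = Seq(2019, 2020)
  val wide: Seq[(String, Seq[Int])] = Seq("A" -> Seq(10, 12), "B" -> Seq(7, 9))

  // Pair every value with its year so that each (country, year) becomes a row.
  def tidy(years: Seq[Int], wide: Seq[(String, Seq[Int])]): Seq[Observation] =
    for {
      (country, values) <- wide
      (year, cases) <- years.zip(values)
    } yield Observation(country, year, cases)
}
```

With the sample data, `TidyData.tidy(TidyData.years, TidyData.wide)` yields four rows, one per country and year.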
Reinforcement
learning
 The algorithm is provided with a set of actions, parameters and end
values. Once the rules are defined, the machine learning
algorithm explores different options and
possibilities, monitoring and evaluating each result to
determine which one is optimal.
 Considered an approach to AI, so it is basically out of our
scope.
Semi-supervised
 Some data is labeled but most of it
is unlabeled, and a mixture of
supervised and unsupervised
techniques can be used.
 Labeling data can be expensive or time-consuming, as it may
require experts, whereas unlabeled
data is cheap and easy to collect
and store.
Algorithm Decision Factors
Choosing the right machine learning algorithm depends on
several factors:
 Data size
 Data quality and diversity
 What we want to derive from the data
 Accuracy
 Training time
 Parameters
 Learning type
 Complexity
 Result type: classification or regression (some algorithms enable both)
 Basic vs. enhanced
In practice the choice is a combination of business need, specification,
experimentation and time available.
Algorithms
There are more than 50 algorithms
Supervised learning alone accounts for 28
Naïve Bayes Classifier
Algorithm (Supervised Learning )
 The Naïve Bayes classifier is based on Bayes’ theorem and treats
every feature as independent of every other feature, given the class. It allows us to predict a
class/category, based on a given set of features, using probability.
 Simple and easy to understand
 Fast (up to 100x faster)
 Stable to data changes
 Every feature has the same weight
Some real-world examples:
 Marking an email as spam or not spam
 Classifying a news article as technology, politics, or sports
 Checking whether a piece of text expresses positive or negative
emotions
 Face recognition software
How does the Naive Bayes algorithm work?
 Let’s understand it using an example. Take a training
data set of weather conditions and the corresponding target variable ‘Play’
(indicating whether play took place). We need to classify
whether players will play or not based on the weather condition,
following the steps below.
 Step 1: Convert the data set into a frequency table
 Step 2: Create a likelihood table by finding the probabilities, for example:
 Probability of overcast weather = 0.29
 Probability of playing = 0.64
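The two steps above, plus the application of Bayes' theorem that naturally follows them, can be sketched as below. The frequency table is an assumption: the slide's data table did not survive, so the counts were chosen only so that they reproduce the two quoted probabilities (4/14 ≈ 0.29 for overcast, 9/14 ≈ 0.64 for playing). The object and method names are likewise hypothetical.

```scala
object WeatherNaiveBayes {
  // Step 1 (assumed counts): weather -> (days played, days not played).
  val counts: Map[String, (Int, Int)] = Map(
    "Sunny" -> (3, 2), "Overcast" -> (4, 0), "Rainy" -> (2, 3))

  val total: Int = counts.values.map { case (yes, no) => yes + no }.sum
  val totalYes: Int = counts.values.map(_._1).sum

  // Step 2: likelihoods and priors read off the frequency table.
  def pWeather(w: String): Double = {
    val (yes, no) = counts(w)
    (yes + no).toDouble / total
  }
  val pYes: Double = totalYes.toDouble / total
  def pWeatherGivenYes(w: String): Double = counts(w)._1.toDouble / totalYes

  // Bayes' theorem: P(Yes | w) = P(w | Yes) * P(Yes) / P(w)
  def pYesGiven(w: String): Double = pWeatherGivenYes(w) * pYes / pWeather(w)
}
```

Under these assumed counts, `WeatherNaiveBayes.pYesGiven("Sunny")` comes out at 0.6, so the classifier would predict that players will play on a sunny day.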
Applications of Naive Bayes
Algorithm
 Real-time prediction: Naive Bayes is an eager learning classifier and it is
fast, so it can be used for making predictions in real time.
 Multi-class prediction: The algorithm is also well known for its multi-class
prediction capability. It predicts the probability of each class of the target
variable.
 Text classification / spam filtering / sentiment analysis: Naive Bayes
classifiers are mostly used in text classification, where they often have a higher success rate
than other algorithms. As a result, they are widely used in spam
filtering and sentiment analysis (in social media analysis, to identify
positive and negative customer sentiments).
 Recommendation system: A Naive Bayes classifier and collaborative
filtering together can build a recommendation system that uses machine
learning and data mining techniques to filter unseen information and
predict whether a user would like a given resource or not.
Scala Code Example
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

// Assumes an active SparkSession named `spark`, as provided by spark-shell.
// Load the data stored in LIBSVM format as a DataFrame.
val data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

// Split the data into training and test sets (30% held out for testing)
val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)

// Train a NaiveBayes model.
val model = new NaiveBayes()
  .fit(trainingData)

// Predict on the test set and display example rows.
val predictions = model.transform(testData)
predictions.show()

// Select (prediction, true label) and compute test accuracy
val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")
val accuracy = evaluator.evaluate(predictions)
println(s"Test set accuracy = $accuracy")
Support Vector Machine Algorithm
(Supervised Learning )
 SVM is a supervised machine learning algorithm which can be used for
classification or regression problems. It can use a technique called the
kernel trick to transform the data and then, based on these
transformations, it finds an optimal boundary between the possible
outputs.
 It essentially sorts data into categories. It is given
a set of training examples, each marked as belonging to one or the
other of two categories, and then builds a model
that assigns new values to one category or the other.
 It separates the classes by finding a
line (more generally, a hyperplane) between them.
 SVMs can not only make reliable predictions but can also reduce
redundant information.
Applications of SVM in Real World
 Face detection – SVMs classify parts of the image as face or non-face and create
a square boundary around the face.
 Text and hypertext categorization – SVMs
use training data to classify documents into different categories. They categorize on
the basis of a generated score, which is compared with a threshold value.
 Classification of images – SVMs provide better search accuracy for image
classification than traditional query-based
searching techniques.
 Bioinformatics – This includes protein classification and cancer classification. SVMs are used
to classify genes, to classify patients on the basis of their genes, and for other
biological problems.
 Handwriting recognition – SVMs are widely used to recognize handwritten
characters.
 Generalized predictive control (GPC)
The kernel trick maps data that is not linearly
separable into a higher-dimensional space
where it becomes linearly separable.
[Figure: linear SVM vs. kernel SVM]
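To make the idea concrete, here is a small sketch with one-dimensional points that no single threshold can separate, but which become separable after mapping x to (x, x²). The feature map `phi`, the matching `kernel`, the cut-off 2.5 and the sample points are assumptions chosen for illustration; a real SVM would learn the boundary rather than have it hard-coded.

```scala
object KernelSketch {
  // Two classes on a line: "inner" points sit between the "outer" ones,
  // so no single threshold on x separates them.
  val inner: Seq[Double] = Seq(-1.0, 0.0, 1.0)
  val outer: Seq[Double] = Seq(-3.0, -2.0, 2.0, 3.0)

  // Explicit map into two dimensions: x -> (x, x^2).
  def phi(x: Double): (Double, Double) = (x, x * x)

  // The matching kernel gives the inner product phi(a) . phi(b)
  // without building the mapped vectors: a*b + (a*b)^2.
  def kernel(a: Double, b: Double): Double = a * b + (a * b) * (a * b)

  // In the mapped space a horizontal line (second coordinate = 2.5,
  // a hand-picked value) separates the two classes.
  def isOuter(x: Double): Boolean = phi(x)._2 > 2.5
}
```

The `kernel` function is the "trick" part: an SVM only needs inner products between points, so it can work in the higher-dimensional space without ever computing `phi` explicitly.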
Scala Code
import org.apache.spark.ml.classification.LinearSVC

// Load training data
val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

val lsvc = new LinearSVC()
  .setMaxIter(10)
  .setRegParam(0.1)

// Fit the model
val lsvcModel = lsvc.fit(training)

// Print the coefficients and intercept for linear svc
println(s"Coefficients: ${lsvcModel.coefficients} Intercept: ${lsvcModel.intercept}")
Linear Regression - Supervised
Linear regression is the most basic type of regression.
 Simple linear regression allows us to understand
the relationship between two continuous
variables.
 If the dependent variable is not continuous
but categorical, logistic regression
can be used instead.
It involves finding the best-fit line through the training data:
the line that minimizes the total squared error, which generally does not pass through every point.
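For a single input variable, the least-squares line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below implements that textbook formula in plain Scala; the object name and the `fit` signature are illustrative choices, not part of the Spark API used elsewhere in these slides.

```scala
object SimpleLinearRegression {
  // Returns (slope, intercept) of the least-squares line y = slope * x + intercept.
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.length >= 2, "need at least two (x, y) pairs")
    val meanX = xs.sum / xs.length
    val meanY = ys.sum / ys.length
    // Sum of cross-deviations and sum of squared x-deviations.
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx > 0, "x values must not all be equal")
    val slope = sxy / sxx
    (slope, meanY - slope * meanX)
  }

  def predict(model: (Double, Double), x: Double): Double = model._1 * x + model._2
}
```

For points that lie exactly on y = 2x + 1, `fit` recovers a slope of 2 and an intercept of 1; for noisy data it returns the line with the smallest total squared error.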
Regularized regression model
 A technique used to improve regression models
 It penalizes complex models to keep them simple so that they
do not overfit
 It can eliminate unimportant features
 The penalty forces the model to stay simple
 It shrinks coefficients toward zero; an L1 (lasso) penalty can set the coefficients of
insignificant features exactly to zero
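The effect of the penalty on a single coefficient can be sketched as follows. The simplification is an assumption: with one standardized (orthonormal) feature, the penalized solutions have closed forms in terms of the unpenalized coefficient `b`. The function names and the penalty strength `lambda` are illustrative.

```scala
object ShrinkageSketch {
  // L1 (lasso) penalty, minimizing 0.5 * (b - beta)^2 + lambda * |beta|:
  // soft-thresholding, which returns exactly 0 when |b| <= lambda.
  def lasso(b: Double, lambda: Double): Double =
    math.signum(b) * math.max(math.abs(b) - lambda, 0.0)

  // L2 (ridge) penalty, minimizing 0.5 * (b - beta)^2 + 0.5 * lambda * beta^2:
  // proportional shrinkage, which never reaches 0 for a nonzero b.
  def ridge(b: Double, lambda: Double): Double = b / (1.0 + lambda)
}
```

With `lambda = 0.5`, a weak coefficient of 0.2 is removed entirely by `lasso` but only scaled down by `ridge`, while a strong coefficient of 3.0 survives both.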
Applications
 Time series
 Economics
 Environmental science
 Financial forecasting
 Software cost prediction, effort prediction and software quality
assurance
 Restructuring the budget of an organization or country
 Predicting the crime rate of a state based on drug usage, number
of gangs, human trafficking, and killings
import org.apache.spark.ml.regression.LinearRegression
// Load training data
val training = spark.read.format("libsvm")
  .load("data/mllib/sample_linear_regression_data.txt")

val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training)
// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
// Summarize the model over the training set and print out some metrics
val trainingSummary = lrModel.summary
println(s"numIterations: ${trainingSummary.totalIterations}")
println(s"objectiveHistory: [${trainingSummary.objectiveHistory.mkString(",")}]")
trainingSummary.residuals.show()
println(s"RMSE: ${trainingSummary.rootMeanSquaredError}")
println(s"r2: ${trainingSummary.r2}")
Logistic Regression (Supervised
learning – Classification)
 Simple, but performs well in many classification problems
 Logistic regression focuses on estimating the probability of an event
occurring, based on the previous data provided.
 It is used to model a binary dependent variable, that is, one where only
two values, 0 and 1, represent the outcomes.
 The name is confusing: "regression" suggests continuous values, but the
result is binary
 Features are weighted based
on their impact on the result
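The step from weighted features to a binary outcome can be sketched as below: a weighted sum is squashed by the logistic (sigmoid) function into a probability between 0 and 1, which is then compared with a threshold. The weights, bias and the 0.5 threshold are assumed values for illustration; in practice the weights are learned from the training data.

```scala
object LogisticSketch {
  // Logistic (sigmoid) function: maps any real number into (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability of the positive class for one row of features.
  def probability(weights: Seq[Double], bias: Double, x: Seq[Double]): Double =
    sigmoid(weights.zip(x).map { case (w, v) => w * v }.sum + bias)

  // Turn the probability into a binary outcome (0 or 1).
  def classify(p: Double, threshold: Double = 0.5): Int = if (p >= threshold) 1 else 0
}
```

For example, with a single hypothetical weight of 2.0 and a bias of -1.0, an input of 3.0 gives a weighted sum of 5.0 and is classified as 1, while an input of -3.0 is classified as 0.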
Applications of Logistic Regression
 Image Segmentation and Categorization
 Geographic Image Processing
 Handwriting recognition
 Healthcare
 Depression Prediction
 It is used in sentiment analysis, for example classifying good reviews from
bad ones
import org.apache.spark.ml.classification.LogisticRegression
// Load training data
val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)

// Print the coefficients and intercept for logistic regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// We can also use the multinomial family for binary classification
val mlr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFamily("multinomial")
val mlrModel = mlr.fit(training)
// Print the coefficients and intercepts for logistic regression with multinomial family
println(s"Multinomial coefficients: ${mlrModel.coefficientMatrix}")
println(s"Multinomial intercepts: ${mlrModel.interceptVector}")
Decision Trees (Supervised Learning
– Classification/Regression)
 A decision tree is a decision support tool that uses a tree-like
graph or model of decisions and their possible consequences.
 It is one way to display an algorithm that only contains conditional
control statements.
 It is a binary tree of nodes; every node contains a decision, so it is
easy to visualize.
 Applications:
Its main purpose is classification.
Note: in Spark MLlib it can be used as a regressor or a classifier.
The main hyperparameter to tune is the maximum depth of the tree, although MLlib also exposes others such as maxBins and minInstancesPerNode.
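A decision tree can be represented directly as a small recursive data structure: inner nodes hold a decision, leaves hold a prediction, and predicting means following the decisions from the root down. The sketch below builds a tree by hand instead of learning it; the feature names, thresholds and labels are made up for illustration.

```scala
object DecisionTreeSketch {
  sealed trait Node
  final case class Leaf(label: String) extends Node
  // Binary decision: go left when row(feature) <= threshold, otherwise right.
  final case class Split(feature: String, threshold: Double, left: Node, right: Node) extends Node

  // Follow the decisions from the root down to a leaf.
  def predict(node: Node, row: Map[String, Double]): String = node match {
    case Leaf(label) => label
    case Split(feature, threshold, left, right) =>
      predict(if (row(feature) <= threshold) left else right, row)
  }

  // Depth of the tree: the number of decisions on the longest root-to-leaf path.
  def depth(node: Node): Int = node match {
    case Leaf(_) => 0
    case Split(_, _, left, right) => 1 + math.max(depth(left), depth(right))
  }

  // Hand-built example tree (hypothetical features and thresholds).
  val tree: Node =
    Split("humidity", 70.0, Leaf("play"),
      Split("wind", 20.0, Leaf("play"), Leaf("stay home")))
}
```

The `depth` function shows what a maximum-depth setting limits: a deeper tree can ask more questions about a row before it has to commit to a label.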
Random Forests (Supervised Learning –
Classification/Regression)
 An extremely powerful technique
 An ensemble learning method, combining multiple models to generate
better results for classification, regression and other tasks.
 Each individual classifier is weak, but when combined with others, it can
produce excellent results.
 The algorithm starts with a ‘decision tree’, and an input is entered at the top.
It then travels down the tree, with data being segmented into smaller and
smaller sets, based on specific variables.
How it works?
 Random forests are built on the basis of decision trees.
 A random forest trains many decision trees at the same time, each with its own
parameters and on a different sample of the same dataset, and then combines
the output of all these decision trees:
a. Randomly select “k” features from the total “m” features, where k << m
b. Among the “k” features, calculate the node “d” using the best split
point
c. Split the node into daughter nodes using the best split
d. Repeat steps a to c until “l” number of nodes has been reached
e. Build the forest by repeating steps a to d “n” times to create “n”
trees
 Random forests perform well when the individual trees that make up the
ensemble are as different as possible.
 They should have different training data and different
parameters.
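The three ingredients that make the trees differ and then combine them can be sketched separately: a bootstrap sample of the rows, a random subset of k out of m features, and a majority vote over the trees' outputs. Tree training itself is left out, and a tree is reduced to a function from a row to a class label; all names here are hypothetical.

```scala
import scala.util.Random

object ForestSketch {
  type Row = Vector[Double]
  type Tree = Row => Int // a trained tree, reduced to "row in, class label out"

  // A different sample of the same dataset for each tree (drawn with replacement).
  def bootstrap[A](data: Vector[A], rng: Random): Vector[A] =
    Vector.fill(data.length)(data(rng.nextInt(data.length)))

  // Randomly select k distinct feature indices out of m.
  def randomFeatures(m: Int, k: Int, rng: Random): Vector[Int] =
    rng.shuffle((0 until m).toVector).take(k)

  // Combine the output of all trees: the most frequent predicted label wins.
  def majorityVote(trees: Seq[Tree], row: Row): Int =
    trees.map(tree => tree(row)).groupBy(identity).maxBy(_._2.size)._1
}
```

With an odd number of trees and two classes, `majorityVote` always has a clear winner; ties in other settings would need an explicit tie-breaking rule, which this sketch does not define.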
Applications
 Banking
 Ecommerce
 Medicine
 Stock market
K Means Clustering Algorithm
(Unsupervised Learning - Clustering)
 Used to categorise unlabelled data
 It works by finding groups within the data, with the number of groups
represented by the variable K.
 It then works iteratively to assign each data point to one of K groups
based on the features provided.
 The results of the K-means clustering algorithm are:
 The centroids of the K clusters,
which can be used to label new data
 Labels for the training data
(each data point is assigned to a single cluster)
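The iterative assignment can be sketched for one-dimensional data as the usual two alternating steps: assign each point to its nearest centroid, then move each centroid to the mean of its points, until nothing changes. This is a minimal plain-Scala illustration, separate from the Spark `KMeans` class; the starting centroids are passed in by the caller, and `maxIter` is an assumed safety limit.

```scala
object KMeans1D {
  // Assignment step: index of the nearest centroid for every point.
  def assign(points: Seq[Double], centroids: Seq[Double]): Seq[Int] =
    points.map(p => centroids.indices.minBy(i => math.abs(p - centroids(i))))

  // Update step: each centroid moves to the mean of its assigned points
  // (a centroid with no points stays where it is).
  def update(points: Seq[Double], labels: Seq[Int], centroids: Seq[Double]): Seq[Double] =
    centroids.indices.map { i =>
      val members = points.zip(labels).collect { case (p, l) if l == i => p }
      if (members.isEmpty) centroids(i) else members.sum / members.length
    }

  // Alternate the two steps until the centroids stop moving.
  // Returns (centroids, label of each training point).
  def fit(points: Seq[Double], initial: Seq[Double], maxIter: Int = 100): (Seq[Double], Seq[Int]) = {
    var centroids = initial
    var labels = assign(points, centroids)
    var changed = true
    var iter = 0
    while (changed && iter < maxIter) {
      val next = update(points, labels, centroids)
      changed = next != centroids
      centroids = next
      labels = assign(points, centroids)
      iter += 1
    }
    (centroids, labels)
  }
}
```

The returned centroids can label new data through `assign`: a new point simply gets the index of the closest centroid.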
K-means Applications
 Behavioural segmentation
 Segment by purchase history
 Segment by activities on application, website, or platform
 Define personas based on interests
 Create profiles based on activity monitoring
 Inventory categorization:
 Group inventory by sales activity
 Group inventory by manufacturing metrics
 Sorting sensor measurements:
 Detect activity types in motion sensors
 Group images
 Separate audio
 Identify groups in health monitoring
 Detecting bots :
 Separate valid activity groups from bots
 Group valid activity to clean up outlier detection
Scala code example
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
// Loads data.
val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
// Trains a k-means model.
val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(dataset)
// Make predictions
val predictions = model.transform(dataset)
// Evaluate clustering by computing Silhouette score
val evaluator = new ClusteringEvaluator()
val silhouette = evaluator.evaluate(predictions)
println(s"Silhouette with squared euclidean distance = $silhouette")
// Shows the result.
println("Cluster Centers: ")
model.clusterCenters.foreach(println)
Collaborative Filtering - Alternating
least squares (ALS)
 Used for recommendation systems and personalized ranking
 The DataFrame-based API for ALS currently only supports integers for
user and item ids
 Not fully functional in ML
 It works with two main kinds of feedback:
 Explicit feedback
 Asking a user to rank a collection of items from favorite to least
favorite, or similar direct ratings
 Implicit feedback
 Observing the items that a user views in an online store
 Analyzing item/user viewing times
 Keeping a record of the items that a user purchases online
References
 https://spark.apache.org/docs/latest/ml-guide.html (ML guide, DataFrame-based API)
 https://spark.apache.org/docs/latest/mllib-guide.html (MLlib guide, RDD-based API)
 https://archive.ics.uci.edu/ml/index.php (repository with more than 440 datasets)
 https://data-flair.training/blogs/machine-learning-algorithm/
 http://www.cubicsol.com/machine-learning-algorithms/
 https://en.wikepedia.org

More Related Content

What's hot (20)

PDF
Introduction to Machine Learning
Eng Teong Cheah
 
PPTX
Intro/Overview on Machine Learning Presentation
Ankit Gupta
 
PPTX
Machine learning
Saurabh Agrawal
 
PPTX
Machine Learning-Linear regression
kishanthkumaar
 
PPTX
Classification and Regression
Megha Sharma
 
PPTX
Introduction to ML (Machine Learning)
SwatiTripathi44
 
PPTX
Types of Machine Learning
Samra Shahzadi
 
PDF
Classification Based Machine Learning Algorithms
Md. Main Uddin Rony
 
PPTX
Supervised learning and Unsupervised learning
Usama Fayyaz
 
PPTX
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Simplilearn
 
PPTX
Machine Learning
Girish Khanzode
 
PDF
Introduction to Machine Learning Classifiers
Functional Imperative
 
PPTX
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Simplilearn
 
PPTX
Deep Learning With Neural Networks
Aniket Maurya
 
PPTX
Self-organizing map
Tarat Diloksawatdikul
 
PPTX
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Simplilearn
 
PPTX
Introduction to Machine Learning
Rahul Jain
 
PPTX
Machine learning
Rajesh Chittampally
 
PDF
An introduction to Deep Learning
Julien SIMON
 
ODP
Machine Learning with Decision trees
Knoldus Inc.
 
Introduction to Machine Learning
Eng Teong Cheah
 
Intro/Overview on Machine Learning Presentation
Ankit Gupta
 
Machine learning
Saurabh Agrawal
 
Machine Learning-Linear regression
kishanthkumaar
 
Classification and Regression
Megha Sharma
 
Introduction to ML (Machine Learning)
SwatiTripathi44
 
Types of Machine Learning
Samra Shahzadi
 
Classification Based Machine Learning Algorithms
Md. Main Uddin Rony
 
Supervised learning and Unsupervised learning
Usama Fayyaz
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Simplilearn
 
Machine Learning
Girish Khanzode
 
Introduction to Machine Learning Classifiers
Functional Imperative
 
Supervised and Unsupervised Learning In Machine Learning | Machine Learning T...
Simplilearn
 
Deep Learning With Neural Networks
Aniket Maurya
 
Self-organizing map
Tarat Diloksawatdikul
 
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ...
Simplilearn
 
Introduction to Machine Learning
Rahul Jain
 
Machine learning
Rajesh Chittampally
 
An introduction to Deep Learning
Julien SIMON
 
Machine Learning with Decision trees
Knoldus Inc.
 

Similar to Machine learning Algorithms (20)

PPTX
INTRODUCTIONTOML2024 for graphic era.pptx
chirag19saxena2001
 
PPTX
Introduction to Machine Learning
Shahar Cohen
 
PPTX
An-Overview-of-Machine-Learning.pptx
someyamohsen3
 
PPTX
Machine Learning_PPT.pptx
RajeshBabu833061
 
PPTX
Machine learning Method and techniques
MarkMojumdar
 
PPT
Supervised and unsupervised learning
AmAn Singh
 
PDF
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Artificial Intelligence Board of America
 
PPTX
SVM - Functional Verification
Sai Kiran Kadam
 
PPTX
Intro to machine learning
Akshay Kanchan
 
PPTX
AI_06_Machine Learning.pptx
Yousef Aburawi
 
PDF
IRJET- Machine Learning: Survey, Types and Challenges
IRJET Journal
 
PPTX
Artificial Intelligence and Machine Learning
girisakthi1996
 
PPTX
Artificial Intelligence and Machine Learning
girisakthi1996
 
PPTX
Artificial Intelligence and Machine Learning
girisakthi1996
 
PPTX
Artificial Intelligence and Machine Learning
girisakthi1996
 
PPTX
Lecture 1.pptxgggggggggggggggggggggggggggggggggggggggggggg
AjayKumar773878
 
PPTX
Types of Machine Learning- Tanvir Siddike Moin
Tanvir Moin
 
PDF
Machine learning
Dr Geetha Mohan
 
PDF
50120140504015
IAEME Publication
 
PPTX
demo lecture for foundation class for btech
ROHIT738213
 
INTRODUCTIONTOML2024 for graphic era.pptx
chirag19saxena2001
 
Introduction to Machine Learning
Shahar Cohen
 
An-Overview-of-Machine-Learning.pptx
someyamohsen3
 
Machine Learning_PPT.pptx
RajeshBabu833061
 
Machine learning Method and techniques
MarkMojumdar
 
Supervised and unsupervised learning
AmAn Singh
 
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Artificial Intelligence Board of America
 
SVM - Functional Verification
Sai Kiran Kadam
 
Intro to machine learning
Akshay Kanchan
 
AI_06_Machine Learning.pptx
Yousef Aburawi
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET Journal
 
Artificial Intelligence and Machine Learning
girisakthi1996
 
Artificial Intelligence and Machine Learning
girisakthi1996
 
Artificial Intelligence and Machine Learning
girisakthi1996
 
Artificial Intelligence and Machine Learning
girisakthi1996
 
Lecture 1.pptxgggggggggggggggggggggggggggggggggggggggggggg
AjayKumar773878
 
Types of Machine Learning- Tanvir Siddike Moin
Tanvir Moin
 
Machine learning
Dr Geetha Mohan
 
50120140504015
IAEME Publication
 
demo lecture for foundation class for btech
ROHIT738213
 
Ad

Recently uploaded (20)

PDF
Simplifying Document Processing with Docling for AI Applications.pdf
Tamanna
 
PDF
Context Engineering for AI Agents, approaches, memories.pdf
Tamanna
 
PPTX
apidays Helsinki & North 2025 - Vero APIs - Experiences of API development in...
apidays
 
PPTX
Listify-Intelligent-Voice-to-Catalog-Agent.pptx
nareshkottees
 
PPTX
apidays Singapore 2025 - The Quest for the Greenest LLM , Jean Philippe Ehre...
apidays
 
PPTX
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
PPTX
ER_Model_Relationship_in_DBMS_Presentation.pptx
dharaadhvaryu1992
 
PPTX
b6057ea5-8e8c-4415-90c0-ed8e9666ffcd.pptx
Anees487379
 
PPTX
AI Presentation Tool Pitch Deck Presentation.pptx
ShyamPanthavoor1
 
PDF
apidays Helsinki & North 2025 - API-Powered Journeys: Mobility in an API-Driv...
apidays
 
PDF
apidays Helsinki & North 2025 - How (not) to run a Graphql Stewardship Group,...
apidays
 
PPTX
ER_Model_with_Diagrams_Presentation.pptx
dharaadhvaryu1992
 
PDF
OOPs with Java_unit2.pdf. sarthak bookkk
Sarthak964187
 
PDF
The European Business Wallet: Why It Matters and How It Powers the EUDI Ecosy...
Lal Chandran
 
PDF
Merits and Demerits of DBMS over File System & 3-Tier Architecture in DBMS
MD RIZWAN MOLLA
 
PPTX
apidays Helsinki & North 2025 - API access control strategies beyond JWT bear...
apidays
 
PPTX
Numbers of a nation: how we estimate population statistics | Accessible slides
Office for National Statistics
 
PPTX
apidays Singapore 2025 - From Data to Insights: Building AI-Powered Data APIs...
apidays
 
PPTX
apidays Helsinki & North 2025 - Agentic AI: A Friend or Foe?, Merja Kajava (A...
apidays
 
PDF
Copia de Strategic Roadmap Infographics by Slidesgo.pptx (1).pdf
ssuserd4c6911
 
Simplifying Document Processing with Docling for AI Applications.pdf
Tamanna
 
Context Engineering for AI Agents, approaches, memories.pdf
Tamanna
 
apidays Helsinki & North 2025 - Vero APIs - Experiences of API development in...
apidays
 
Listify-Intelligent-Voice-to-Catalog-Agent.pptx
nareshkottees
 
apidays Singapore 2025 - The Quest for the Greenest LLM , Jean Philippe Ehre...
apidays
 
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
ER_Model_Relationship_in_DBMS_Presentation.pptx
dharaadhvaryu1992
 
b6057ea5-8e8c-4415-90c0-ed8e9666ffcd.pptx
Anees487379
 
AI Presentation Tool Pitch Deck Presentation.pptx
ShyamPanthavoor1
 
apidays Helsinki & North 2025 - API-Powered Journeys: Mobility in an API-Driv...
apidays
 
apidays Helsinki & North 2025 - How (not) to run a Graphql Stewardship Group,...
apidays
 
ER_Model_with_Diagrams_Presentation.pptx
dharaadhvaryu1992
 
OOPs with Java_unit2.pdf. sarthak bookkk
Sarthak964187
 
The European Business Wallet: Why It Matters and How It Powers the EUDI Ecosy...
Lal Chandran
 
Merits and Demerits of DBMS over File System & 3-Tier Architecture in DBMS
MD RIZWAN MOLLA
 
apidays Helsinki & North 2025 - API access control strategies beyond JWT bear...
apidays
 
Numbers of a nation: how we estimate population statistics | Accessible slides
Office for National Statistics
 
apidays Singapore 2025 - From Data to Insights: Building AI-Powered Data APIs...
apidays
 
apidays Helsinki & North 2025 - Agentic AI: A Friend or Foe?, Merja Kajava (A...
apidays
 
Copia de Strategic Roadmap Infographics by Slidesgo.pptx (1).pdf
ssuserd4c6911
 
Ad

Machine learning Algorithms

  • 2. Machine Learning Definition Building a model from example inputs to make data driven predictions vs following strictly static program instructions. Machine learning is also often referred to as predictive analytics, or predictive modelling “computer’s ability to learn without being explicitly programmed”.
  • 3. Machine learning logic  Gathering the data  Format it  Pass to an algorithm  The algorithm analyzes the data  Create a model with a solution to solve the problem Data Algorithm Data Analysis Model
  • 4. Types of Machine Learning Algorithms Supervised Unsupervised Semi- supervised Reinforcement
  • 5. SUPERVISED Value prediction Needs training data containing value being predicted Trained model predicts value in new data UNSUPERVISED Identify clusters of like data We do not have values, we try to figure out the values Data does not have cluster membership Model provides access to data by cluster
  • 6. Supervised  In supervised learning, the machine is taught by example.  The operator provides the machine learning algorithm with a known dataset that includes desired inputs and outputs,  the algorithm must find a method to determine how to arrive at those inputs and outputs.  the operator knows the correct answers to the problem  the algorithm identifies patterns in data  The algorithm makes predictions and is corrected by the operator  this process continues until the algorithm achieves a high level of accuracy/performance.
  • 8. Types of supervised learning  Classification: the machine learning program must draw a conclusion from observed values , For example, when filtering emails as ‘spam’ or ‘not spam’, the program must look at existing observational data and filter the emails accordingly.  Regression the machine learning program must estimate the relationships among variables. Regression analysis focuses on one dependent variable and a series of other changing variables – making it particularly useful for prediction and forecasting.  Forecasting the process of making predictions about the future based on the past and present data, and is commonly used to analyse trends.
  • 9. Unsupervised learning There is no answer key or human operator to provide instruction. Instead, the machine determines the correlations and relationships by analysing available data.  Clustering involves grouping sets of similar data (based on defined criteria). It’s useful for segmenting data into several groups and performing analysis on each data set to find patterns.  Dimensionality reduction reduces the number of variables being considered to find the exact information required.
  • 10. Clusters of data, algorithm analyzes input data and identifies groups that share the same traits
  • 13. 50% - 80% of time spent in preparing data Tidy datasets are easy to manipulate, model and visualize And have a specific structure:  Each variable is a column  Each observation is a row  Each observation is a row  Each type of observational unit is a table Hadley Wickham
  • 14. Reinforcement learning  provided with a set of actions, parameters and end values. By defining the rules, the machine learning algorithm then tries to explore different options and possibilities, monitoring and evaluating each result to determine which one is optimal.  Considered an approach to AI so it is basically out of our scope  Some data is labeled but most of it is unlabeled and a mixture of supervised and unsupervised techniques can be used.  it can be expensive or time- consuming to label data as it may require experts. Whereas unlabeled data is cheap and easy to collect and store. Semi Supervised
  • 16. Algorithm Decision Factors Choosing the right machine learning algorithm depends on several factors:  data size  quality and diversity  What we want to derive from data  accuracy  training time  parameters  Learning type  Complexity  Result classification or regression  Some enable both  Basic vs Enhanced a combination of business need, specification, experimentation and time available.
  • 17. Algorithms We have over 50 algorithms and more Supervised only 28
  • 18. Naïve Bayes Classifier Algorithm (Supervised Learning )  The Naïve Bayes classifier is based on Bayes’ theorem and classifies every value as independent of any other value. It allows us to predict a class/category, based on a given set of features, using probability.  Simple and easy to understand  Fast up to 100x faster  Stable to data changes  Every feature has the same weight. Some of real world examples are :  To mark an email as spam, or not spam ?  Classify a news article about technology, politics, or sports ?  Check a piece of text expressing positive emotions, or negative emotions?  Also used for face recognition software
  • 19. How Naive Bayes algorithm works?  Let’s understand it using an example. In the next slide a training data set of weather and corresponding target variable ‘Play’ (suggesting possibilities of playing). Now, we need to classify whether players will play or not based on weather condition. Let’s follow the below steps to perform it.  Step 1: Convert the data set into a frequency table  Step 2: Create Likelihood table by finding the probabilities like  Overcast probability = 0.29 and  probability of playing is 0.64.
  • 21. Applications of Naive Bayes Algorithm  Real time Prediction: Naive Bayes is an eager learning classifier and it is sure fast. Thus, it could be used for making predictions in real time.  Multi class Prediction: This algorithm is also well known for multi class prediction feature. It predicts the probability of multiple classes of target variable.  Text classification/ Spam Filtering/ Sentiment Analysis: Naive Bayes classifiers mostly used in text classification .have higher success rate as compared to other algorithms. As a result, it is widely used in Spam filtering and Sentiment Analysis (in social media analysis, to identify positive and negative customer sentiments)  Recommendation System: Naive Bayes Classifier and Collaborative Filtering together builds a Recommendation System that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not
  • 22. Scala Code Example
        import org.apache.spark.ml.classification.NaiveBayes
        import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
        // Load the data stored in LIBSVM format as a DataFrame.
        val data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
        // Split the data into training and test sets (30% held out for testing)
        val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3), seed = 1234L)
        // Train a NaiveBayes model.
        val model = new NaiveBayes()
          .fit(trainingData)
        // Select example rows to display.
        val predictions = model.transform(testData)
        predictions.show()
        // Select (prediction, true label) and compute test accuracy
        val evaluator = new MulticlassClassificationEvaluator()
          .setLabelCol("label")
          .setPredictionCol("prediction")
          .setMetricName("accuracy")
        val accuracy = evaluator.evaluate(predictions)
        println(s"Test set accuracy = $accuracy")
  • 23. Support Vector Machine Algorithm (Supervised Learning)  SVM is a supervised machine learning algorithm that can be used for classification or regression problems. It can use a technique called the kernel trick to transform the data, and based on these transformations it finds an optimal boundary between the possible outputs.  It essentially sorts data into categories. It is given a set of training examples, each marked as belonging to one or the other of two categories, and builds a model that assigns new values to one category or the other.  In the simplest case it separates the classes by finding a line (a hyperplane in higher dimensions).  SVMs can not only make reliable predictions but can also reduce redundant information.
  • 24. Applications of SVM in the Real World  Face detection: SVMs classify parts of the image as face or non-face and create a square boundary around the face.  Text and hypertext categorization: SVMs use training data to classify documents into different categories. They categorize on the basis of a generated score, which is compared with a threshold value.  Classification of images: SVMs provide better search accuracy for image classification than traditional query-based searching techniques.  Bioinformatics: includes protein classification and cancer classification. SVMs are used to classify genes, to classify patients on the basis of their genes, and for other biological problems.  Handwriting recognition: SVMs are widely used to recognize handwritten characters.  Generalized predictive control (GPC).
  • 25. The kernel trick is used to map data that is not linearly separable into a higher-dimensional space where it becomes linearly separable. [Figure: Linear SVM vs. Kernel SVM]
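A small sketch of the idea in plain Scala, not tied to any Spark API: for 2-D points, the degree-2 polynomial kernel k(x, y) = (x · y)² returns the same value as an ordinary dot product taken after the explicit map φ(x) = (x₁², √2·x₁·x₂, x₂²). The names `phi` and `polyKernel` are illustrative, and this particular kernel is just one convenient example.

```scala
object KernelTrick {
  def dot(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Explicit feature map from 2-D input space to a 3-D feature space.
  def phi(x: Seq[Double]): Seq[Double] =
    Seq(x(0) * x(0), math.sqrt(2.0) * x(0) * x(1), x(1) * x(1))

  // The kernel yields the same inner product without ever building phi(x).
  def polyKernel(x: Seq[Double], y: Seq[Double]): Double = {
    val d = dot(x, y)
    d * d
  }
}
```

For x = (1, 2) and y = (3, 4), both `KernelTrick.polyKernel(x, y)` and `KernelTrick.dot(KernelTrick.phi(x), KernelTrick.phi(y))` give 121 (up to floating-point rounding), which is why a kernel SVM can work in the higher-dimensional space without computing the mapping explicitly.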
  • 26. Scala Code
        import org.apache.spark.ml.classification.LinearSVC
        // Load training data
        val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
        val lsvc = new LinearSVC()
          .setMaxIter(10)
          .setRegParam(0.1)
        // Fit the model
        val lsvcModel = lsvc.fit(training)
        // Print the coefficients and intercept for linear svc
        println(s"Coefficients: ${lsvcModel.coefficients} Intercept: ${lsvcModel.intercept}")
  • 27. Linear Regression - Supervised  Linear regression is the most basic type of regression.  Simple linear regression lets us understand the relationship between two continuous variables.  If the dependent variable is not continuous but categorical, logistic regression can be used instead.  It involves finding the best-fit line through the training data, that is, the line that minimizes the squared error over all training points.
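As a minimal sketch of what "best-fit line" means for two continuous variables, the closed-form least-squares solution can be written in plain Scala. The object and function names are illustrative, and this is the textbook one-feature formula rather than the solver Spark uses.

```scala
object SimpleLinearRegression {
  // Returns (slope, intercept) of the least-squares line y = slope * x + intercept.
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.length >= 2, "need at least two paired points")
    val meanX = xs.sum / xs.length
    val meanY = ys.sum / ys.length
    // Covariance-like and variance-like sums around the means.
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx != 0.0, "x values must not all be equal")
    val slope = sxy / sxx
    (slope, meanY - slope * meanX)
  }
}
```

For points that lie exactly on a line, such as x = 1, 2, 3, 4 and y = 3, 5, 7, 9, the fit recovers slope 2 and intercept 1.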
  • 28. Regularized regression model  A technique used to improve regression.  It penalizes complex models: we use it to keep models simple so that they do not overfit.  It can eliminate unimportant features.  It forces the model to stay simple.  It pushes coefficients towards zero if they are not significant (an L1 penalty can set them exactly to zero).
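The effect of the penalty can be seen in a deliberately tiny setting. The sketch below assumes a one-feature model y ≈ w·x with no intercept, which is a simplification chosen for illustration, and compares an L2 (ridge) penalty with an L1 (lasso) penalty using their closed-form solutions. The names are illustrative and the code does not use Spark.

```scala
object RegularizedSlope {
  // Sum of x*y and sum of x*x for a one-feature model y ≈ w * x (no intercept).
  private def sums(xs: Seq[Double], ys: Seq[Double]): (Double, Double) =
    (xs.zip(ys).map { case (x, y) => x * y }.sum, xs.map(x => x * x).sum)

  // L2 (ridge): minimizes 0.5 * sum((y - w*x)^2) + 0.5 * lambda * w^2.
  def ridge(xs: Seq[Double], ys: Seq[Double], lambda: Double): Double = {
    val (sxy, sxx) = sums(xs, ys)
    sxy / (sxx + lambda)
  }

  // L1 (lasso): minimizes 0.5 * sum((y - w*x)^2) + lambda * |w|.
  // The solution is a soft-thresholded version of the unpenalized slope.
  def lasso(xs: Seq[Double], ys: Seq[Double], lambda: Double): Double = {
    val (sxy, sxx) = sums(xs, ys)
    require(sxx > 0.0, "x values must not all be zero")
    math.signum(sxy) * math.max(math.abs(sxy) - lambda, 0.0) / sxx
  }
}
```

With x = (1, 2) and y = (2, 4) the unpenalized slope is 2. Raising lambda shrinks the ridge slope (1.0 at lambda = 5, 0.4 at lambda = 20) but never makes it exactly zero, while the lasso slope reaches exactly 0 once lambda is 10 or more.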
  • 29. Applications  Time series  Economics  Environmental science  Financial forecasting  Software cost prediction, effort prediction and software quality assurance.  Restructuring of the budget: organization or country  Predicting the crime rate of a state based on drug usage, number of gangs, human trafficking, and killings.
  • 30. import org.apache.spark.ml.regression.LinearRegression
        // Load training data
        val training = spark.read.format("libsvm")
          .load("data/mllib/sample_linear_regression_data.txt")
        val lr = new LinearRegression()
          .setMaxIter(10)
          .setRegParam(0.3)
          .setElasticNetParam(0.8)
        // Fit the model
        val lrModel = lr.fit(training)
        // Print the coefficients and intercept for linear regression
        println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
        // Summarize the model over the training set and print out some metrics
        val trainingSummary = lrModel.summary
        println(s"numIterations: ${trainingSummary.totalIterations}")
        println(s"objectiveHistory: [${trainingSummary.objectiveHistory.mkString(",")}]")
        trainingSummary.residuals.show()
        println(s"RMSE: ${trainingSummary.rootMeanSquaredError}")
        println(s"r2: ${trainingSummary.r2}")
  • 31. Logistic Regression (Supervised learning – Classification)  Simple, but performs well in many classification problems.  Logistic regression focuses on estimating the probability of an event occurring, based on the previous data provided.  It is used to model a binary dependent variable, that is, one where only two values, 0 and 1, represent the outcomes.  The name is confusing: “regression” suggests continuous values, but the result is binary.  Relationships between features are weighted based on their impact on the result.
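A minimal sketch of how a fitted logistic regression model turns weighted features into a probability and then into a 0/1 outcome. The weights and the 0.5 threshold here are assumptions for illustration, not values from a trained model, and the names are illustrative.

```scala
object LogisticSketch {
  // Squashes any real number into the interval (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Estimated probability that the outcome is 1.
  def probability(weights: Seq[Double], intercept: Double, features: Seq[Double]): Double =
    sigmoid(weights.zip(features).map { case (w, x) => w * x }.sum + intercept)

  // Binary decision obtained by thresholding the probability.
  def predict(weights: Seq[Double], intercept: Double, features: Seq[Double],
              threshold: Double = 0.5): Int =
    if (probability(weights, intercept, features) >= threshold) 1 else 0
}
```

The continuous value is the probability; the binary result only appears after thresholding, which is where the apparent mismatch between the name and the output comes from.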
  • 32. Applications of Logistic Regression  Image Segmentation and Categorization  Geographic Image Processing  Handwriting recognition  Healthcare  Depression Prediction  It is used in sentiment analysis, for example classifying good reviews from bad ones
  • 33. import org.apache.spark.ml.classification.LogisticRegression
        // Load training data
        val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
        val lr = new LogisticRegression()
          .setMaxIter(10)
          .setRegParam(0.3)
          .setElasticNetParam(0.8)
        // Fit the model
        val lrModel = lr.fit(training)
        // Print the coefficients and intercept for logistic regression
        println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
        // We can also use the multinomial family for binary classification
        val mlr = new LogisticRegression()
          .setMaxIter(10)
          .setRegParam(0.3)
          .setElasticNetParam(0.8)
          .setFamily("multinomial")
        val mlrModel = mlr.fit(training)
        // Print the coefficients and intercepts for logistic regression with multinomial family
        println(s"Multinomial coefficients: ${mlrModel.coefficientMatrix}")
        println(s"Multinomial intercepts: ${mlrModel.interceptVector}")
  • 34. Decision Trees (Supervised Learning – Classification/Regression)  A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences.  It is one way to display an algorithm that only contains conditional control statements.  It is a binary tree of nodes; every node contains a decision, so it is easy to visualize.  Applications: its main purpose is classification. Note: in MLlib it can be used as a regressor or a classifier.
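The "binary tree of nodes, every node a decision" description can be sketched as a small Scala data structure. The tree below is hand-written and hypothetical (feature 0 is taken to be temperature and feature 1 humidity); a real tree would be learned from training data.

```scala
object DecisionTreeSketch {
  sealed trait Node
  // A leaf holds the predicted class.
  final case class Leaf(label: String) extends Node
  // An internal node tests one feature against a threshold.
  final case class Split(feature: Int, threshold: Double, left: Node, right: Node) extends Node

  // Walk from the root to a leaf, going left when the tested feature is <= threshold.
  def classify(node: Node, x: Vector[Double]): String = node match {
    case Leaf(label)                        => label
    case Split(feature, threshold, l, r) => classify(if (x(feature) <= threshold) l else r, x)
  }

  // Hypothetical tree: feature 0 = temperature, feature 1 = humidity.
  val tree: Node =
    Split(0, 30.0,
      Split(1, 70.0, Leaf("play"), Leaf("stay home")),
      Leaf("stay home"))
}
```

Classifying a point is nothing more than a chain of conditional checks, which is why the model is easy to visualize and explain.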
  • 35. The main hyperparameter to tune in a decision tree is the maximum depth of the tree; Spark ML also exposes others, such as maxBins, minInstancesPerNode and minInfoGain.
  • 36. Random Forests (Supervised Learning – Classification/Regression)  An extremely powerful technique.  An ensemble learning method that combines multiple models to generate better results for classification, regression and other tasks.  Each individual classifier is weak, but when combined with others it can produce excellent results.  The algorithm starts with a ‘decision tree’: an input is entered at the top and then travels down the tree, with the data being segmented into smaller and smaller sets based on specific variables.
  • 38. How does it work?  Random forests are built on the basics of decision trees.  The algorithm trains many decision trees at the same time; every tree has its own parameters and is trained on a different sample of the same dataset. The outputs of all these decision trees are then combined.  (a) Randomly select “k” features from the total “m” features, where k << m.  (b) Among the “k” features, calculate the node “d” using the best split point.  (c) Split the node into daughter nodes using the best split.  (d) Repeat steps a to c until “l” number of nodes has been reached.  (e) Build the forest by repeating steps a to d “n” times to create “n” trees.  Random forests perform well when the individual trees that make up the ensemble are as different as possible: they should have different training data and different parameters.
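The randomization and combining parts of this procedure can be sketched in plain Scala. This is only a partial illustration: it shows how each tree could get its own data sample and feature subset and how the trees' outputs could be combined by majority vote, and it leaves out the tree training itself (finding the best split). Sampling with replacement and majority voting are common choices assumed here, and all names are illustrative.

```scala
import scala.util.Random

object ForestSketch {
  // Draw a bootstrap sample: same size as the data, sampled with replacement,
  // so every tree sees a different sample of the same dataset.
  def bootstrap[A](data: Vector[A], rng: Random): Vector[A] =
    Vector.fill(data.length)(data(rng.nextInt(data.length)))

  // Randomly pick k distinct feature indices out of m.
  def randomFeatures(m: Int, k: Int, rng: Random): Vector[Int] =
    rng.shuffle((0 until m).toVector).take(k)

  // Combine the per-tree predictions by majority vote (votes must be non-empty).
  def majorityVote[L](votes: Seq[L]): L =
    votes.groupBy(identity).maxBy(_._2.size)._1
}
```

Passing a seeded `Random` makes the sampling reproducible, which helps when comparing forests trained with different settings.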
  • 39. Applications  Banking  E-commerce  Medicine  Stock market
  • 40. K Means Clustering Algorithm (Unsupervised Learning - Clustering)  Used to categorise unlabelled data.  It works by finding groups within the data, with the number of groups represented by the variable K.  It then works iteratively to assign each data point to one of K groups based on the features provided.
  • 41.  The results of the K-means clustering algorithm are:  The centroids of the K clusters, which can be used to label new data.  Labels for the training data (each data point is assigned to a single cluster).
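The iterative assignment and the two results (centroids and labels) can be sketched in plain Scala. This is a bare-bones version of the usual assign-then-update loop, with the starting centroids passed in by the caller; the choice of starting centroids and the fixed iteration count are simplifying assumptions, and the names are illustrative.

```scala
object KMeansSketch {
  type Point = Vector[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the nearest centroid: this is the cluster label of the point.
  def assign(p: Point, centroids: Vector[Point]): Int =
    centroids.indices.minBy(i => squaredDistance(p, centroids(i)))

  // Coordinate-wise mean of a non-empty group of points.
  def mean(points: Vector[Point]): Point =
    points.transpose.map(_.sum / points.length)

  // One iteration: assign every point, then move each centroid to the mean of its points.
  // A centroid with no assigned points is left where it is.
  def step(points: Vector[Point], centroids: Vector[Point]): Vector[Point] = {
    val groups = points.groupBy(assign(_, centroids))
    centroids.indices.toVector.map(i => groups.get(i).map(mean).getOrElse(centroids(i)))
  }

  // Run a fixed number of iterations from the given starting centroids.
  def fit(points: Vector[Point], initial: Vector[Point], iterations: Int): Vector[Point] =
    (1 to iterations).foldLeft(initial)((c, _) => step(points, c))
}
```

After `fit` returns the centroids, calling `assign` on a training point gives its label, and calling it on an unseen point labels new data in the same way.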
  • 42. K-means Applications  Behavioural segmentation  Segment by purchase history  Segment by activities on application, website, or platform  Define personas based on interests  Create profiles based on activity monitoring  Inventory categorization:  Group inventory by sales activity  Group inventory by manufacturing metrics  Sorting sensor measurements:  Detect activity types in motion sensors  Group images  Separate audio  Identify groups in health monitoring  Detecting bots :  Separate valid activity groups from bots  Group valid activity to clean up outlier detection
  • 43. Scala code example
        import org.apache.spark.ml.clustering.KMeans
        import org.apache.spark.ml.evaluation.ClusteringEvaluator
        // Loads data.
        val dataset = spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
        // Trains a k-means model.
        val kmeans = new KMeans().setK(2).setSeed(1L)
        val model = kmeans.fit(dataset)
        // Make predictions
        val predictions = model.transform(dataset)
        // Evaluate clustering by computing Silhouette score
        val evaluator = new ClusteringEvaluator()
        val silhouette = evaluator.evaluate(predictions)
        println(s"Silhouette with squared euclidean distance = $silhouette")
        // Shows the result.
        println("Cluster Centers: ")
        model.clusterCenters.foreach(println)
  • 44. Collaborative Filtering - Alternating Least Squares (ALS)  Used for recommendation systems and personalized ranking.  The DataFrame-based API for ALS currently only supports integers for user and item ids.  Not all functionality is available yet in the DataFrame-based ML API.  It works with two main kinds of feedback:  Explicit feedback: asking a user to rank a collection of items from favorite to least favorite, or to give similar direct ratings.  Implicit feedback: observing the items that a user views in an online store; analyzing item/user viewing times; keeping a record of the items that a user purchases online.
  • 45. References  https://spark.apache.org/docs/latest/ml-guide.html (ML guide, DataFrame-based API)  https://spark.apache.org/docs/latest/mllib-guide.html (MLlib guide, RDD-based API)  https://archive.ics.uci.edu/ml/index.php (repository with more than 440 datasets)  https://data-flair.training/blogs/machine-learning-algorithm/  http://www.cubicsol.com/machine-learning-algorithms/  https://en.wikipedia.org