Machine Learning
Lecture 2
Agenda
• Revision
• Data preprocessing
• Data cleaning
  ▪ Dealing with missing data
  ▪ Dealing with categorical data
• Feature scaling
  ▪ Rescaling
  ▪ Mean normalization
  ▪ Standardization
  ▪ Scaling to unit length
• Dimensionality reduction
  ▪ Feature selection
  ▪ Feature extraction
• Partitioning a dataset into training and testing sets
Revision
Learning
• In machine learning and pattern recognition, there is a relationship
between the observations (input features) and the response (target);
we need to understand this relationship.

❑ Learning
▪ The learning process aims to understand this relationship and to
predict new values by finding a mathematical model that relates the
given observations (i.e. the inputs and the output).

❑ Examples of learning problems:
▪ Predicting wind speed and solar radiation.
▪ Detecting cancers: breast, leukemia, and spinal cancers.
▪ Estimating the amount of glucose in the blood.
Data Preprocessing
Introduction
• The quality of the data and the amount of useful information that it
contains are key factors that determine how well a machine learning
algorithm can learn.
• Therefore, it is critical to examine and preprocess a dataset before
we feed it to a learning algorithm.
• We will discuss the essential data preprocessing techniques that
help us build good machine learning models.
• Data preprocessing manipulates and transforms the raw dataset into a
clean and scaled dataset.
• In other words, a dataset gathered from different sources is collected
in a raw format that is not suitable for analysis.
• Therefore, certain steps are executed to convert the dataset into a
small, clean dataset.
• Data preprocessing is performed before the collected dataset is used.
Data Preprocessing
Introduction
• Data Cleaning
• Feature Scaling
• Dimensionality Reduction
Data Preprocessing
Why do we need data preprocessing?
• The main objective of data preprocessing is to manipulate and
transform raw data into a cleaned and scaled format.
• In addition, it is important to compress the data onto a
lower-dimensional subspace while retaining most of the relevant
information.
Data Preprocessing
Types of data preprocessing
Data Preprocessing
Data cleaning: missing data
• In real-world applications, it is common for the collected data samples to
contain one or more missing values for various reasons.
• These reasons include:
  ▪ an error in the data collection process;
  ▪ certain measurements not being applicable;
  ▪ particular fields simply being left blank, for instance in a survey.
• We typically see missing values as blank spaces in our data
table or as placeholder strings such as NaN (Not a Number).
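A minimal sketch of two common ways to deal with missing values: dropping incomplete rows and mean imputation. The table, its two columns (age, salary), and the helper name is_missing are hypothetical and chosen only for illustration; only the Python standard library is used.

```python
import math
from statistics import mean

# Hypothetical table: each row is (age, salary); NaN marks a missing value.
rows = [
    (25.0, 50000.0),
    (math.nan, 62000.0),
    (31.0, math.nan),
    (40.0, 58000.0),
]


def is_missing(value):
    """Return True when a cell holds the NaN placeholder."""
    return isinstance(value, float) and math.isnan(value)


# Count the missing values in each column.
missing_per_column = [sum(is_missing(v) for v in col) for col in zip(*rows)]

# Option 1: drop every row that contains at least one missing value.
complete_rows = [row for row in rows if not any(is_missing(v) for v in row)]

# Option 2: mean imputation. Each missing value is replaced by the mean
# of the observed values in the same column.
column_means = [mean(v for v in col if not is_missing(v)) for col in zip(*rows)]
imputed_rows = [
    tuple(column_means[j] if is_missing(v) else v for j, v in enumerate(row))
    for row in rows
]
```

Dropping rows is simple but discards data; imputation keeps every sample at the cost of introducing estimated values.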
Data Preprocessing
Data cleaning: categorical data
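A minimal sketch of two common encodings for categorical data, under stated assumptions: an ordinal feature is mapped to integers that preserve its order, and a nominal feature is one-hot encoded. The features size and color, their values, and the ordering M < L < XL are hypothetical.

```python
# Hypothetical samples: "size" is ordinal, "color" is nominal.
samples = [
    {"size": "M", "color": "green"},
    {"size": "L", "color": "red"},
    {"size": "XL", "color": "blue"},
    {"size": "M", "color": "red"},
]

# Ordinal feature: map each category to an integer that preserves the order.
size_order = {"M": 1, "L": 2, "XL": 3}  # assumed ordering M < L < XL

# Nominal feature: one-hot encoding, i.e. one binary column per category.
colors = sorted({s["color"] for s in samples})  # column order: blue, green, red


def encode(sample):
    """Return [size_rank, is_blue, is_green, is_red] for one sample."""
    one_hot = [1 if sample["color"] == c else 0 for c in colors]
    return [size_order[sample["size"]]] + one_hot


encoded = [encode(s) for s in samples]
```

One-hot encoding avoids implying an order among nominal categories, which a plain integer mapping would do.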
Data Preprocessing
Feature scaling
Data Preprocessing
Feature scaling: rescaling
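A minimal sketch of rescaling (min-max normalization), assuming the usual definition x' = (x - min) / (max - min), which maps a feature onto the range [0, 1]. The function name rescale and the sample heights are hypothetical.

```python
def rescale(values):
    """Min-max rescaling: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a constant feature is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


heights = [150.0, 160.0, 170.0, 190.0]  # hypothetical feature values
scaled = rescale(heights)
```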
Data Preprocessing
Feature scaling: mean normalization
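A minimal sketch of mean normalization, assuming the usual definition x' = (x - mean) / (max - min), which centers the feature around zero. The function name mean_normalize and the sample values are hypothetical.

```python
from statistics import mean


def mean_normalize(values):
    """Mean normalization: x' = (x - mean) / (max - min)."""
    mu = mean(values)
    span = max(values) - min(values)
    if span == 0:
        # Assumption: a constant feature is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / span for v in values]


heights = [150.0, 160.0, 170.0, 190.0]  # hypothetical feature values
normalized = mean_normalize(heights)
```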
Data Preprocessing
Feature scaling: standardization
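A minimal sketch of standardization (z-score), assuming the usual definition z = (x - mean) / std, so that the scaled feature has zero mean and unit standard deviation. The population standard deviation is used here as an assumption; the function name standardize and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Standardization: z = (x - mean) / std (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Assumption: a constant feature is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical feature values
z_scores = standardize(values)
```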
Data Preprocessing
Feature scaling: scaling to unit length
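A minimal sketch of scaling to unit length, assuming the usual definition x' = x / ||x|| with the Euclidean norm, applied to the feature vector of one sample. The function name to_unit_length and the sample vector are hypothetical.

```python
import math


def to_unit_length(vector):
    """Scale a feature vector so that its Euclidean norm equals 1."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        # Assumption: the zero vector is returned unchanged.
        return list(vector)
    return [v / norm for v in vector]


sample = [3.0, 4.0]  # hypothetical feature vector with norm 5
unit = to_unit_length(sample)
```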
Data Preprocessing
Dimensionality Reduction
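A minimal sketch of one simple form of feature selection: dropping features whose variance falls below a threshold, since nearly constant features carry little information. This is only one of many selection criteria and is shown as an illustration; the data and the threshold value are hypothetical.

```python
from statistics import pvariance

# Hypothetical dataset: 4 samples, 3 features.
rows = [
    [1.0, 0.0, 10.0],
    [2.0, 0.0, 10.1],
    [3.0, 0.0, 9.9],
    [4.0, 0.0, 10.0],
]

threshold = 0.05  # hypothetical cut-off chosen for this example

# Population variance of each feature (column).
variances = [pvariance(col) for col in zip(*rows)]

# Keep only the indices of features whose variance exceeds the threshold.
kept = [j for j, var in enumerate(variances) if var > threshold]

# Build the reduced dataset from the selected features.
reduced = [[row[j] for j in kept] for row in rows]
```

Because variance depends on the scale of each feature, a filter like this is usually applied to features measured on comparable scales.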
Data Preprocessing
Partitioning a dataset into training and testing
sets
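A minimal sketch of partitioning a dataset: the samples are shuffled and then split so that the model is trained on one part and evaluated on samples it has not seen. The helper name split_dataset, the 30% test ratio, and the seed are assumptions made for this example.

```python
import random


def split_dataset(samples, test_ratio=0.3, seed=0):
    """Shuffle the sample indices and split them into training and testing sets."""
    indices = list(range(len(samples)))
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_ratio)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    train = [samples[i] for i in train_idx]
    test = [samples[i] for i in test_idx]
    return train, test


data = list(range(10))  # hypothetical dataset of 10 samples
train, test = split_dataset(data, test_ratio=0.3, seed=42)
```

Shuffling before splitting guards against any ordering in the collected data, such as samples sorted by class.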
ML_Lec2 introduction to data processing.pdf