Machine Learning for Language
Technology
Lecture 2: Basic Concepts
Marina Santini
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2014
Acknowledgement: Thanks to Prof. Joakim Nivre for course design and material
Outline
• Definition of Machine Learning
• Types of Machine Learning:
– Classification
– Regression
– Supervised Learning
– Unsupervised Learning
– Reinforcement Learning
• Supervised Learning:
– Supervised Classification
– Training set
– Hypothesis class
– Empirical error
– Margin
– Noise
– Inductive bias
– Generalization
– Model assessment
– Cross-Validation
– Classification in NLP
– Types of Classification
Lecture 2: Basic Concepts 2
What is Machine Learning
• Machine learning is programming computers to
optimize a performance criterion for some task
using example data or past experience
• Why learning?
– No known exact method – vision, speech recognition,
robotics, spam filters, etc.
– Exact method too expensive – statistical physics
– Task evolves over time – network routing
• Compare:
– No need to use machine learning for computing
payroll… we just need an algorithm
Lecture 2: Basic Concepts 3
Machine Learning – Data Mining –
Artificial Intelligence – Statistics
• Machine Learning: creation of a model that uses training data or
past experience
• Data Mining: application of learning methods to large datasets (ex.
physics, astronomy, biology, etc.)
– Text mining = machine learning applied to unstructured textual data
(ex. sentiment analysis, social media monitoring, etc.; see Text Mining,
Wikipedia)
• Artificial intelligence: a model that can adapt to a changing
environment.
• Statistics: Machine learning uses the theory of statistics in building
mathematical models, because the core task is making inference from a
sample.
Lecture 2: Basic Concepts 4
The bio-cognitive analogy
• Imagine a learning algorithm as a single neuron.
• This neuron receives input from other neurons, one
for each input feature.
• The strengths of these inputs are the feature values.
• Each input has a weight and the neuron simply sums
up all the weighted inputs.
• Based on this sum, the neuron decides whether to
“fire” or not. Firing is interpreted as a positive
example and not firing as a negative example
(see the minimal sketch below).
Lecture 2: Basic Concepts 5
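To make the analogy concrete, here is a minimal Python sketch of such a neuron (not part of the original slides); the feature values, weights, and threshold are invented for illustration.

```python
# A single "neuron" as a classifier: weighted sum of the inputs,
# then a threshold decision ("fire" = positive, "don't fire" = negative).

def neuron_predict(features, weights, threshold=0.0):
    """Return 1 (fire / positive example) if the weighted sum of the
    inputs exceeds the threshold, otherwise 0 (negative example)."""
    activation = sum(f * w for f, w in zip(features, weights))
    return 1 if activation > threshold else 0

# Hypothetical example: two input features with hand-picked weights.
print(neuron_predict([0.8, 0.3], weights=[1.5, -0.5]))  # -> 1 (fires)
print(neuron_predict([0.1, 0.9], weights=[1.5, -0.5]))  # -> 0 (does not fire)
```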
Elements of Machine Learning
1. Generalization:
– Generalize from specific examples
– Based on statistical inference
2. Data:
– Training data: specific examples to learn from
– Test data: (new) specific examples to assess performance
3. Models:
– Theoretical assumptions about the task/domain
– Parameters that can be inferred from data
4. Algorithms:
– Learning algorithm: infer model (parameters) from data
– Inference algorithm: infer predictions from model
Lecture 2: Basic Concepts 6
Types of Machine Learning
• Association
• Supervised Learning
– Classification
– Regression
• Unsupervised Learning
• Reinforcement Learning
Lecture 2: Basic Concepts 7
Learning Associations
• Basket analysis:
P(Y | X): the probability that somebody who buys X also buys Y,
where X and Y are products/services
Example: P(chips | beer) = 0.7
Lecture 2: Basic Concepts 8
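A small illustration of how such a conditional probability could be estimated from transaction data; the baskets below are invented, not from the slides.

```python
# Estimate P(Y | X): the share of baskets containing X that also contain Y.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "diapers"},
    {"chips", "salsa"},
    {"beer", "chips"},
]

def conditional_prob(y, x, baskets):
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(y in b for b in with_x) / len(with_x)

print(conditional_prob("chips", "beer", baskets))  # 3 of 4 beer baskets contain chips -> 0.75
```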
Classification
Lecture 2: Basic Concepts 9
• Example: Credit
scoring
• Differentiating
between low-risk and
high-risk customers
from their income and
savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
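The discriminant above is just a pair of thresholds; a minimal sketch in Python, with hypothetical values for θ1 and θ2:

```python
# Discriminant from the slide: IF income > theta1 AND savings > theta2
# THEN low-risk ELSE high-risk.  Threshold values are invented.
THETA1 = 30_000   # income threshold
THETA2 = 5_000    # savings threshold

def credit_risk(income, savings):
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(credit_risk(income=45_000, savings=8_000))   # low-risk
print(credit_risk(income=45_000, savings=2_000))   # high-risk
```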
Classification in NLP
• Binary classification:
– Spam filtering (spam vs. non-spam)
– Spelling error detection (error vs. no error)
• Multiclass classification:
– Text categorization (news, economy, culture, sport, ...)
– Named entity classification (person, location,
organization, ...)
• Structured prediction:
– Part-of-speech tagging (classes = tag sequences)
– Syntactic parsing (classes = parse trees)
Lecture 2: Basic Concepts 10
Regression
• Example: price of a used car
• x: car attributes
y: price
$y = g(x \mid \theta)$, where $g(\cdot)$ is the model and $\theta$ are the parameters
Linear model: $y = wx + w_0$
Lecture 2: Basic Concepts 11
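A minimal sketch of fitting the linear model $y = wx + w_0$ by least squares with NumPy; the car-price data points are invented.

```python
import numpy as np

# Hypothetical data: x = age of car (years), y = price (in thousands).
x = np.array([1, 2, 3, 5, 8], dtype=float)
y = np.array([18.0, 15.5, 13.0, 10.0, 6.0])

# Fit y = w*x + w0 by ordinary least squares (degree-1 polynomial).
w, w0 = np.polyfit(x, y, deg=1)
print(f"model: y = {w:.2f} * x + {w0:.2f}")
print("prediction for a 4-year-old car:", w * 4 + w0)
```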
Uses of Supervised Learning
• Prediction of future cases:
– Use the rule to predict the output for future inputs
• Knowledge extraction:
– The rule is easy to understand
• Compression:
– The rule is simpler than the data it explains
• Outlier detection:
– Exceptions that are not covered by the rule, e.g., fraud
Lecture 2: Basic Concepts 12
Unsupervised Learning
• Finding regularities in data
• No mapping to outputs
• Clustering:
– Grouping similar instances
• Example applications:
– Customer segmentation in CRM
– Image compression: Color quantization
– NLP: Unsupervised text categorization
Lecture 2: Basic Concepts 13
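As an illustration of clustering (not from the slides), a minimal k-means sketch; it assumes scikit-learn is available, and the two-dimensional points are made up.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented 2-D instances: two loose groups of points.
X = np.array([[1.0, 1.1], [0.9, 0.8], [1.2, 1.0],
              [8.0, 8.2], [7.8, 8.1], [8.3, 7.9]])

# Group similar instances into 2 clusters -- no output labels are used.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] (cluster ids are arbitrary)
```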
Reinforcement Learning
• Learning a policy = sequence of
outputs/actions
• No supervised output but delayed reward
• Example applications:
– Game playing
– Robot in a maze
– NLP: Dialogue systems
Lecture 2: Basic Concepts 14
Supervised Classification
• Learning the class C of a “family car” from
examples
– Prediction: Is car x a family car?
– Knowledge extraction: What do people expect from a
family car?
• Output (labels):
Positive (+) and negative (–) examples
• Input representation (features):
x1: price, x2 : engine power
Lecture 2: Basic Concepts 15
Training set X
$$X = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$$

$$r = \begin{cases} 1 & \text{if } \mathbf{x} \text{ is positive} \\ 0 & \text{if } \mathbf{x} \text{ is negative} \end{cases}$$

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

Lecture 2: Basic Concepts 16
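In code, such a training set is simply a collection of (feature vector, label) pairs; a sketch with invented price and engine-power values:

```python
# Training set X = {(x^t, r^t)}: each instance is a feature vector
# x = (price, engine_power) with a label r (1 = family car, 0 = not).
X_train = [
    ((15_000, 110), 1),   # positive example
    ((14_000, 100), 1),   # positive example
    ((45_000, 300), 0),   # negative example (sports car)
    ((7_000,  60),  0),   # negative example (cheap, underpowered)
]

for (price, power), r in X_train:
    print(f"price={price}, engine power={power}, label={r}")
```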
Hypothesis class H
$$(p_1 \leq \text{price} \leq p_2) \text{ AND } (e_1 \leq \text{engine power} \leq e_2)$$
Lecture 2: Basic Concepts 17
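A hypothesis in this class is an axis-aligned rectangle in the (price, engine power) plane; a minimal sketch, with hypothetical bounds p1, p2, e1, e2:

```python
# One hypothesis h from the class H: an axis-aligned rectangle.
def h(price, engine_power, p1=10_000, p2=20_000, e1=80, e2=160):
    """Return 1 if (price, engine_power) falls inside the rectangle."""
    return 1 if (p1 <= price <= p2 and e1 <= engine_power <= e2) else 0

print(h(15_000, 110))  # 1: inside the rectangle -> predicted family car
print(h(45_000, 300))  # 0: outside -> not a family car
```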
Empirical (training) error
$$h(\mathbf{x}) = \begin{cases} 1 & \text{if } h \text{ says } \mathbf{x} \text{ is positive} \\ 0 & \text{if } h \text{ says } \mathbf{x} \text{ is negative} \end{cases}$$

Empirical error of h on X:

$$E(h \mid X) = \sum_{t=1}^{N} \mathbb{1}\left(h(\mathbf{x}^t) \neq r^t\right)$$

Lecture 2: Basic Concepts 18
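The empirical error simply counts the training instances on which h disagrees with the given label; a self-contained sketch with a toy hypothesis and invented data:

```python
# Empirical error E(h | X): the number of training instances on which
# the hypothesis h disagrees with the given label r.
def empirical_error(h, training_set):
    return sum(1 for x, r in training_set if h(x) != r)

# Toy rectangle hypothesis and data (invented values, as in the earlier sketches).
h = lambda x: 1 if (10_000 <= x[0] <= 20_000 and 80 <= x[1] <= 160) else 0
training_set = [((15_000, 110), 1), ((45_000, 300), 0), ((14_000, 100), 0)]

print(empirical_error(h, training_set))  # -> 1 (the last instance is misclassified)
```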
S, G, and the Version Space
Lecture 2: Basic Concepts 19
most specific hypothesis, S
most general hypothesis, G
h  H, between S and G is
consistent [E( h | X) = 0] and
make up the version space
Margin
• Choose h with largest margin
Lecture 2: Basic Concepts 20
Noise
Unwanted anomaly in data
• Imprecision in input attributes
• Errors in labeling data points
• Hidden attributes (relative to H)
Consequence:
• No h in H may be consistent!
Lecture 2: Basic Concepts 21
Noise and Model Complexity
Arguments for a simpler model (Occam’s razor principle):
1. Easier to make predictions
2. Easier to train (fewer parameters)
3. Easier to understand
4. Generalizes better (if data is noisy)
Lecture 2: Basic Concepts 22
Inductive Bias
• Learning is an ill-posed problem
– Training data is never sufficient to find a unique
solution
– There are always infinitely many consistent
hypotheses
• We need an inductive bias:
– Assumptions that entail a unique h for a training set X
1. Hypothesis class H – axis-aligned rectangles
2. Learning algorithm – find consistent hypothesis with max-
margin
3. Hyperparameters – trade-off between training error and
margin
Lecture 2: Basic Concepts 23
Model Selection and Generalization
• Generalization – how well a model performs
on new data
– Overfitting: H more complex than C
– Underfitting: H less complex than C
Lecture 2: Basic Concepts 24
Triple Trade-Off
• Trade-off between three factors:
1. Complexity of H, c(H)
2. Training set size N
3. Generalization error E on new data
• Dependencies:
– As N increases, E decreases
– As c(H) increases, E first decreases and then increases
Lecture 2: Basic Concepts 25
Model Selection and Generalization Error
• To estimate generalization error, we need data unseen
during training:
• Given models (hypotheses) h1, ..., hk induced from the
training set X, we can use E(hi | V ) to select the
model hi with the smallest generalization error
Lecture 2: Basic Concepts 26
$$\hat{E} = E(h \mid V) = \sum_{t=1}^{M} \mathbb{1}\left(h(\mathbf{x}^t) \neq r^t\right)$$

$$V = \{\mathbf{x}^t, r^t\}_{t=1}^{M}, \quad V \cap X = \emptyset$$
Model Assessment
• To estimate the generalization error of the best
model hi, we need data unseen during training
and model selection
• Standard setup:
1. Training set X (50–80%)
2. Validation (development) set V (10–25%)
3. Test (publication) set T (10–25%)
• Note:
– Validation data can be added to training set before
testing
– Resampling methods can be used if data is limited
Lecture 2: Basic Concepts 27
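A sketch of the standard three-way split; the proportions follow the slide (here 60/20/20), and the "data" is just a range of indices standing in for labelled instances.

```python
import random

data = list(range(100))          # stand-in for 100 labelled instances
random.seed(0)
random.shuffle(data)

n = len(data)
train = data[: int(0.6 * n)]             # training set X   (~60%)
val   = data[int(0.6 * n): int(0.8 * n)] # validation set V (~20%)
test  = data[int(0.8 * n):]              # test set T       (~20%)

print(len(train), len(val), len(test))   # 60 20 20
```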
Cross-Validation
• K-fold cross-validation: divide X into X1, ..., XK

$$\begin{aligned}
V_1 &= X_1, & T_1 &= X_2 \cup X_3 \cup \cdots \cup X_K \\
V_2 &= X_2, & T_2 &= X_1 \cup X_3 \cup \cdots \cup X_K \\
&\;\;\vdots \\
V_K &= X_K, & T_K &= X_1 \cup X_2 \cup \cdots \cup X_{K-1}
\end{aligned}$$

• Note:
– Generalization error is estimated as the mean across the K folds
– Training sets for different folds share K–2 parts
– A separate test set must be maintained for model assessment
Lecture 2: Basic Concepts 28
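A minimal sketch of building the K folds by hand (no ML library assumed); each fold serves once as validation data Vi while the remaining K−1 folds form the training data Ti.

```python
def k_fold_splits(instances, K):
    """Yield (train, validation) pairs for K-fold cross-validation."""
    folds = [instances[i::K] for i in range(K)]   # K roughly equal parts
    for i in range(K):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation

data = list(range(10))
for train, val in k_fold_splits(data, K=5):
    print("val:", val, "train size:", len(train))
```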
Bootstrapping
• Generate new training sets of size N from X by random
sampling with replacement
• Use the original training set as validation set (V = X)
• Probability that we do not pick an instance after N draws:

$$\left(1 - \frac{1}{N}\right)^{N} \approx e^{-1} \approx 0.368$$

that is, only 36.8% of instances are new!
Lecture 2: Basic Concepts 29
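A quick simulation of the 36.8% figure (an added illustration, assuming nothing beyond the formula above): draw N instances with replacement and measure how many original instances were never picked.

```python
import random

random.seed(0)
N = 10_000
# Bootstrap sample: N draws with replacement from {0, ..., N-1}.
sample = [random.randrange(N) for _ in range(N)]
never_picked = N - len(set(sample))

print(never_picked / N)   # ~0.368, matching (1 - 1/N)^N ≈ e^-1
```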
Measuring Error
• Error rate = # of errors / # of instances = (FP+FN) / N
• Accuracy = # of correct / # of instances = (TP+TN) / N
• Recall = # of found positives / # of positives = TP / (TP+FN)
• Precision = # of found positives / # of found = TP / (TP+FP)
Lecture 2: Basic Concepts 30
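The four measures computed from a toy confusion matrix; the TP/FP/TN/FN counts are invented.

```python
# Toy confusion-matrix counts: true/false positives and negatives.
TP, FP, TN, FN = 40, 10, 45, 5
N = TP + FP + TN + FN

error_rate = (FP + FN) / N          # 0.15
accuracy   = (TP + TN) / N          # 0.85
recall     = TP / (TP + FN)         # 40/45 ≈ 0.889
precision  = TP / (TP + FP)         # 40/50 = 0.80

print(error_rate, accuracy, round(recall, 3), precision)
```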
Statistical Inference
• Interval estimation to quantify the precision of
our measurements
• Hypothesis testing to assess whether
differences between models are statistically
significant
Lecture 2: Basic Concepts 31
Interval estimation:

$$m \pm 1.96 \frac{\sigma}{\sqrt{N}}$$

Hypothesis testing (McNemar’s test):

$$\frac{\left(|e_{01} - e_{10}| - 1\right)^2}{e_{01} + e_{10}} \sim \chi^2_1$$
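A sketch of both ideas in code: a 95% confidence interval for a measured mean, and McNemar's test statistic for comparing two classifiers; the mean, standard deviation, and disagreement counts e01, e10 are all invented.

```python
import math

# 95% confidence interval for a mean accuracy measured over N test runs.
m, sigma, N = 0.85, 0.04, 25          # hypothetical mean, std dev, sample size
half_width = 1.96 * sigma / math.sqrt(N)
print(f"95% CI: [{m - half_width:.3f}, {m + half_width:.3f}]")

# McNemar's test: e01 = instances model 1 gets wrong but model 2 gets right,
# e10 = the reverse.  Under H0 the statistic follows a chi-square with 1 df.
e01, e10 = 30, 14                      # hypothetical disagreement counts
statistic = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
print(f"McNemar statistic: {statistic:.2f} (reject H0 at the 0.05 level if > 3.84)")
```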
Supervised Learning – Summary
• Training data + learner → hypothesis
– Learner incorporates inductive bias
• Test data + hypothesis → estimated generalization
– Test data must be unseen
Lecture 2: Basic Concepts 32
Anatomy of a Supervised Learner
(Dimensions of a supervised machine learning algorithm)
• Model: $g(\mathbf{x} \mid \theta)$
• Loss function: $E(\theta \mid X) = \sum_t L\left(r^t, g(\mathbf{x}^t \mid \theta)\right)$
• Optimization procedure: $\theta^* = \arg\min_{\theta} E(\theta \mid X)$
Lecture 2: Basic Concepts 33
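The three ingredients side by side in a tiny added sketch: a linear model g(x | θ), a squared-error loss, and a brute-force grid search over θ standing in for the optimization procedure (all data and grid values are invented; real learners optimize far more cleverly).

```python
import numpy as np

# Toy data for a 1-D regression problem.
xs = np.array([0.0, 1.0, 2.0, 3.0])
rs = np.array([1.0, 3.1, 4.9, 7.2])

def g(x, theta):                       # model g(x | theta), here theta = (w, w0)
    w, w0 = theta
    return w * x + w0

def E(theta, xs, rs):                  # loss: sum of squared errors over the data
    return float(np.sum((rs - g(xs, theta)) ** 2))

# "Optimization procedure": pick the theta with minimum loss from a coarse grid.
grid = [(w, w0) for w in np.linspace(0, 4, 41) for w0 in np.linspace(-2, 2, 41)]
theta_star = min(grid, key=lambda th: E(th, xs, rs))
print("theta* =", theta_star, "E =", round(E(theta_star, xs, rs), 3))
```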
Supervised Classification: Extension
• Divide instances into (two or more) classes
– Instance (feature vector): $\mathbf{x} = (x_1, \ldots, x_m)$
• Features may be categorical or numerical
– Class (label): $y$
– Training data: $X = \{\mathbf{x}^t, y^t\}_{t=1}^{N}$
• Classification in Language Technology
– Spam filtering (spam vs. non-spam)
– Spelling error detection (error vs. no error)
– Text categorization (news, economy, culture, sport, ...)
– Named entity classification (person, location, organization, ...)
Lec 2: Decision Trees - Nearest Neighbors 34
NLP: Classification (i)
NLP: Classification (ii)
NLP: Classification (iii)
Types of Classification (i)
Types of Classification (ii)
Reading
• Alpaydin (2010): chs 1-2; 19
• Daumé III (2012): ch. 4, only 4.5-4.6
Lecture 2: Basic Concepts 40
End of Lecture 2
Lecture 2: Basic Concepts 41