Nearest Neighbour
Nearest Neighbour Rule
• Non-parametric pattern classification.
• Consider a two-class problem where each sample consists of two measurements (x, y).
• k = 1: for a given query point q, assign the class of the nearest neighbour.
• k = 3: compute the k nearest neighbours and assign the class by majority vote.
K-Nearest Neighbor
• When we interpret records as points in a data space, we can define the concept of a neighborhood.
  – “Records that are close to each other live in each other's neighborhood.”
  – “Records of the same type will be close to each other in the data space.”
  – i.e. customers of the same type will show the same behavior.
K-Nearest Neighbor
• The basic principle of K-nearest neighbor is “Do as your neighbours do.”
• If we want to predict the behavior of a certain individual, we first look at the behavior of, for example, the ten individuals closest to him/her in the data space.
• We calculate the average of those 10 neighbors, and this average value is the prediction for our individual.
• K = the number of neighbors; 5-nearest neighbor means 5 neighbors are used in the calculation (a short sketch of this idea follows below).
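To make this principle concrete, here is a minimal brute-force sketch in Python; the data, the Euclidean distance, and all names are illustrative assumptions, not part of the slides.

```python
# Minimal brute-force sketch of "do as your neighbours do" (illustrative only).
import numpy as np

def knn_predict(X_train, y_train, query, k=5):
    """Average the targets of the k training points closest to `query`."""
    dists = np.linalg.norm(X_train - query, axis=1)  # Euclidean distance to every point
    nearest = np.argsort(dists)[:k]                  # indices of the k nearest neighbours
    return y_train[nearest].mean()                   # their average is the prediction

# Two measurements (x, y) per record, with a numeric target to predict.
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([10.0, 12.0, 11.0, 50.0, 55.0])
print(knn_predict(X_train, y_train, query=np.array([1.1, 2.0]), k=3))
```

For classification, the mean is replaced by a majority vote over the neighbours' class labels.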
Nearest Neighbour Issues
• Expensive
  – To determine the nearest neighbour of a query point q, we must compute the distance to all N training examples
  + Pre-sort training examples into fast data structures (kd-trees) – see the sketch after this list
  + Compute only an approximate distance (LSH)
  + Remove redundant data (condensing)
• Storage Requirements
  – Must store all training data
  + Remove redundant data (condensing)
  - Pre-sorting often increases the storage requirements
• High Dimensional Data
  – “Curse of Dimensionality”
  • The required amount of training data increases exponentially with dimension
  • Computational cost also increases dramatically
  • Partitioning techniques degrade to linear search in high dimensions
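As a sketch of the kd-tree idea above: SciPy's cKDTree pre-sorts the training points once so each query no longer scans all N examples. The synthetic data and parameters here are assumptions for illustration.

```python
# Sketch: pre-sort training examples into a kd-tree so queries avoid a full scan.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 2))        # N synthetic training points in 2-D
tree = cKDTree(X_train)                  # build the tree once, reuse for every query

dist, idx = tree.query([0.5, 0.5], k=3)  # distances and indices of the 3 nearest neighbours
print(dist, idx)
```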
Questions
• What distance measure to use?
  – Often Euclidean distance is used
  – Locally adaptive metrics
  – More complicated with non-numeric data, or when different dimensions have different scales
• Choice of k?
  – Cross-validation (see the sketch after this list)
  – 1-NN often performs well in practice
  – k-NN needed for overlapping classes
  – Re-label all data according to k-NN, then classify with 1-NN
  – Reduce the k-NN problem to 1-NN through dataset editing
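One way to act on the “choice of k” bullets is cross-validation; a hedged sketch with scikit-learn follows, where the iris dataset and the candidate values of k are illustrative choices.

```python
# Sketch: pick k by comparing cross-validated accuracy (dataset and k-range illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 9):
    clf = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(clf, X, y, cv=5).mean()  # mean 5-fold accuracy
    print(f"k={k}: accuracy={score:.3f}")
```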
Exact Nearest Neighbour
• Asymptotic error (infinite sample size) is less than twice the Bayes classification error
  – Requires a lot of training data
• Expensive for high-dimensional data (d > 20?)
• O(Nd) complexity for both storage and query time
  – N is the number of training examples, d is the dimension of each sample
  – This can be reduced through dataset editing/condensing
Decision Regions
Each cell contains one sample, and every location within the cell is closer to that sample than to any other sample. A Voronoi diagram divides the space into such cells.
Every query point will be assigned the classification of the sample within that cell. The decision boundary separates the class regions based on the 1-NN decision rule.
Knowledge of this boundary is sufficient to classify new points.
The boundary itself is rarely computed; many algorithms seek to retain only those points necessary to generate an identical boundary.
Condensing
• Aim is to reduce the number of training samples
• Retain only the samples that are needed to define the decision boundary
• Decision Boundary Consistent – a subset whose nearest neighbour decision boundary is identical to the boundary of the entire training set
• Minimum Consistent Set – the smallest subset of the training data that correctly classifies all of the original training data
(Figure: original data, condensed data, and the minimum consistent set.)
Condensing
• Condensed Nearest Neighbour (CNN) [Hart 1968]
  – Incremental
  – Order dependent
  – Neither minimal nor decision boundary consistent
  – O(n³) for the brute-force method
  – Can follow up with reduced NN [Gates 1972]
• Remove a sample if doing so does not cause any incorrect classifications
1. Initialize the subset with a single training example
2. Classify all remaining samples using the subset, and transfer any incorrectly classified samples to the subset
3. Return to 2 until no transfers occurred or the subset is full
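A minimal sketch of the CNN steps listed above, assuming Euclidean distance and NumPy arrays X (samples) and y (labels); the function and variable names are my own, not Hart's.

```python
# Sketch of Condensed Nearest Neighbour (Hart 1968) as described above.
import numpy as np

def condense(X, y):
    subset = [0]                                    # 1. start with a single training example
    changed = True
    while changed:                                  # 3. repeat until a pass makes no transfers
        changed = False
        for i in range(len(X)):
            if i in subset:
                continue
            d = np.linalg.norm(X[subset] - X[i], axis=1)
            nearest = subset[int(np.argmin(d))]     # 2. classify i with 1-NN over the subset
            if y[nearest] != y[i]:                  #    transfer misclassified samples
                subset.append(i)
                changed = True
    return np.array(subset)                         # indices of the condensed set
```

The result depends on the presentation order of the samples, which is the order-dependence noted above.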
Proximity Graphs
• Condensing aims to retain points along the decision boundary
• How to identify such points?
– Neighbouring points of different classes
• Proximity graphs provide various definitions of “neighbour”
(Figure: example proximity graphs on the same point set – NNG, MST, RNG, GG, DT.)
NNG = Nearest Neighbour Graph
MST = Minimum Spanning Tree
RNG = Relative Neighbourhood Graph
GG = Gabriel Graph
DT = Delaunay Triangulation
Proximity Graphs: Delaunay
• The Delaunay Triangulation is the dual of the Voronoi diagram
• Three points are each other's neighbours if their circumscribing sphere contains no other points
• Voronoi editing: retain those points whose neighbours (as defined by the Delaunay Triangulation) are of the opposite class
• The decision boundary is identical
• Conservative subset
• Retains extra points
• Expensive to compute in high dimensions
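A sketch of Voronoi editing built on SciPy's Delaunay triangulation, under the assumption of low-dimensional NumPy data: since every pair of points in a Delaunay simplex are neighbours, keeping all points of mixed-class simplices retains exactly the points with an opposite-class Delaunay neighbour.

```python
# Sketch: Voronoi editing via the Delaunay triangulation (low-dimensional data assumed).
import numpy as np
from scipy.spatial import Delaunay

def voronoi_edit(X, y):
    tri = Delaunay(X)
    keep = set()
    for simplex in tri.simplices:          # all vertices of a simplex are Delaunay neighbours
        if len(set(y[simplex])) > 1:       # mixed classes -> these points touch the boundary
            keep.update(int(i) for i in simplex)
    return np.array(sorted(keep))          # indices of the retained points
```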
Proximity Graphs: Gabriel
• The Gabriel graph is a subset of the Delaunay Triangulation
• Points are neighbours only if their (diametral) sphere of influence is empty
• Does not preserve the identical decision boundary, but most changes occur outside the convex hull of the data points
• Can be computed more efficiently
(Figure: green lines denote “Tomek links”.)
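A sketch of the Gabriel neighbour test described above: p and q are neighbours iff no other sample lies inside the sphere whose diameter is the segment pq, which is equivalent to requiring d(p,r)² + d(q,r)² ≥ d(p,q)² for every other point r. Names are illustrative.

```python
# Sketch: test whether points i and j are Gabriel-graph neighbours.
import numpy as np

def gabriel_neighbours(X, i, j):
    d2_ij = np.sum((X[i] - X[j]) ** 2)
    for r in range(len(X)):
        if r in (i, j):
            continue
        # r is inside the diametral sphere iff d(i,r)^2 + d(j,r)^2 < d(i,j)^2
        if np.sum((X[i] - X[r]) ** 2) + np.sum((X[j] - X[r]) ** 2) < d2_ij:
            return False
    return True
```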
Proximity Graphs: RNG
• The Relative Neighbourhood Graph (RNG) is a subset of the Gabriel graph
• Two points are neighbours if the “lune” defined by the intersection of their radial spheres is empty
• Further reduces the number of neighbours
• Decision boundary changes are often drastic, and not guaranteed to be training set consistent
(Figure: Gabriel-edited set vs. RNG-edited set – the latter is not consistent.)
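A sketch of the RNG lune test, assuming Euclidean distance: p and q are relative neighbours iff no third point is closer to both of them than they are to each other.

```python
# Sketch: test whether points i and j are neighbours in the Relative Neighbourhood Graph.
import numpy as np

def rng_neighbours(X, i, j):
    d_ij = np.linalg.norm(X[i] - X[j])
    for r in range(len(X)):
        if r in (i, j):
            continue
        # r lies in the lune iff it is closer than d_ij to both i and j
        if max(np.linalg.norm(X[i] - X[r]), np.linalg.norm(X[j] - X[r])) < d_ij:
            return False
    return True
```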
Dataset Reduction: Editing
• Training data may contain noise and overlapping classes
  – here we are starting to make assumptions about the underlying distributions
• Editing seeks to remove noisy points and produce smooth decision boundaries – often by retaining points far from the decision boundaries
• Results in homogeneous clusters of points
Wilson Editing
• Wilson 1972
• Remove points that do not agree with the majority of their k nearest neighbours
(Figures: Wilson editing with k = 7 versus the original data, shown for the earlier example and for overlapping classes.)
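A sketch of Wilson editing as stated above, using k = 7 as in the figures; the data layout (NumPy arrays X, y) is an assumption.

```python
# Sketch: Wilson editing - drop points that disagree with the majority of their k neighbours.
import numpy as np
from collections import Counter

def wilson_edit(X, y, k=7):
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                                    # a point is not its own neighbour
        nn = np.argsort(d)[:k]                           # its k nearest neighbours
        if Counter(y[nn]).most_common(1)[0][0] == y[i]:  # agrees with the majority -> keep
            keep.append(i)
    return np.array(keep)                                # indices of the retained points
```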
Multi-edit
• Multi-edit [Devijver & Kittler ’79]
  – Repeatedly apply Wilson editing to random partitions
  – Classify with the 1-NN rule
• Approximates the error rate of the Bayes decision rule
1. Diffusion: divide the data into N ≥ 3 random subsets
2. Classification: classify Si using 1-NN with S(i+1) mod N as the training set (i = 1..N)
3. Editing: discard all samples incorrectly classified in (2)
4. Confusion: pool all remaining samples into a new set
5. Termination: if the last I iterations produced no editing then end; otherwise go to (1)
(Figure: multi-edit after 8 iterations – the last 3 are the same – alongside Voronoi editing.)
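A sketch of the multi-edit loop above, with illustrative parameter names (n_splits for N, patience for I); tie-breaking and degenerate tiny splits are handled crudely.

```python
# Sketch of multi-edit (Devijver & Kittler style): diffuse, classify, edit, pool, repeat.
import numpy as np

def one_nn_label(X_train, y_train, x):
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

def multi_edit(X, y, n_splits=3, patience=3, seed=0):
    rng = np.random.default_rng(seed)
    quiet = 0
    while quiet < patience:
        parts = np.array_split(rng.permutation(len(X)), n_splits)  # 1. diffusion
        keep = []
        for i, part in enumerate(parts):
            train = parts[(i + 1) % n_splits]                      # 2. next subset is the training set
            if len(train) == 0:                                    # guard for very small data
                keep.extend(part)
                continue
            for idx in part:
                if one_nn_label(X[train], y[train], X[idx]) == y[idx]:
                    keep.append(idx)                               # 3. discard misclassified samples
        quiet = quiet + 1 if len(keep) == len(X) else 0            # 5. count iterations with no editing
        X, y = X[keep], y[keep]                                    # 4. confusion: pool the survivors
    return X, y
```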
Combined Editing/Condensing
• First edit the data to remove noise and smooth the boundary
• Then condense to obtain a smaller subset
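A sketch of the combined pipeline, assuming the wilson_edit and condense sketches from the earlier sections are in scope and that X, y hold the training data; edit first to smooth the boundary, then condense the edited set.

```python
# Sketch: combined editing + condensing (reuses wilson_edit and condense from above).
edited = wilson_edit(X, y, k=7)            # remove noisy / overlapping points
Xe, ye = X[edited], y[edited]
kept = condense(Xe, ye)                    # keep only the boundary-defining points
X_final, y_final = Xe[kept], ye[kept]
```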