Principal Component Analysis for
                  Novelty Detection
A journal article submitted to and accepted by Pattern Recognition Letters



                                                 Jordan McBain, P.Eng.
                                            Markus Timusk, PhD, P.Eng.
Condition Monitoring
   Maintenance technique
       Maintenance undertaken when some indicator of health is
        flagged
       Advanced technique employed when cost-benefit analysis
        justifies the expense of monitoring equipment
       Alternative to run-to-failure maintenance and statistically
        determined time-based maintenance
   Employ pattern recognition to automate diagnosis
       Expert system employed to replicate a technician's
        maintenance insight
           Computer and sensors replace the technician with a screwdriver
            set atop a vibrating machine – the nature of the vibration is
            used to discern machine state
Pattern Recognition
   Equality is an insufficient means of classifying real-world
    members of a class (noise, variance, etc.)
   Pattern recognition
       Real-world signals presumed representative of a class are
        reduced to representative n-dimensional feature vectors
       Plotted in n-dimensional space
       Decision boundary generated with pattern recognition
        techniques
           Employed as classification rule
       Problems
           Choice of features
               How representative?
               Maximize number of features?
               Curse of dimensionality
           Imbalance of data
Principal Component Analysis
   One technique used to find “optimal” set of features
       Finds the axes of normally distributed data
       Select the largest axes and omit smaller ones to define
        new basis
       Project data onto basis to reduce dimensionality of
        problem space
   Each feature presumed to be normally distributed
   N-dimensional scattering of features presumed
    independent
   Combined probability:
             $P(A \cap B) = P(A) \cdot P(B)$
$$
p(x) \;=\; \prod_{i=1}^{d} p(x_i)
     \;=\; \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_i}\,
            e^{-\frac{1}{2}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^{2}}
     \;=\; \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i}\,
            e^{-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^{2}}
     \;=\; \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\,
            e^{-\frac{1}{2}(x-\mu)^{t}\,\Sigma^{-1}\,(x-\mu)}
$$
   Find principal components (i.e. axes of the
    hyper-ellipsoidal distribution)
   Select maximum variance (largest axes)
   Eigenvalue problem
       Eigenvectors – principal components
       Eigenvalues – size of axes
    (a minimal sketch of this decomposition follows below)
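
    As a concrete illustration, a minimal sketch of PCA by eigendecomposition,
    assuming NumPy; the function name and toy data are illustrative, not from
    the paper:

        import numpy as np

        def pca(X, k):
            """Project the rows of X onto the k principal axes of largest variance."""
            mean = X.mean(axis=0)
            Xc = X - mean                      # centre the data at its mean
            cov = np.cov(Xc, rowvar=False)     # d x d covariance matrix
            vals, vecs = np.linalg.eigh(cov)   # symmetric matrix: eigh, ascending eigenvalues
            order = np.argsort(vals)[::-1]     # largest-variance axes first
            basis = vecs[:, order[:k]]         # keep the k largest axes as the new basis
            return Xc @ basis, basis           # projected data and the basis itself

        # Toy usage: 500 samples of correlated 3-d Gaussian data reduced to 2-d.
        rng = np.random.default_rng(0)
        X = rng.multivariate_normal([0, 0, 0], [[3, 1, 0], [1, 2, 0], [0, 0, 0.1]], size=500)
        Z, W = pca(X, k=2)
        print(Z.shape, W.shape)  # (500, 2) (3, 2)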
Novelty Detection
   Deals with imbalance of data between classes
   Fault detection in machinery
       Easy to collect data representative of healthy state
       Difficult to collect data representative of faulted states
           Costly to break machinery
           Operationally unacceptable
           Poor database of faults kept
           Can never capture them all!
   Model healthy data with decision boundary
       If test patterns fall outside, classify as a fault!
Problem
   PCA selects the subspace that best represents the data
   In pattern recognition, we seek to discriminate
    between classes
   The objectives of most feature reduction techniques are
    not optimized for novelty detection
Feature Reduction Techniques
   Feature Selection vs. Feature Extraction
   Selection
       Choosing small subsets of features that are adequate to
        describe classes
       E.g. “Search”
           Examines all subsets of feature combinations to find the one which
            maximizes some objective function
           May employ classifier error as objective function
           Exponential explosion
               Heuristics exist to mitigate this
           If computationally feasible, gives the best results
   Extraction
       Computes a small number of new features from the set of old
        features
       E.g. PCA
Principal Component Analysis
   Seeks a subspace in which the data representation
    error is minimal
   Development
       For a set of n vectors in d-dimensional space
           seek the equation of a hyperplane onto which the data may be
            projected with minimal representation error
           Hyperplane fixed at the data's mean, m
           Hyperplane's orientation defined by direction vector, w (the
            normal definition of a plane)



           Derive error function
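    A hedged reconstruction of the omitted development, following the
    standard treatment in Duda (2000): each point is approximated on the
    line through the mean,
$$
x_k \;\approx\; m + a_k w, \qquad
J(a_1,\dots,a_n,w) \;=\; \sum_{k=1}^{n}\bigl\| (m + a_k w) - x_k \bigr\|^{2}
$$
    minimizing over the $a_k$ gives $a_k = w^{t}(x_k - m)$; minimizing $J$
    over $w$ (with $\|w\| = 1$) is then equivalent to maximizing
    $w^{t} S\, w$, where $S = \sum_k (x_k - m)(x_k - m)^{t}$ is the scatter
    matrix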
   Optimization problem reduces to the well-known
    eigenvalue problem
   Resultant feature space is linear
       May not represent non-linear and changing data well
       Kernel PCA and Dynamic PCA
   Techniques only suitable for representing data, not
    discriminating between classes
                 Source: Duda, 2000
Multiple Discriminant Analysis
   Seeks to find efficient subspaces for discrimination
    rather than representation
   Development
       Two-class problem with a d-dimensional set of n vectors
        grouped into D1 and D2
       Projected onto some direction vector w to give
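         $y_i = w^{t} x_i$ (projection reconstructed from context; notation assumed)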

       Consequently grouped into subsets Y1 and Y2
       Find the direction vector w such that the distance
        between projected sample means m1 and m2 is
        maximized
           Normalized against the relative spread (scatter) of the samples
   Reduces to
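    a plausible reconstruction of the omitted criterion (the standard Fisher
    form, per Duda, 2000):
$$
J(w) \;=\; \frac{\left|\tilde m_1 - \tilde m_2\right|^{2}}{\tilde s_1^{\,2} + \tilde s_2^{\,2}}
     \;=\; \frac{w^{t} S_B\, w}{w^{t} S_W\, w}
$$
    where $S_B$ is the between-class and $S_W$ the within-class scatter matrix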



   Solution is described as "analogous to the well-known
    Rayleigh quotient":
$$
w \;=\; S_W^{-1}\,(m_1 - m_2)
$$
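
    A minimal sketch of computing this direction, assuming NumPy; the
    names are illustrative:

        import numpy as np

        def fisher_direction(X1, X2):
            """Two-class Fisher direction w = S_W^{-1} (m1 - m2); rows are samples."""
            m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
            Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter
            w = np.linalg.solve(Sw, m1 - m2)   # solve S_W w = m1 - m2, no explicit inverse
            return w / np.linalg.norm(w)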

   Technique extended for problems with n classes
       Objective: maximize the spread between all classes in the
        projected space
                                                     Source: Duda, 2000
Extraction for Novelty Detection
Development
   Objective: distinguish between normal and abnormal
    classes
       KFDA inappropriate (assumes classes cluster into
        well-separated groups)
       Novelty detection – classes may cluster well, but abnormal
        classes are expected to orbit the normal data
           Means could overlap
               Ruling out the previous objective functions
   Approach: find the subspace maximizing difference
    between average spread of the normal class and
    average spread of the abnormal class measured
    from the mean of the normal class
   Mathematically, for an outlier class containing b
    elements and a target class containing a elements
    with mean m_t
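    one consistent reconstruction of the omitted objective, matching the
    "average spread" wording above (the paper's exact normalization may
    differ):
$$
J(w) \;=\; \frac{1}{a}\sum_{i=1}^{a} w^{t}(x_i - m_t)(x_i - m_t)^{t}\, w
       \;-\; \frac{1}{b}\sum_{j=1}^{b} w^{t}(z_j - m_t)(z_j - m_t)^{t}\, w
$$
    with $x_i$ the target (normal) patterns and $z_j$ the outlier patterns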




   To simplify, introduce outlier scatter matrix, O, for
    outlier data centered at m_t
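    reconstructed from the definition above (the target scatter $S_t$ is
    defined analogously from the $x_i$):
$$
O \;=\; \frac{1}{b}\sum_{j=1}^{b} (z_j - m_t)(z_j - m_t)^{t}
$$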

   Reducing to
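$$
J(w) \;=\; w^{t}\,(S_t - O)\,w
$$
    (reconstructed; consistent with the eigenproblem stated below)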
   Maximize this objective function
       Find the eigenvectors and eigenvalues of the matrix S_t − O
   Select the k largest eigenvalues and use the
    corresponding eigenvectors as the new basis
   Project data onto the new basis
   Proceed with classification
    (a minimal sketch of these steps follows below)
   Limitations
       Still dependent on the assumption of normally distributed data
           (as are other PCA techniques)
       Assumption: normal data scatter somewhat circularly and
        outlier data orbit neatly without intruding
           (as with PCA and MDA)
       Machinery vibration data are not, in general, Gaussian (a
        heuristic observation)
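
    A minimal sketch of the steps above, assuming NumPy and the
    reconstructed S_t − O objective; an illustration, not the paper's
    implementation:

        import numpy as np

        def novelty_basis(X_target, X_outlier, k):
            """New k-dimensional basis: top eigenvectors of S_t - O, both centred at m_t."""
            m_t = X_target.mean(axis=0)
            Dt, Do = X_target - m_t, X_outlier - m_t   # centre BOTH classes at the target mean
            S_t = Dt.T @ Dt / len(Dt)                  # average target scatter
            O = Do.T @ Do / len(Do)                    # average outlier scatter about m_t
            vals, vecs = np.linalg.eigh(S_t - O)       # symmetric difference matrix
            order = np.argsort(vals)[::-1]             # k largest eigenvalues first
            return vecs[:, order[:k]]

        # Usage: W = novelty_basis(X_healthy, X_fault, k=2), then project
        # centred data with (X - m_t) @ W and train any novelty detector.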
Validation: Artificial Data
   Artificial 3-d data set
       Normal distribution:
           spherical (radius 50) centered at origin
       Outlier distribution:
           randomly generated spherical distribution (radius 100)
           Not permitted to fall within cylinder concentric with the normal
            data’s sphere and oriented with length parallel to [1,1,1]
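
    A hedged sketch of generating such a data set, assuming NumPy; the
    radii follow the description above, while the cylinder radius (50) and
    the sample counts are assumptions:

        import numpy as np
        rng = np.random.default_rng(1)

        def sphere(n, radius):
            """n points drawn uniformly inside a 3-d sphere of the given radius."""
            p = rng.normal(size=(n, 3))
            p /= np.linalg.norm(p, axis=1, keepdims=True)      # uniform directions
            return p * radius * rng.random((n, 1)) ** (1 / 3)  # uniform radial density

        axis = np.ones(3) / np.sqrt(3)                   # cylinder axis along [1, 1, 1]
        normal = sphere(500, 50)                         # normal class: radius-50 sphere at origin
        cand = sphere(5000, 100)                         # outlier candidates: radius-100 sphere
        dist = np.linalg.norm(cand - np.outer(cand @ axis, axis), axis=1)
        outlier = cand[dist > 50]                        # reject points inside the cylinder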
Validation: Artificial Data
   Results (reduced to 2 dimensions)
       Subspace’s normal vector only 7 degrees off from
        expected [1,1,1]
Experimental Methodology
Apparatus
   Spectraquest gear dynamics simulator
       3-hp motor
       Magnetic particle brake loading
       National Instruments PXI data acquisition and control
       Accelerometers (sampled at 4 kHz)
Faults
   4 motors employed
       healthy
       Combo bearing faults
       Broken rotor bars
       Rotor unbalance
   Gearbox faults
       Fault-free conditions
       Missing tooth gear
       Chipped tooth
       Bearing with outer race faults
       Bearings with inner and outer race faults
Feature Extraction
   Autoregressive model
       A model of a statistical process obtained by regressing the
        process on its own previous values
       A compact set of coefficients that best reproduces the
        sampled signal
       Order 10
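
    A minimal sketch of extracting order-10 AR coefficients by least
    squares, assuming NumPy; the paper's exact estimator (e.g. Yule-Walker
    or Burg) is not specified here:

        import numpy as np

        def ar_features(x, order=10):
            """Least-squares AR coefficients: predict x[t] from the previous `order` samples."""
            X = np.column_stack([x[order - 1 - i : len(x) - 1 - i] for i in range(order)])
            y = x[order:]
            coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
            return coeffs  # one 10-dimensional feature vector per signal segment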

Segmentation
   Vibration data segmented into groups based on
    intervals spanning a constant number of shaft rotations
       Gaussian window
       70% overlap between segments
    (a windowing sketch follows below)
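
    A hedged sketch of the windowing, assuming NumPy; fixed-length segments
    stand in for the constant-shaft-rotation intervals (which would need a
    tachometer signal), and the window width sigma is an assumption:

        import numpy as np

        def segments(x, length, overlap=0.7):
            """Gaussian-windowed segments of x with the stated 70% overlap."""
            step = max(1, int(length * (1 - overlap)))
            n = np.arange(length)
            sigma = 0.2 * length                       # assumed window width
            win = np.exp(-0.5 * ((n - (length - 1) / 2) / sigma) ** 2)
            return [x[s:s + length] * win for s in range(0, len(x) - length + 1, step)]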
Results: Proposed Algorithm
Results: Kernel PCA
Results: Kernel FDA
   N.B. Potential for singular matrices
Results: Exhaustive Feature Search
Feature Extraction in the
     Absence of Outliers
Motivation and Development
   The technique above requires outlier data, violating a
    premise of novelty detection
       Limited (or no) data available from the fault classes
   In the case where we know nothing of the outlier
    classes
       Work with what we have: normal data
           Minimize the variance of the normal data (one reading of this
            criterion is stated below)
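    One consistent reading of "minimize the variance of the normal data",
    stated as a criterion (an interpretation, not a quotation from the
    paper):
$$
\min_{\|w\| = 1} \; w^{t} S_t\, w
\quad\Longrightarrow\quad
\text{keep the eigenvectors of } S_t \text{ with the smallest eigenvalues}
$$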
Results: Novelty Reduction (Outlier
Absence)
Conclusions
   Reduce a large feature space to a smaller one
       Mitigate the curse of dimensionality
   Objective function tweaked for novelty detection
   Similar to MDA but modified to accommodate the case
    where the normal and outlier means are close together
    (or coincide)
   Results good for artificial and machinery data
   Future work
       Extend technique with kernels
           Difficult problem due to need for mean
   Thanks
       CEMI
       Dr. Mechefske, Queen's University
