PCA Understanding Document
Theory :
Let the following be the data points on which PCA will be applied :
X Y
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2 1.6
1 1.1
1.5 1.6
1.1 0.9
Subtract the mean of each axis from all of the values on that axis. The mean-adjusted dataset is :
X Y
.69 .49
-1.31 -1.21
.39 .99
.09 .29
1.29 1.09
.49 .79
.19 -.31
-.81 -.81
-.31 -.31
-.71 -1.01
Calculate the covariance matrix :
Cov X Y
X 0.616555556 0.615444444
Y 0.615444444 0.716555556
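For reference, each entry is the sample covariance of the mean-adjusted values (here n = 10):

```latex
\operatorname{cov}(X,Y) \;=\; \frac{1}{n-1}\sum_{i=1}^{n}\bigl(X_i-\bar{X}\bigr)\bigl(Y_i-\bar{Y}\bigr)
```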
Calculate the eigenvalues and the eigenvectors.
eigenvalues
0.0490833989
1.28402771
eigenvector 1 eigenvector 2
-0.735178656 -0.677873399
0.677873399 -0.735178656
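Each eigenpair satisfies the defining relation below; as a quick sanity check, the two eigenvalues sum to the trace of the covariance matrix (0.0490833989 + 1.28402771 ≈ 0.616555556 + 0.716555556):

```latex
C\,v_i = \lambda_i\,v_i \qquad\text{with}\qquad \lambda_1 + \lambda_2 = \operatorname{tr}(C)
```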
The eigenvector with the highest eigenvalue is the principal component of the data set.
Once eigenvectors are found from the covariance matrix, the next step is to order them by
eigenvalue, highest to lowest. This gives you the components in order of significance. Now, if you
like, you can decide to ignore the components of lesser significance. You do lose some information,
but if the eigenvalues are small, you don't lose much. If you leave out some components, the final
data set will have fewer dimensions than the original. To be precise, if you originally have n dimensions
in your data, you calculate n eigenvectors and eigenvalues; if you then choose only the first p
eigenvectors, the final data set has only p dimensions.
FeatureVector = [eig1 eig2 eig3.....]
eigenvector 1 eigenvector 2
-0.677873399 -0.735178656
-0.735178656 0.677873399
We can choose to leave out the smaller, less significant component and only have a single column:
eigenvector 1
-0.677873399
-0.735178656
FinalData = RowFeatureVector * RowDataAdjust
where RowFeatureVector is the matrix with the eigenvectors transposed so that the
eigenvectors are now in the rows, with the most significant eigenvector at the top, and
RowDataAdjust is the mean-adjusted data transposed, i.e. the data items are in the columns, with each
row holding a separate dimension.
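In terms of shapes (the symbols d, N and k are not in the original: d original dimensions, N data items, k kept eigenvectors):

```latex
\underbrace{\text{FinalData}}_{k \times N} \;=\; \underbrace{\text{RowFeatureVector}}_{k \times d} \times \underbrace{\text{RowDataAdjust}}_{d \times N}
```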
Transformed Data (both eigenvectors)
X Y
-.827970186 -.175115307
1.77758033 .142857227
-.992197494 .384374989
-.274210416 .130417207
-1.67580142 -.209498461
-.912949103 .175282444
.0991094375 -.349824698
1.14457216 .0464172582
.438046137 .0177646297
1.22382056 -.162675287
Transformed Data (Single eigenvector)
X
-.827970186
1.77758033
-.992197494
-.274210416
-1.67580142
-.912949103
.0991094375
1.14457216
.438046137
1.22382056
E.g., suppose we have n features :
FinalData = SampleData (1×n matrix) * EigenVector (n×1 matrix) = 1×1 matrix
i.e. the 1st of the n eigenvectors (which are in descending order of their eigenvalues) is used to get
the 1st transformed value for a sample after PCA execution.
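As a concrete check on the numbers above, here is a minimal, self-contained Java sketch (no external library; the class name is illustrative) that reproduces the single-eigenvector transformed column from this document's data :

```java
// Self-contained sketch reproducing the single-eigenvector projection above.
// The principal eigenvector values are copied from this document; only the
// mean subtraction and the dot product are computed here.
public class PcaProjectionDemo {
    public static void main(String[] args) {
        double[] x = {2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1};
        double[] y = {2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9};

        // Compute the mean of each axis.
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length; // 1.81
        my /= y.length; // 1.91

        // Principal eigenvector (the one with eigenvalue 1.28402771).
        double[] v = {-0.677873399, -0.735178656};

        // FinalData = RowFeatureVector * RowDataAdjust :
        // one dot product per mean-adjusted sample.
        for (int i = 0; i < x.length; i++) {
            double projected = v[0] * (x[i] - mx) + v[1] * (y[i] - my);
            System.out.printf("%.9f%n", projected); // e.g. -0.827970186 for the first point
        }
    }
}
```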
To get the final data :
FinalData = RowFeatureVector * RowDataAdjust
Getting back the old data :
RowDataAdjust = RowFeatureVector^(-1) * FinalData
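Because the eigenvectors of the symmetric covariance matrix are orthonormal, the inverse of RowFeatureVector is just its transpose, and adding the mean back recovers the original data (exactly so when all eigenvectors were kept):

```latex
\text{RowDataAdjust} = \text{RowFeatureVector}^{\mathsf T}\times\text{FinalData},
\qquad
\text{RowOriginalData} = \text{RowDataAdjust} + \text{OriginalMean}
```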
Java Libraries :
1. java-statistical-analysis-tool (JSAT)
https://code.google.com/p/java-statistical-analysis-tool/source/browse/trunk/JSAT/src/jsat/datatransform/PCA.java?spec=svn414&r=414
License : GNU GPL v3
2. efficient-java-matrix-library (EJML)
https://code.google.com/p/efficient-java-matrix-library/wiki/PrincipleComponentAnalysisExample
License : GNU Lesser GPL
3. Michael Thomas Flanagan's Java Scientific Library
http://www.ee.ucl.ac.uk/~mflanaga/java/PCA.html
License : This library is no longer publicly available
Of these, we can use efficient-java-matrix-library (EJML) commercially.
Explaining EJML :
Here is the code which you can use after adding the EJML jar to the classpath :
https://code.google.com/p/efficient-java-matrix-library/wiki/PrincipleComponentAnalysisExample
We can write a test component class for this class.
Process (a usage sketch follows the steps below) :
1. First, provide all the data samples via
pca.addSample(sample);
2. Then call : pca.computeBasis(n);
This is the main step; here n is the number of dimensions to which we want to reduce the features.
3. Now the computed eigenvectors can be used to transform a sample via the function : sampleToEigenSpace(
double[] sampleData )
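Below is a minimal usage sketch of those three steps. addSample, computeBasis and sampleToEigenSpace are the calls named above; the class name PrincipleComponentAnalysis and the setup(numSamples, sampleSize) call follow the linked example page and should be verified against the version you download :

```java
// Minimal usage sketch of the EJML PCA example class (see the wiki link above).
// Assumption: the PrincipleComponentAnalysis class from that example page is on
// the classpath alongside the EJML jar.
public class EjmlPcaDemo {
    public static void main(String[] args) {
        double[][] samples = {
            {2.5, 2.4}, {0.5, 0.7}, {2.2, 2.9}, {1.9, 2.2}, {3.1, 3.0},
            {2.3, 2.7}, {2.0, 1.6}, {1.0, 1.1}, {1.5, 1.6}, {1.1, 0.9}
        };

        PrincipleComponentAnalysis pca = new PrincipleComponentAnalysis();
        pca.setup(samples.length, 2); // 10 samples, 2 features each

        // Step 1: provide all the data samples.
        for (double[] sample : samples) {
            pca.addSample(sample);
        }

        // Step 2: compute the basis, reducing to n = 1 dimension.
        pca.computeBasis(1);

        // Step 3: project each sample onto the reduced eigenspace.
        for (double[] sample : samples) {
            System.out.println(pca.sampleToEigenSpace(sample)[0]);
        }
    }
}
```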
Points to Note :
PCA is not very useful for data consisting only of 0's and 1's: such data can easily be stored in a
sparse matrix format, which automatically reduces memory requirements. The PCA output, by contrast,
will generally not contain 0's, so it cannot be converted to a sparse matrix format.
It is therefore better not to use PCA on 0/1 data.
(We did not find any Java library that accepts a sparse matrix as the input format for PCA.)
Links :
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf