RE: [Wekalist] Question on PCA
by Nanda, Subrat (GE, Research)
1. The question 'How much more variance does RATIO2 explain than RATIO1?' can be interpreted in different ways. The eigenvalues obtained from the eigenvalue decomposition of the covariance matrix of the variables indicate the variance explained by each principal component. (Note that PCs calculated from the covariance matrix and PCs calculated from the correlation matrix have different interpretations.) The sum of all eigenvalues equals the total variance, i.e. the trace of the matrix being decomposed: for a covariance matrix it can be any non-negative number, while for a correlation matrix it equals the number of variables.
Two principal components can be compared in various ways, for example:
a) The ratio of the two eigenvalues in question gives an idea of the relative variance explained by one PC vs the other.
b) The ratio of each eigenvalue to the 'total variance', i.e. the sum of the diagonal elements of the eigenvalue matrix, serves a similar purpose.
c) A practical but tedious method is to reconstruct the original data using one PC at a time; the relative difference in reconstruction error between the two cases is a good measure for evaluating these PCs.
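As a rough illustration of (a) and (b), here is a minimal Python sketch that uses the eigenvalues from the output quoted further down. The variable names are mine and have nothing to do with Weka. For (c) it leans on the standard result that rebuilding standardized data from a single PC leaves a residual variance equal to the total variance minus that PC's eigenvalue, so the comparison can be made without the raw data.

```python
# Eigenvalues copied from the Weka output quoted in this thread
# (PCA on the correlation matrix of five attributes).
eigenvalues = [1.5835, 1.08849, 0.90369, 0.78667, 0.63765]

# Total variance = sum of eigenvalues = trace of the correlation matrix.
total_variance = sum(eigenvalues)

# (a) Relative variance explained by PC1 versus PC2.
ratio_pc1_to_pc2 = eigenvalues[0] / eigenvalues[1]

# (b) Share of the total variance explained by each PC.
proportions = [ev / total_variance for ev in eigenvalues]

# (c) Residual variance left when the data is rebuilt from one PC alone.
residuals = [total_variance - ev for ev in eigenvalues]

print(f"total variance: {total_variance:.4f}")
print(f"PC1/PC2 eigenvalue ratio: {ratio_pc1_to_pc2:.3f}")
for i, (p, r) in enumerate(zip(proportions, residuals), start=1):
    print(f"PC{i}: proportion={p:.4f}, residual if used alone={r:.4f}")
```

The proportions computed this way should match the 'proportion' column in the posted output, which is a quick sanity check that the eigenvalues were copied correctly.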
2. I am not quite sure why the last PC has not been reported. Usually the last few PCs have negligible eigenvalues (~0) and hence are rejected. Check with the Weka guys whether it does this automatically when a particular PC has a zero or very low, insignificant eigenvalue. The PCs are ranked in decreasing order of the variance that they explain. It is not a question of 'subsuming'; rather, after the first few there weren't enough 'axes' (eigenvectors) with a significant eigenvalue onto which to project the data, hence the zero variance! And remember, you lose the direct correspondence between the original space and the transformed space, because PCA is performed on the variance-covariance data and not on the original data!
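To make the idea of rejecting low-variance PCs concrete, here is a small Python sketch of a cumulative-variance cutoff, in the spirit of the -R 0.95 setting shown in the run information. This is not Weka's implementation; the function name components_needed is hypothetical and the eigenvalues are simply copied from the posted output.

```python
def components_needed(eigenvalues, threshold=0.95):
    """Return how many leading PCs are needed to reach the variance threshold.

    Assumes the eigenvalues are sorted in decreasing order.
    """
    total = sum(eigenvalues)
    running = 0.0
    for count, ev in enumerate(eigenvalues, start=1):
        running += ev
        if running / total >= threshold:
            return count
    return len(eigenvalues)


# Eigenvalues copied from the Weka output quoted in this thread.
eigenvalues = [1.5835, 1.08849, 0.90369, 0.78667, 0.63765]

print(components_needed(eigenvalues))        # 95% threshold
print(components_needed(eigenvalues, 0.70))  # 70% threshold
```

With these eigenvalues the first four PCs only reach about 87% of the variance, so all five are needed for 95%, which is consistent with five PCs appearing in the output.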
Hope this helps,
From: wekalist-bounces(a)list.scms.waikato.ac.nz [mailto:email@example.com] On Behalf Of Arjun Khanna
Sent: Friday, April 28, 2006 8:58 AM
Subject: [Wekalist] Question on PCA
Hi: I am running PCA on 6 variables. My goals are twofold: (1) obviously, I'd like to understand how many PCs are required to explain, say, 95% of the variance, but I am also (2) interested in getting a sense, or better still an empirical understanding, of how important my original attributes are.
I present my Weka output below. The 'Ranking' of the attributes appears as the final part of the output. I ran PCA with 'transformBackToOriginal' set to true.
Two questions that I could use your help on:
(1) By looking at the data output I can tell that RATIO2 explains more of the variance than RATIO1. But how do I answer the question "How much more variance does RATIO2 explain compared to RATIO1?" Or how do I arrive at an estimate of that?
(2) I see that RATIO6 disappears. It does not appear in the correlation matrix and it does not appear in the eigenvectors. I verified that the maximumAttributeNames switch in PCA is set to 6. So I am assuming that RATIO6 is not necessary in terms of explaining the data variance. That may be so, but I'd like to understand which RATIOs are subsuming RATIO6. How do I tell that?
Thanks in advance. Output below.
=== Run information ===
Evaluator: weka.attributeSelection.PrincipalComponents -R 0.95 -O
Search: weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1
Evaluation mode: evaluate on all training data
=== Attribute Selection on all input data ===
Attribute Evaluator (unsupervised):
Principal Components Attribute Transformer
Correlation matrix
 1     0.16  0.02  0.07  0.24
 0.16  1     0.19  0.07  0.33
 0.02  0.19  1    -0.06  0.08
 0.07  0.07 -0.06  1     0.14
 0.24  0.33  0.08  0.14  1
eigenvalue  proportion  cumulative
1.5835      0.3167      0.3167      0.597RATIO5+0.572RATIO2+0.435RATIO1+0.26 RATIO3+0.243RATIO4
1.08849     0.2177      0.5344      -0.709RATIO3+0.626RATIO4-0.219RATIO2+0.212RATIO1+0.11 RATIO5
0.90369     0.18074     0.71514     -0.674RATIO1+0.65 RATIO4+0.324RATIO3+0.131RATIO2-0.04RATIO5
0.78667     0.15733     0.87247     0.539RATIO3+0.532RATIO1-0.424RATIO2-0.357RATIO5+0.345RATIO4
0.63765     0.12753     1           0.708RATIO5-0.654RATIO2+0.181RATIO3-0.17RATIO1-0.091RATIO4
Eigenvectors
 V1       V2       V3       V4       V5
 0.4352   0.2124  -0.6737   0.5316  -0.1703  RATIO1
 0.5719  -0.2193   0.1312  -0.4238  -0.6542  RATIO2
 0.2604  -0.7094   0.3244   0.5394   0.181   RATIO3
 0.2429   0.6257   0.6496   0.3453  -0.0908  RATIO4
 0.5973   0.11    -0.0404  -0.3572   0.7085  RATIO5
PC space transformed back to original space.
(Note: can't evaluate attributes in the original space)
Ranked attributes:
 1  2 RATIO2
 1  1 RATIO1
 1  5 RATIO5
 1  3 RATIO3
 1  4 RATIO4
Selected attributes: 2,1,5,3,4 : 5