I get the output below after applying PCA to my training set. Does this output mean that I
have to select the first 8 attributes of the original training set?
0.4123 1 0.378att_3+0.377att_4+0.339att_9+0.339att_7+0.331att_8...
0.3133 2 0.972att_1+0.173att_10-0.098att_5-0.091att_7-0.051att_4...
0.2365 3 -0.914att_10+0.224att_8+0.191att_7+0.19 att_1+0.117att_2...
0.1827 4 0.64 att_5-0.429att_6+0.334att_8-0.327att_2-0.262att_4...
0.1357 5 0.71 att_2-0.47att_6-0.448att_9+0.182att_7+0.135att_10...
0.0971 6 -0.547att_7+0.483att_8+0.418att_9-0.379att_6-0.268att_5...
0.0627 7 0.668att_8-0.495att_9+0.378att_6-0.276att_4-0.21att_5...
0.0333 8 -0.651att_7+0.513att_5-0.397att_9+0.304att_3+0.153att_4...
Selected attributes: 1,2,3,4,5,6,7,8 : 8
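The first column of the output looks like Weka's merit for each principal component. As far as I know, that merit is 1 minus the cumulative proportion of variance explained up to and including that component, and Weka keeps components until a coverage threshold is reached (the varianceCovered option, 0.95 by default if I remember correctly). The sketch below assumes that reading and recovers the variance figures from the printed values; explained_variance and components_needed are hypothetical helpers, not Weka API.

```python
# Rank values copied from the first column of the Weka output, one per component.
# ASSUMPTION: each value is 1 - (cumulative proportion of variance explained).
merits = [0.4123, 0.3133, 0.2365, 0.1827, 0.1357, 0.0971, 0.0627, 0.0333]


def explained_variance(merits):
    """Recover cumulative and per-component variance proportions
    from the assumed merit values (hypothetical helper)."""
    cumulative = [1.0 - m for m in merits]
    per_component = [cumulative[0]] + [
        cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))
    ]
    return cumulative, per_component


def components_needed(merits, variance_covered=0.95):
    """Smallest number of leading components whose cumulative variance
    reaches variance_covered (hypothetical helper mirroring the cut-off)."""
    for k, m in enumerate(merits, start=1):
        if 1.0 - m >= variance_covered:
            return k
    return len(merits)


cumulative, per_component = explained_variance(merits)
for k, (c, p) in enumerate(zip(cumulative, per_component), start=1):
    print(f"PC{k}: {p:.4f} of variance, {c:.4f} cumulative")
print("components kept:", components_needed(merits))  # components kept: 8
```

Under that assumption the first component alone explains about 59% of the variance and eight components are needed to pass 95%, so "Selected attributes: 1,2,3,4,5,6,7,8" would refer to the eight principal components listed above, not to original attributes 1 to 8.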
"Harri M.T. Saarikoski" <harri.saarikoski(a)helsinki.fi> wrote:
Quoting "M. Fatih Akay":
1. I would like to try some feature selection algorithms such as PCA. I
want to know whether I should apply PCA on the training set or on the
2. As far as I can see, PCA always returns the FIRST N attributes, where N
<= the number of attributes. This means the first attribute is ranked highest,
the second attribute is ranked second, and so on. For example,
if I have 6 features, PCA can return the first five. If I have 12
features, PCA can return the first 10 (i.e. always the FIRST N
features). Why is this the case for PCA?
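To see why the output is always "the first N", here is a minimal NumPy sketch of the usual covariance-eigenvector formulation of PCA. The names fit_pca and transform are hypothetical; the sketch only centres the data, whereas Weka's evaluator (I believe) standardizes the attributes by default. It also fits on the training set and reuses the training mean and axes for other data, which is a common convention rather than something stated in this thread.

```python
import numpy as np


def fit_pca(X):
    """Fit PCA on a training matrix X (rows = instances, columns = attributes).
    Returns the training mean plus eigenvalues and eigenvectors of the
    covariance matrix, sorted by decreasing eigenvalue (variance)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder: largest variance first
    return mean, eigvals[order], eigvecs[:, order]


def transform(X, mean, eigvecs, n_components):
    """Project X onto the first n_components principal axes,
    reusing the mean learned from the training set."""
    return (X - mean) @ eigvecs[:, :n_components]


rng = np.random.default_rng(0)
# Hypothetical data: 6 attributes, with attribute 1 almost a copy of attribute 0.
X_train = rng.normal(size=(200, 6))
X_train[:, 1] = X_train[:, 0] + 0.1 * rng.normal(size=200)
X_test = rng.normal(size=(50, 6))
X_test[:, 1] = X_test[:, 0] + 0.1 * rng.normal(size=50)

mean, eigvals, eigvecs = fit_pca(X_train)
print("variance per component:", np.round(eigvals, 3))
Z_test = transform(X_test, mean, eigvecs, n_components=5)
print(Z_test.shape)  # (50, 5): the first five components, not five original attributes
```

Because the components are sorted by variance before any are dropped, keeping N of them always means keeping components 1..N; which original attributes contribute to each component is encoded in the columns of eigvecs.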
Because PCA, like its close relative singular value decomposition (SVD), does not
rank the original *input space* attributes. Instead, it combines correlated input
attributes into new *feature space* attributes (the principal components), each a
weighted sum of the originals, and sorts them by how much of the data's variance
they explain. Component 1 therefore always comes first, component 2 second, and so
on: the "first N" attributes are the first N principal components, not the first N
original attributes. Because of this combination, a classifier trained on the
components can base its decisions on fewer (but stronger) features than one trained
on the original attributes. In classifiers that print feature names in their output
model (most do; in the Explorer: Classify tab -> More options -> tick Output model), you
can see the names of these merged features. It also appears that Weka's PCA
lets you set the number of feature space attributes (maximumAttributeNames
See the Wikipedia article on eigenvalues and eigenvectors if you want more detail
on the statistical basis of PCA's feature reduction.
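The "merged feature" names in the output above (e.g. 0.378att_3+0.377att_4+...) are just the coefficients of one eigenvector written out as a weighted sum of the original attributes. The sketch below builds such a name from a loading vector; component_name and its max_terms parameter are hypothetical, and the trailing "..." mimics the truncation visible in the output (I believe Weka's maximumAttributeNames option controls how many terms are shown there, but that is an assumption).

```python
import numpy as np


def component_name(loadings, attribute_names, max_terms=5):
    """Write one principal component as a linear combination of the original
    attributes, listing only the max_terms largest coefficients by magnitude.
    Illustrative analogue of Weka's truncated names, not Weka's own code."""
    order = np.argsort(-np.abs(loadings))  # largest |coefficient| first
    terms = []
    for i in order[:max_terms]:
        coef = loadings[i]
        sign = "-" if coef < 0 else "+"
        terms.append(f"{sign}{abs(coef):.3f}{attribute_names[i]}")
    name = "".join(terms).lstrip("+")      # no leading '+' on the first term
    if max_terms < len(loadings):
        name += "..."                      # some terms were omitted
    return name


# Hypothetical loading vector for a 3-attribute dataset.
loadings = np.array([0.6, -0.8, 0.0])
print(component_name(loadings, ["att_1", "att_2", "att_3"], max_terms=2))
# -0.800att_2+0.600att_1...
```

The value of that component for an instance is then the dot product of the loadings with the instance's (centred or standardized) attribute values, which is why a handful of components can carry most of the information in all ten attributes.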
Best wishes
Wekalist mailing list