Thanks for your reply. Yes, the three subsets are selected at random from the overall dataset.
Date: Sat, 16 Dec 2017 18:42:36 +1300
From: Eibe Frank <email@example.com>
To: Weka machine learning workbench list.
Subject: Re: [Wekalist] Select attributes - why are the rankings different for subsets of the same dataset?
Have you shuffled the data before you created the three subsets? The Randomize filter in WEKA can be used for that. Alternatively, you can use the RemoveFolds filter (configuring it for a three-fold cross-validation).
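To illustrate why shuffling matters before splitting, here is a minimal plain-Java sketch of "shuffle, then deal the rows into three disjoint subsets". It is not how the Randomize or RemoveFolds filters are implemented. The class name ShuffledFolds, the split method, and the seed value are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of WEKA.
final class ShuffledFolds {

    /**
     * Shuffles a copy of the rows with a fixed seed, then deals them
     * round-robin into numFolds disjoint subsets of near-equal size.
     */
    static <T> List<List<T>> split(List<T> rows, int numFolds, long seed) {
        if (numFolds < 1) {
            throw new IllegalArgumentException("numFolds must be >= 1");
        }
        // Work on a copy so the caller's list keeps its original order.
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));

        List<List<T>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            folds.add(new ArrayList<>());
        }
        // Row i of the shuffled copy goes to fold (i mod numFolds).
        for (int i = 0; i < shuffled.size(); i++) {
            folds.get(i % numFolds).add(shuffled.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        for (List<Integer> fold : split(rows, 3, 42L)) {
            System.out.println(fold);
        }
    }
}
```

If the original file is ordered (for example by speaker or recording session), taking the first, second, and third thirds without shuffling would give subsets with different distributions, which could by itself produce different rankings.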
From: Ronan Flynn
Sent: Saturday, 16 December 2017 12:50 AM
Subject: [Wekalist] Select attributes - why are the rankings different for subsets of the same dataset?
I have a speech dataset that is divided into three subsets. There are approximately 90 attributes and the target is a numerical correlation value. I want to rank the attributes and have used the following:
Evaluator: weka.attributeSelection.WrapperSubsetEval -B weka.classifiers.functions.SMOreg -F 5 -T 0.01 -R 1 -E CORR-COEFF -- -C 0.0302 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007"
Search:    weka.attributeSelection.GreedyStepwise -R -T -1.7976931348623157E308 -N -1 -num-slots 1
When I run the attribute selection on each of the three speech subsets I get three very different ranked lists. I would have expected the rankings for the three subsets to be similar given that they are taken from the same overall speech dataset. Can anyone suggest possible reasons as to why the rankings are so different for each of the three speech subsets?
Also, is it possible when doing the ranking to output the correlation for each attribute individually? I would like to see the correlation for the individual attributes.
Regards and thanks,
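
On the second question above (a correlation value for each attribute individually), the quantity being asked for can be sketched in plain Java as the Pearson correlation between each attribute column and the numeric target. This is a minimal illustration, not WEKA code. The class AttributeCorrelation and its methods are hypothetical names, and returning 0 for a constant column is an assumption made for this sketch. To my knowledge WEKA also provides a CorrelationAttributeEval evaluator that can be combined with the Ranker search for this purpose, but the sketch below does not use it.

```java
// Hypothetical helper for illustration only; not part of WEKA.
final class AttributeCorrelation {

    /** Pearson correlation between two equally long numeric arrays. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double denom = Math.sqrt(sxx * syy);
        // Assumption for this sketch: a constant column gets correlation 0.
        return denom == 0.0 ? 0.0 : sxy / denom;
    }

    /** Correlation of every attribute column (columns[j]) with the target. */
    static double[] correlations(double[][] columns, double[] target) {
        double[] result = new double[columns.length];
        for (int j = 0; j < columns.length; j++) {
            result[j] = pearson(columns[j], target);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] columns = {
            {1, 2, 3, 4},   // rises with the target
            {4, 3, 2, 1},   // falls as the target rises
            {5, 5, 5, 5}    // constant, carries no information
        };
        double[] target = {2, 4, 6, 8};
        double[] r = correlations(columns, target);
        for (int j = 0; j < r.length; j++) {
            System.out.println("attribute " + j + ": " + r[j]);
        }
    }
}
```

Note that a per-attribute correlation of this kind measures each attribute in isolation, whereas the wrapper evaluator in the configuration above scores subsets of attributes through SMOreg, so the two orderings need not agree.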