I am currently classifying a published dataset using Weka. Using kNN, linear
regression or J48, I see good LOO CV accuracy on the samples used as the training set, but
very poor accuracy when the algorithm is evaluated on a test set.
I am selecting attributes using the wrapper selection process with either 10-fold or LOO CV,
which I thought was a way of reducing overfitting.
Generally between 10 and 30 features (genes in this case) are selected from several hundred,
and they give >90% accuracy on the training set; however, only ~50% of samples in a
small test set (prepared in the same way) are correctly assigned.
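To make the workflow above concrete, here is a minimal pure-Python sketch of it. This is not Weka code: a 1-nearest-neighbour classifier stands in for kNN/IBk, and a greedy forward search scored by LOO accuracy stands in for the wrapper attribute selection. All function names are hypothetical, and the data are assumed synthetic pure noise (random labels, 200 random "genes"), so no classifier can truly do better than chance on unseen samples. Any LOO accuracy well above 50% therefore reflects features chosen to fit those same training samples, which the separate test set then exposes.

```python
import random

def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `feats`."""
    n = len(X)
    correct = 0
    for i in range(n):
        best_d, best_label = None, None
        for j in range(n):
            if j == i:
                continue  # hold sample i out
            d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / n

def predict_1nn(X_train, y_train, x, feats):
    """Label of the training sample closest to x on the selected features."""
    dists = [sum((xt[f] - x[f]) ** 2 for f in feats) for xt in X_train]
    return y_train[dists.index(min(dists))]

def forward_wrapper_selection(X, y, max_feats):
    """Greedy forward selection, scored by LOO accuracy on the SAME training samples."""
    selected, best_score = [], 0.0
    n_feats = len(X[0])
    while len(selected) < max_feats:
        candidates = [f for f in range(n_feats) if f not in selected]
        score, f = max((loo_1nn_accuracy(X, y, selected + [f]), f) for f in candidates)
        if score <= best_score:
            break  # no candidate improves the LOO score
        selected.append(f)
        best_score = score
    return selected, best_score

def make_noise_data(n_samples, n_feats, rng):
    """Hypothetical data: Gaussian noise features with labels unrelated to them."""
    X = [[rng.gauss(0, 1) for _ in range(n_feats)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y

if __name__ == "__main__":
    rng = random.Random(0)
    X_train, y_train = make_noise_data(30, 200, rng)
    X_test, y_test = make_noise_data(20, 200, rng)
    feats, loo_acc = forward_wrapper_selection(X_train, y_train, max_feats=10)
    test_acc = sum(
        predict_1nn(X_train, y_train, x, feats) == t for x, t in zip(X_test, y_test)
    ) / len(X_test)
    print(f"selected {len(feats)} features, "
          f"LOO accuracy {loo_acc:.2f}, test accuracy {test_acc:.2f}")
```

Comparing the two printed accuracies for a few different seeds shows how far the LOO figure from the selection loop can sit above what the held-out samples give.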
Does anyone have any ideas about what I could be doing that results in this?
Thanks in advance,
Ryan van Laar
Ryan van Laar, r.vanlaar(a)pmci.unimelb.edu.au, <http://www.ccgpm.org/>
Bioinformatics PhD Student
The Peter MacCallum Cancer Institute <http://www.pmci.unimelb.edu.au/> - Microarray
St. Andrews Place, East Melbourne, VIC Australia 3002
Ph: 03 9656 1790, Fax: 03 9656 1460, Mobile: 0402 101 262