I have a couple of questions:
- Assuming you have a large set of features (20+ attributes) and the
features are precise, does that mean that the more features you have, the
better the classification rate?
A large data set lets the classifier train on many different kinds of
instances, which in turn leads to general (but maybe not optimal)
prediction results.
- Assume that I reduce the number of features from 20 attributes to 2 by
generating one combined feature from each group of 10 attributes. Does
that mean I would get the same results as if I had all 20 attributes?
Not in the general case. However, it can be achieved if you have a very
good learner on the one hand, and configure the parameters of your schemes
carefully on the other. It also depends on the data you have.
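If you want to try that kind of feature construction in Weka, one option
(my assumption; the question does not say how the combined features are
built) is principal components analysis via the PrincipalComponents
filter. A minimal sketch, with a placeholder file name:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.PrincipalComponents;

public class ReduceTo2Features {
  public static void main(String[] args) throws Exception {
    // "train.arff" is a placeholder for your own data set.
    Instances data = DataSource.read("train.arff");
    data.setClassIndex(data.numAttributes() - 1);

    // Project the predictor attributes onto 2 derived components;
    // the class attribute is left out of the transformation.
    PrincipalComponents pca = new PrincipalComponents();
    pca.setMaximumAttributes(2);
    pca.setInputFormat(data);
    Instances reduced = Filter.useFilter(data, pca);

    System.out.println("Attributes before: " + data.numAttributes());
    System.out.println("Attributes after:  " + reduced.numAttributes());
  }
}

Whether the two derived attributes preserve the predictive signal of the
original 20 is exactly the point above: it depends on the learner and the
data.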
- I am currently running experiments on large files (the ARFF file is over
150MB). It contains over 4 million instances and has only 3 attributes.
The nature of my data forces me to train on such files. Is there any
solution other than sampling (random selection)? Sampling would break my
data, since I am working with linguistic features.
You may reduce the size of your data using a filter method. This can be
done with "weka.attributeSelection.InfoGainAttributeEval" in conjunction
with "Ranker" as the search method. Alternatively, there is the wrapper
method: "weka.attributeSelection.ClassifierSubsetEval" with, e.g.,
GreedyStepwise can do this job.
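A minimal sketch of both approaches with the Weka Java API; the file name,
the NaiveBayes base classifier, and the numToSelect value are placeholders
you would replace with your own choices:

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.ClassifierSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SelectAttributesDemo {
  public static void main(String[] args) throws Exception {
    // "big.arff" is a placeholder for your own data set.
    Instances data = DataSource.read("big.arff");
    data.setClassIndex(data.numAttributes() - 1);

    // Filter method: rank attributes by information gain.
    AttributeSelection filter = new AttributeSelection();
    filter.setEvaluator(new InfoGainAttributeEval());
    Ranker ranker = new Ranker();
    ranker.setNumToSelect(2); // keep the 2 highest-ranked attributes
    filter.setSearch(ranker);
    filter.SelectAttributes(data);
    Instances reducedByFilter = filter.reduceDimensionality(data);

    // Wrapper method: evaluate attribute subsets with a classifier.
    AttributeSelection wrapper = new AttributeSelection();
    ClassifierSubsetEval eval = new ClassifierSubsetEval();
    eval.setClassifier(new NaiveBayes()); // any base classifier works
    wrapper.setEvaluator(eval);
    wrapper.setSearch(new GreedyStepwise());
    wrapper.SelectAttributes(data);
    Instances reducedByWrapper = wrapper.reduceDimensionality(data);

    System.out.println("Filter kept:  "
        + reducedByFilter.numAttributes() + " attributes");
    System.out.println("Wrapper kept: "
        + reducedByWrapper.numAttributes() + " attributes");
  }
}

Note that attribute selection reduces the number of columns, not the
number of instances, so with only 3 attributes the saving will be limited.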