I have a question concerning the evaluation of a classifier on a test set
with String values.
The Javadoc of Evaluation.evaluateModel states:
"...Note that the data must have exactly the same format (e.g. order of
attributes) as the data used to train the classifier! Otherwise the
results will generally be meaningless."
What does this mean on String-Attributes that were filtered with the
Can the two sets ever have the same format, given that the training corpus
and the test corpus will not contain exactly the same words?
That's why you have to initialize a filter with the training set and
then use this initialized filter to process the test set as well. Because
the filter's attributes are determined from the training set alone, both
sets end up with the same attributes in the same order. This process is
called "batch filtering". I've written a short Wiki article that explains
batch filtering from the command line (it also links to an article that
explains performing batch filtering from Java):
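
To make the idea concrete, here is a minimal sketch in plain Java. It is not Weka's API: the class BatchFilterSketch and its methods initialize and transform are hypothetical names invented for this illustration, and the simple word-count scheme is only an assumption standing in for whatever the real filter does. It shows the principle: the word list is fixed from the training documents, and the same fixed list is then applied to the test documents.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; this is NOT part of Weka.
class BatchFilterSketch {
    // word -> column index, fixed once from the training documents
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    // "Initialize" the filter: learn the word list from the training set only.
    void initialize(List<String> trainingDocs) {
        for (String doc : trainingDocs) {
            for (String word : tokenize(doc)) {
                if (word.isEmpty()) continue;
                // Each new word gets the next free column index.
                vocabulary.putIfAbsent(word, vocabulary.size());
            }
        }
    }

    // Apply the already-initialized filter to any document (training or test).
    // Words that were not seen during initialization are ignored, so every
    // resulting vector has the same length and the same column order.
    int[] transform(String doc) {
        int[] counts = new int[vocabulary.size()];
        for (String word : tokenize(doc)) {
            Integer column = vocabulary.get(word);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    private static String[] tokenize(String doc) {
        return doc.toLowerCase().trim().split("\\s+");
    }

    public static void main(String[] args) {
        BatchFilterSketch filter = new BatchFilterSketch();
        filter.initialize(Arrays.asList("good movie", "bad movie"));
        // "film" never occurred in the training set, so it is dropped.
        System.out.println(Arrays.toString(filter.transform("good good film"))); // prints [2, 0, 0]
    }
}
```

The test document contains a word the training set never had, yet its vector still has the three columns learned from the training data, which is what keeps the two sets in the same format.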
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
Ph. +64 (7) 858-5174