On 19 Apr 2015, at 10:07, arvega
Thank you for these clear instructions! I just need one more clarification (I'm
a newbie): since I'm evaluating the classification on the test set,
wouldn't applying the *StringToWordVector* filter to the full training set cause
some kind of error, because the test set contains words that are not present in the training set?
If you use the FilteredClassifier, the filter model will be built based on the training data.
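To illustrate why this avoids errors: the word dictionary is fixed from the training data, and a test document is then mapped onto that fixed set of attributes, so words seen only in the test set simply have no attribute to map to. The plain-Java sketch below imitates that behaviour under this assumption. It is not Weka's StringToWordVector API; the class `TrainOnlyVocabulary` and its `fit`/`transform` methods are hypothetical names used only for illustration, and the tokenizer is a simplistic stand-in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical analogue of a word-vector filter built on training data only.
public class TrainOnlyVocabulary {
    // Word -> attribute (column) index, learned from training documents only.
    private final Map<String, Integer> vocab = new LinkedHashMap<>();

    // Builds the dictionary from the training documents.
    public void fit(List<String> trainDocs) {
        for (String doc : trainDocs) {
            for (String token : tokenize(doc)) {
                vocab.putIfAbsent(token, vocab.size());
            }
        }
    }

    // Maps a document onto word counts over the fixed training dictionary.
    // Words never seen during fit() are skipped, so they cause no error.
    public int[] transform(String doc) {
        int[] counts = new int[vocab.size()];
        for (String token : tokenize(doc)) {
            Integer idx = vocab.get(token);
            if (idx != null) {
                counts[idx]++;
            }
        }
        return counts;
    }

    public int vocabularySize() {
        return vocab.size();
    }

    // Simplistic tokenizer: lower-case, split on non-word characters.
    private static List<String> tokenize(String doc) {
        List<String> tokens = new ArrayList<>();
        for (String t : doc.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        TrainOnlyVocabulary filter = new TrainOnlyVocabulary();
        filter.fit(List.of("the cat sat", "the dog ran"));
        // "zebra" occurs only in the test document and is ignored.
        System.out.println(filter.vocabularySize());
        System.out.println(Arrays.toString(filter.transform("the zebra sat")));
    }
}
```

The test vector has the same length as the training dictionary, which is what lets a classifier trained on the training vectors accept it.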
May I ask one more question? I'm trying to classify a test set
whose content is known only superficially and whose classes are not predefined,
using the class labels from the training set (4 classes, corresponding to
the 4 topics covered by the training instances). What would be the "standard"
approach to classification in this case? (In particular, how should I set the
class attribute for the test set in advance? Could the "Weather Data" example
with string attributes and nominal classes, covered in /Data Mining
Practical Machine Learning/, be an example to follow?)
Add the same class attribute to the test data, but give every instance a missing value as
the class value ("?").
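As a concrete illustration of that advice, the sketch below writes an ARFF test file that declares the same nominal class attribute as the training file, with "?" as the class value of every instance. It assumes a single string attribute holding the document text; the relation name, the attribute name `text` and the labels `topicA` to `topicD` are hypothetical placeholders for the four real topics, and `UnlabeledArffWriter` is an illustrative helper, not part of Weka.

```java
import java.util.List;

// Hypothetical helper that produces an unlabeled ARFF test file.
public class UnlabeledArffWriter {
    // Placeholder labels: these must match the training file's class
    // declaration exactly (same values, same order).
    static final List<String> CLASS_LABELS =
            List.of("topicA", "topicB", "topicC", "topicD");

    // Builds ARFF text in which every instance has a missing class value ("?").
    static String toArff(String relation, List<String> texts) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute text string\n");
        sb.append("@attribute class {")
          .append(String.join(",", CLASS_LABELS))
          .append("}\n\n");
        sb.append("@data\n");
        for (String text : texts) {
            // Quote the string value, escaping backslashes and single quotes.
            String quoted = "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
            sb.append(quoted).append(",?\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toArff("test_docs",
                List.of("an unseen document", "another one")));
    }
}
```

The corresponding training file would use the identical header, with a real topic label in place of each "?".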