It's best to stick to the FilteredClassifier. It's not slow as such: in a 10-fold
cross-validation it has to build the filter model 10 times, once per fold
(10 times 2 minutes equals 20 minutes).
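
To illustrate why the filter has to be built on the training data and then reused unchanged on the instance to classify, here is a minimal plain-Java sketch. It is only a toy stand-in, not Weka's StringToWordVector: the class ToyWordVectorFilter, its methods, and the example texts are all made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy word-count filter; NOT part of Weka.
final class ToyWordVectorFilter {
    // word -> attribute index, fixed once from the training texts
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();

    // "Build" the filter: the attribute structure comes from the training data only.
    void build(List<String> trainingTexts) {
        dictionary.clear();
        for (String text : trainingTexts) {
            for (String word : tokenize(text)) {
                dictionary.putIfAbsent(word, dictionary.size());
            }
        }
    }

    // Apply the already-built filter; words not seen during build() are dropped.
    int[] apply(String text) {
        int[] counts = new int[dictionary.size()];
        for (String word : tokenize(text)) {
            Integer index = dictionary.get(word);
            if (index != null) {
                counts[index]++;
            }
        }
        return counts;
    }

    int numAttributes() {
        return dictionary.size();
    }

    private static String[] tokenize(String text) {
        String trimmed = text.toLowerCase().trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("buffer overflow in parser", "overflow in login form");
        String unseen = "parser crash in login";

        // Filter built on the training data and reused: same attribute layout as in training.
        ToyWordVectorFilter trained = new ToyWordVectorFilter();
        trained.build(train);
        System.out.println(Arrays.toString(trained.apply(unseen)));

        // Filter built on the new instance alone: a different, incompatible layout.
        ToyWordVectorFilter separate = new ToyWordVectorFilter();
        separate.build(Arrays.asList(unseen));
        System.out.println(separate.numAttributes() + " vs " + trained.numAttributes());
    }
}
```

In this sketch the second filter ends up with a different number of attributes than the classifier was trained on, which is the kind of mismatch that filtering the to-be-classified instance separately can produce. As far as I can tell, keeping the filter and the classifier in sync like this is roughly the bookkeeping the FilteredClassifier does for you.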
On 12/03/2016, at 12:53 AM, <stotz(a)sit.fraunhofer.de> wrote:
I am working on a multi-class classification program based on a textual description.
The structure of Weka (3.6) makes it relatively simple to implement this, once I understood
the Weka structure and naming scheme.
Until now I used a FilteredClassifier and a StringToWordVector. While testing out several
different options on the StringToWordVector filter in the Weka GUI I realized that using a
FilteredClassifier really slows down the process of building and evaluating a classifier.
Especially a 10-fold cross-validation is painfully slow using a FilteredClassifier:
more than 20 minutes using the FilteredClassifier in comparison to ~2 minutes with the plain
classifier and the filter applied to the training data (5000 instances, 23 classes).
Therefore I tried to get rid of the FilteredClassifier, always filtering the training data
and the to-be-classified instance. Training and evaluation are now much faster, but
performing a classification on the trained classifier now causes problems (tries to access
I have read in the JavaDoc of FilteredClassifier that "the structure of the filter
is based exclusively on the training data". Looking at the source code of
FilteredClassifier I did not find a point that might set "the structure of the
Does anybody have some hints for me? Or do I have to stick with the slow but working
Wekalist mailing list