SMOTE should be working correctly. The FilteredClassifier uses SMOTE to create a new, oversampled training set and passes that to the base classifier. Test instances, however, are passed through SMOTE unaltered, so the evaluation results report the original number of instances, and it looks as though the total number of instances has not changed.
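The following stand-alone Java sketch illustrates that behaviour. It is a sketch under stated assumptions, not Weka code. `FirstBatchOversampler` and `simulate` are hypothetical names, and the oversampling is a simplified SMOTE that picks a random minority partner instead of one of the K nearest neighbours. Rows are assumed to be `double[]` with the class label last, and 1.0 marks the minority class. What it demonstrates is the counting: the base classifier is built from the enlarged training set, while the evaluation counts test rows, which pass through untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Weka's API: a resampling filter that only transforms
// the first batch it sees (the training set) and passes later instances through.
public class FirstBatchFilterDemo {

    static class FirstBatchOversampler {
        private final double percentage; // plays the role of SMOTE's -P option
        private final Random rng;
        private boolean firstBatchDone = false;

        FirstBatchOversampler(double percentage, long seed) {
            this.percentage = percentage;
            this.rng = new Random(seed);
        }

        // First call (training data): append synthetic minority rows.
        List<double[]> filterBatch(List<double[]> rows) {
            List<double[]> out = new ArrayList<>(rows);
            if (firstBatchDone) {
                return out; // later batches pass through unaltered
            }
            List<double[]> minority = new ArrayList<>();
            for (double[] r : rows) {
                if (r[r.length - 1] == 1.0) {
                    minority.add(r);
                }
            }
            int toCreate = (int) (minority.size() * percentage / 100.0);
            for (int i = 0; i < toCreate && !minority.isEmpty(); i++) {
                // Simplification: real SMOTE chooses among the K nearest minority
                // neighbours; here the partner is just a random minority row.
                double[] a = minority.get(i % minority.size());
                double[] b = minority.get(rng.nextInt(minority.size()));
                double gap = rng.nextDouble();
                double[] synthetic = new double[a.length];
                for (int j = 0; j < a.length - 1; j++) {
                    synthetic[j] = a[j] + gap * (b[j] - a[j]); // interpolate features
                }
                synthetic[a.length - 1] = 1.0; // synthetic rows get the minority label
                out.add(synthetic);
            }
            firstBatchDone = true;
            return out;
        }

        // A single test instance after training: returned unchanged.
        double[] input(double[] row) {
            return row;
        }
    }

    // Returns {rows used to build the base classifier, rows counted by evaluation}.
    static int[] simulate(int majority, int minority, int testRows, double percentage) {
        List<double[]> train = new ArrayList<>();
        for (int i = 0; i < majority; i++) train.add(new double[] {i, 0.0});
        for (int i = 0; i < minority; i++) train.add(new double[] {100 + i, 1.0});

        FirstBatchOversampler filter = new FirstBatchOversampler(percentage, 1L);
        int trainingRows = filter.filterBatch(train).size();

        int evaluated = 0;
        for (int i = 0; i < testRows; i++) {
            filter.input(new double[] {i, 0.0}); // test rows are not resampled
            evaluated++;
        }
        return new int[] {trainingRows, evaluated};
    }

    public static void main(String[] args) {
        int[] counts = simulate(8, 2, 10, 150.0);
        System.out.println("rows used to build the base classifier: " + counts[0]);
        System.out.println("rows counted by the evaluation: " + counts[1]);
    }
}
```

With 8 majority rows, 2 minority rows and a 150% setting, the base classifier sees 13 rows (3 synthetic), while the evaluation still counts only the 10 test rows. This is the same pattern the Knowledge Flow output shows.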
Wekalist mailing list
Send posts to: Wekalist@list.waikato.ac.nz
List info and subscription status: http://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
I seem to have run into a problem using SMOTE in the Weka Knowledge Flow (version 3.7.12).
I have an unbalanced dataset, so I'm feeding it into FilteredClassifiers: I'm trying several base algorithms, and each FilteredClassifier has SMOTE specified as its filter. However, SMOTE does not seem to generate any synthetic data no matter how I set the oversampling percentage, and I end up with the same number of rows that I originally fed into the Knowledge Flow.
I'm happy to share the full Knowledge Flow with anyone who wants to see it, but basically, this is what I'm doing:
Options: -F "weka.filters.supervised.instance.SMOTE -C 0 -K 5 -P 150.0 -S 1" -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
Relation: procedures_outcomes_ca-weka.filters.unsupervised.attribute.ClassAssigner-Clast-weka.filters.supervised.attribute.AttributeSelection-Eweka.attributeSelection.InfoGainAttributeEval-Sweka.attributeSelection.Ranker -T -1.7976931348623157E308 -N 1500
On the other hand, SMOTE works fine if I apply it before the FilteredClassifier (i.e., run SMOTE first and then feed the oversampled data into the FilteredClassifier).
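For the `-P 150.0` setting in the options line, the sketch below estimates how many rows to expect. That is the total a standalone SMOTE step should produce and, inside the FilteredClassifier, the size of the training set handed to J48. The counting rule is an assumption about how Weka's SMOTE handles percentages: floor(P/100) synthetic rows per minority instance, plus one extra row for a randomly chosen fraction (P/100 - floor(P/100)) of them. `SmoteRowCount` and its methods are hypothetical helpers, and the dataset sizes are made up for illustration. Check the filter's source for your Weka version before relying on the exact numbers.

```java
// Hypothetical helper, assuming SMOTE creates floor(P/100) synthetic rows for
// every minority instance plus one extra row for a (P/100 - floor(P/100))
// fraction of the minority instances.
public class SmoteRowCount {

    // Number of synthetic minority rows for a given -P percentage.
    static int syntheticRows(int minorityCount, double percentage) {
        double ratio = percentage / 100.0;
        int wholeCopies = (int) Math.floor(ratio);
        int extra = (int) ((ratio - wholeCopies) * minorityCount);
        return wholeCopies * minorityCount + extra;
    }

    // Total rows after SMOTE: the original data plus the synthetic rows.
    static int rowsAfterSmote(int totalRows, int minorityCount, double percentage) {
        return totalRows + syntheticRows(minorityCount, percentage);
    }

    public static void main(String[] args) {
        // Made-up dataset: 1000 rows, 100 of them in the minority class, -P 150.0
        System.out.println(rowsAfterSmote(1000, 100, 150.0)); // prints 1150
    }
}
```

Under that assumption, SMOTE applied before the FilteredClassifier grows the data itself. Inside the FilteredClassifier, the same growth happens only to the training data, so evaluation totals stay at the original count.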