Thank you, Mark,

Using the Knowledge Flow, I've been trying out two different classification approaches:

Approach (1): Using the approach above, where SMOTE is applied inside the filtered classifier and not as a separate step
Approach (2): Applying SMOTE first, and then feeding the boosted data into a filtered classifier

The evaluation results for approach 1 show the same number of rows, n, that I started with, but according to you, that's fine. Approach 2, on the other hand, gives me an evaluation matrix with the boosted number of rows.

Comparing the results from approaches 1 and 2, I see that approach 2 gives much better results.

In a previous email conversation (http://weka.8497.n7.nabble.com/Evaluating-which-decision-model-performs-best-td35537.html), Eibe told me that approach 1 (using SMOTE inside the filtered classifier) is the right way to do it.
However, this approach seems to be giving me weak results.

Can anyone comment on why approach 2 is wrong? And if it is, how can I improve the results produced by approach 1? :)


Best regards,
Suranga



On Thu, Nov 12, 2015 at 3:06 PM, Mark Hall <mhall@waikato.ac.nz> wrote:
SMOTE should be working correctly. The FilteredClassifier creates a new training set using SMOTE before passing it to the base classifier. Test instances will be passed through by SMOTE unaltered though, so it will appear from evaluation results that there has been no change to the total number of instances.
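
The behaviour described above can be illustrated with a small, Weka-free sketch in plain Python. It is only a toy under stated assumptions: `oversample_minority` is a hypothetical stand-in that interpolates between random pairs of minority rows (real SMOTE interpolates towards k nearest neighbours), and `fold_sizes`, the 90/10 toy dataset and the 10-fold split are invented for illustration. It simply counts rows for two setups: oversampling inside each training fold, as the FilteredClassifier does, versus oversampling the whole dataset before splitting.

```python
import random


def oversample_minority(rows, minority_label, percentage, rng):
    """Simplified stand-in for SMOTE: add `percentage`% extra minority rows
    by interpolating between random pairs of minority rows."""
    minority = [r for r in rows if r[1] == minority_label]
    n_new = int(len(minority) * percentage / 100)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.choice(minority), rng.choice(minority)
        gap = rng.random()
        synthetic.append((a[0] + gap * (b[0] - a[0]), minority_label))
    return rows + synthetic


def fold_sizes(rows, n_folds, oversample_inside, rng):
    """Return (training-fold sizes, total rows evaluated) for n-fold CV."""
    data = list(rows)
    if not oversample_inside:
        # Oversample the whole dataset before any splitting.
        data = oversample_minority(data, 1, 150, rng)
    rng.shuffle(data)
    train_sizes, evaluated = [], 0
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [r for j, r in enumerate(data) if j % n_folds != i]
        if oversample_inside:
            # Only the training fold is oversampled; test rows pass through.
            train = oversample_minority(train, 1, 150, rng)
        train_sizes.append(len(train))
        evaluated += len(test)
    return train_sizes, evaluated


# Toy data: 90 majority rows (label 0) and 10 minority rows (label 1).
data_rng = random.Random(1)
rows = [(data_rng.gauss(0, 1), 0) for _ in range(90)]
rows += [(data_rng.gauss(3, 1), 1) for _ in range(10)]

inside_train, inside_eval = fold_sizes(rows, 10, True, random.Random(1))
before_train, before_eval = fold_sizes(rows, 10, False, random.Random(1))

print(inside_eval)  # 100
print(before_eval)  # 115
```

With oversampling inside the folds, the training folds grow but exactly the original 100 rows are evaluated. With oversampling up front, 115 rows are evaluated, so synthetic rows also end up in the test folds.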

Cheers,
Mark.

From: <wekalist-bounces@list.waikato.ac.nz> on behalf of Suranga Kasthurirathne <surangakas@gmail.com>
Reply-To: "Weka machine learning workbench list." <wekalist@list.waikato.ac.nz>
Date: Tuesday, 10 November 2015 4:58 pm
To: "Weka machine learning workbench list." <wekalist@list.waikato.ac.nz>
Subject: [Wekalist] Weka knowlegeflow ignoring SMOTE when run in conjunction with filtered classifier


Hi all, 

I seem to have run into a problem using SMOTE with the Weka Knowledge Flow (version 3.7.12).

I have an unbalanced dataset, so I'm channeling my dataset into a filtered classifier. My filtered classifier uses multiple algorithms, but each of them has SMOTE specified as a filter. However, it seems that SMOTE is not generating synthetic data no matter how I set the boosting, and I end up with the same number of rows that I originally fed into the knowledge flow.

I'll be happy to share the full knowledge flow process with anyone who wants to see it, but basically, what I'm doing is:

Scheme: FilteredClassifier
Options: -F "weka.filters.supervised.instance.SMOTE -C 0 -K 5 -P 150.0 -S 1" -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
Relation: procedures_outcomes_ca-weka.filters.unsupervised.attribute.ClassAssigner-Clast-weka.filters.supervised.attribute.AttributeSelection-Eweka.attributeSelection.InfoGainAttributeEval-Sweka.attributeSelection.Ranker -T -1.7976931348623157E308 -N 1500
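
As a rough reading of the SMOTE options in the scheme above: -C 0 lets the filter pick the minority class automatically, -K 5 is the number of nearest neighbours, -P 150.0 is the percentage of synthetic minority instances to create, and -S 1 is the random seed. The plain Python sketch below works out what -P 150.0 would mean for a hypothetical minority class of 200 rows (the count is invented; the real class sizes are not given here), assuming the percentage is taken relative to the minority-class count. The helper name is hypothetical, and exact rounding inside Weka may differ slightly.

```python
def expected_minority_after_smote(minority_count, percentage):
    """Approximate minority-class size after SMOTE with a given -P value.

    Assumes -P is a percentage of the current minority-class count.
    """
    synthetic = int(minority_count * percentage / 100)
    return minority_count + synthetic


# Hypothetical minority class of 200 rows with -P 150.0.
print(expected_minority_after_smote(200, 150.0))  # 500
```

Only the training data seen by the base classifier would grow in this way; the number of rows reported in the evaluation is not expected to change.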

On the other hand, SMOTE works fine if I specify it before the filtered classifier (i.e. apply SMOTE first, and then feed the boosted data into the filtered classifier).

-- 
Best Regards,
Suranga
_______________________________________________ Wekalist mailing list Send posts to: Wekalist@list.waikato.ac.nz List info and subscription status: http://list.waikato.ac.nz/mailman/listinfo/wekalist List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html






