Hi,

The cost file does not affect the classification result! It only changes the cost figures (Total Cost, Average Cost) that Weka reports in the evaluation.

A real-world example: suppose you have a lot of customers, e.g. about 80,000, and you know that about 10% of them are Premium customers. For each customer you have some information, such as age, which serves as the independent attributes for the classification. Further, suppose you get the same kind of information (age etc.) for 8,000 new customers, but you don't know which customers in this test sample are Premium customers. From a direct-marketing perspective this is an interesting problem, and it is about real money.

Suppose a simple cost matrix (rows = actual class, columns = predicted class; first StandardKunde, then PremiumKunde):

-1,3

3,-10

That means: if I recognize a potential Premium customer early in my business, I get an additional profit of $10 (entered as -10, because Weka speaks in terms of cost!), since I can target this group with special advertising so that they buy more.

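If I don't identify a Standard customer correctly, that costs me $3, i.e. the money for the advertising that is wasted, etc. The other two cells work the same way: a correctly recognized Standard customer saves me $1 (-1), and a Premium customer I fail to recognize costs me $3.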
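Weka's evaluation only uses the matrix to score predictions, but the same matrix could also drive the predictions themselves: for each customer, pick the class with the lowest expected cost under the classifier's class probabilities. Below is a plain-Java sketch of that minimum-expected-cost rule. It is not Weka's code; the class name `ExpectedCostRule` and the example probabilities are made up for illustration. Weka offers this idea as a meta-classifier (CostSensitiveClassifier, with an option to minimize expected cost), but check the documentation of your Weka version for the exact class name and options.

```java
/** Minimum-expected-cost decision rule (illustrative sketch, not Weka's implementation). */
public class ExpectedCostRule {

    // Cost matrix from the example: rows = actual class, columns = predicted class.
    // Index 0 = StandardKunde, index 1 = PremiumKunde. Negative cost = profit.
    static final double[][] COST = {
        {-1.0, 3.0},
        {3.0, -10.0}
    };

    /** Expected cost of predicting class j when the class probabilities are p. */
    static double expectedCost(double[] p, int j) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += p[i] * COST[i][j];
        }
        return sum;
    }

    /** Returns the index of the class whose prediction has the lowest expected cost. */
    static int predict(double[] p) {
        int best = 0;
        for (int j = 1; j < COST[0].length; j++) {
            if (expectedCost(p, j) < expectedCost(p, best)) {
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical customer the classifier considers Premium with probability 0.3:
        // picking the most probable class gives Standard, but the expected cost favours Premium.
        double[] p = {0.7, 0.3};
        System.out.println("cost(Standard) = " + expectedCost(p, 0));
        System.out.println("cost(Premium)  = " + expectedCost(p, 1));
        System.out.println("predicted class = " + predict(p));
    }
}
```

Worked out by hand for this matrix: predicting Premium is cheaper when 3·p_S - 10·p_P < -p_S + 3·p_P, i.e. when p_P > 4/17 ≈ 0.235, rather than the 0.5 implied by simply picking the most probable class. A cost-sensitive rule would therefore tend to flag more customers as Premium than plain NaiveBayes does.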

With this, my marketing action earns about $0.68 per customer (the Average Cost of -0.6842 below), because I don't burn money on a marketing campaign for Standard customers. What does not satisfy me is how poorly the Premium customers are recognized: only 53 are classified correctly, which brings 53 * $10 of extra profit, while the 708 Premium customers classified as Standard cost me 708 * $3 because the campaign misses them.
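The dollar figures follow directly from the confusion matrix and the cost matrix: each cell count is multiplied by its cost and the products are summed. A small plain-Java sketch (class and method names are illustrative, not part of Weka) that reproduces Weka's Total Cost and Average Cost from the confusion matrix in the evaluation output:

```java
import java.util.Locale;

/** Recomputes Weka's "Total Cost" and "Average Cost" from a confusion matrix (sketch). */
public class CostFromConfusion {

    /** Sums confusion[i][j] * cost[i][j]; rows = actual class, columns = predicted class. */
    static double totalCost(long[][] confusion, double[][] cost) {
        double total = 0.0;
        for (int i = 0; i < confusion.length; i++) {
            for (int j = 0; j < confusion[i].length; j++) {
                total += confusion[i][j] * cost[i][j];
            }
        }
        return total;
    }

    /** Number of instances, i.e. the sum of all cells. */
    static long count(long[][] confusion) {
        long n = 0;
        for (long[] row : confusion) {
            for (long c : row) {
                n += c;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Confusion matrix from the NaiveBayes test-set run (0 = StandardKunde, 1 = PremiumKunde).
        long[][] confusion = {{7221, 45}, {708, 53}};
        // Cost matrix from the example; negative entries are profits.
        double[][] cost = {{-1, 3}, {3, -10}};

        double total = totalCost(confusion, cost);
        long n = count(confusion);
        // Locale.ROOT keeps '.' as the decimal separator regardless of the system locale.
        System.out.printf(Locale.ROOT, "Total Cost   = %.0f%n", total);
        System.out.printf(Locale.ROOT, "Average Cost = %.4f%n", total / n);
    }
}
```

This prints a total cost of -5492 and an average cost of -0.6842, matching Weka's summary. The four products are -7221 (Standard, correct), +135 (45 Standard predicted as Premium), +2124 (708 missed Premium customers) and -530 (53 recognized Premium customers).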

And that was NaiveBayes; another classifier may do better.

Hope this helps. If anything I've written is not exact, please let me know.

Christian

=== Evaluation on test set ===
=== Summary ===

Correctly Classified Instances        7274               90.6192 %
Incorrectly Classified Instances       753                9.3808 %
Kappa statistic                          0.104
Total Cost                           -5492    
Average Cost                            -0.6842
Mean absolute error                      0.1593
Root mean squared error                  0.2803
Relative absolute error                 91.1276 %
Root relative squared error             95.6724 %
Total Number of Instances             8027    

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   Class
  0.994     0.93       0.911     0.994     0.95     StandardKunde
  0.07      0.006      0.541     0.07      0.123    PremiumKunde

=== Confusion Matrix ===

    a    b   <-- classified as
 7221   45 |    a = StandardKunde
  708   53 |    b = PremiumKunde
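To check where the summary figures come from, this plain-Java sketch recomputes accuracy, the Kappa statistic and the per-class TP rate, FP rate, precision and F-measure from the confusion matrix above. The class and method names are illustrative; the formulas are the standard definitions, and the results agree with Weka's printed values after rounding.

```java
import java.util.Locale;

/** Recomputes accuracy, kappa and per-class rates from a 2x2 confusion matrix (sketch). */
public class ConfusionStats {

    // Rows = actual class, columns = predicted class (0 = StandardKunde, 1 = PremiumKunde).
    static final long[][] CM = {{7221, 45}, {708, 53}};

    static long rowSum(int i) { return CM[i][0] + CM[i][1]; }
    static long colSum(int j) { return CM[0][j] + CM[1][j]; }
    static long total() { return rowSum(0) + rowSum(1); }

    /** Fraction of correctly classified instances. */
    static double accuracy() {
        return (double) (CM[0][0] + CM[1][1]) / total();
    }

    /** Cohen's kappa: observed agreement corrected for the agreement expected by chance. */
    static double kappa() {
        double n = total();
        double expected = (rowSum(0) * (double) colSum(0) + rowSum(1) * (double) colSum(1)) / (n * n);
        return (accuracy() - expected) / (1.0 - expected);
    }

    /** TP rate (= recall) for class c: correct predictions of c / actual instances of c. */
    static double tpRate(int c) { return (double) CM[c][c] / rowSum(c); }

    /** FP rate for class c: other-class instances predicted as c / actual other-class instances. */
    static double fpRate(int c) {
        int other = 1 - c;
        return (double) CM[other][c] / rowSum(other);
    }

    /** Precision for class c: correct predictions of c / all predictions of c. */
    static double precision(int c) { return (double) CM[c][c] / colSum(c); }

    /** Harmonic mean of precision and recall for class c. */
    static double fMeasure(int c) {
        double p = precision(c);
        double r = tpRate(c);
        return 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "accuracy %.4f, kappa %.3f%n", accuracy(), kappa());
        String[] names = {"StandardKunde", "PremiumKunde"};
        for (int c = 0; c < 2; c++) {
            System.out.printf(Locale.ROOT, "%-14s TP %.3f  FP %.3f  Prec %.3f  F %.3f%n",
                names[c], tpRate(c), fpRate(c), precision(c), fMeasure(c));
        }
    }
}
```

The numbers make the weakness explicit: the PremiumKunde TP rate of 0.07 is just 53 of 761 actual Premium customers, and the high overall accuracy comes almost entirely from the majority class.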

Cupidil <cupidil@yahoo.com> wrote on 12.09.03 09:14:00:
>
> Thanks.
> Now I can run it, but it looks like it has no
> effect on the results.
>
> My cost file is:
> % class { 0, 1 }
> 1 0 1 % if 1 and classified as 0, cost is 1
> 0 1 5 % if 0 and classified as 1, cost is 5
>
> I create the model file using:
> java weka.classifiers.SMO -m file.cost -t train.arff -d model.mod
>
> And then I test using:
> java weka.classifiers.SMO -l model.mod -T test.arff
>
> The cost file has no effect. I even tried to swap
> the penalties for 0 and 1 but got the same results.
>
> Can you please help me?
>
> Thanks,
> Cupidil
>
> --- Ko-Kang Kevin Wang <Ko-Kang@xtra.co.nz> wrote:
> > If you copied my example directly, of course it will not work. I should
> > have mentioned that the example was copied directly from my thesis, which
> > is written in LaTeX.
> >
> > You need to remove the LaTeX markup. I can't remember what form the cost
> > matrix should be in for Weka, but I'm sure you need to at least remove the
> > \begin{tabular}... stuff. Perhaps make the cost matrix look something
> > like:
> > 0 1
> > 0 0 1
> > 1 5 0
> >
> > Cheers,
> >
> > Kevin
> >
> > > -----Original Message-----
> > > From: wekalist-bounces@list.scms.waikato.ac.nz
> > > [mailto:wekalist-bounces@list.scms.waikato.ac.nz] On Behalf Of Cupidil
> > > Sent: Thursday, September 11, 2003 4:53 AM
> > > To: Ko-Kang Kevin Wang; 'C. Schulz'; wekalist@list.scms.waikato.ac.nz
> > > Subject: RE: [Wekalist] Cost Matrix Problem
> > >
> > >
> > > Thanks!
> > >
> > > I tried it but Weka got stuck.
> > >
> > > I ran:
> > > java -classpath weka.jar weka.classifiers.NaiveBayes -v -o -m ten.cost -t Examples.arff -d model.mod
> > >
> > > The file ten.cost contains your example:
> > > \begin{tabular}{|r|rr|r|}\hline &
> > > \multicolumn{3}{c|}{\textbf{Predicted}} \\ \hline
> > > \textbf{Actual} && \textbf{0} & \textbf{1} \\ \hline
> > > \textbf{0} && 0 & 5 \\ \hline \textbf{1} && 1 & 0 \\ \hline
> > > \end{tabular}
> > >
> > > What have I done wrong?
> > >
> > > Thanks,
> > > Cupidil
> > >
> > > --- Ko-Kang Kevin Wang <Ko-Kang@xtra.co.nz> wrote:
> > > > > -----Original Message-----
> > > > > From: wekalist-bounces@list.scms.waikato.ac.nz
> > > > > [mailto:wekalist-bounces@list.scms.waikato.ac.nz] On Behalf Of Cupidil
> > > > > Sent: Sunday, August 31, 2003 8:39 PM
> > > > > To: C. Schulz; wekalist@list.scms.waikato.ac.nz
> > > > > Subject: Re: [Wekalist] Cost Matrix Problem
> > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > Thanks for your answer.
> > > > > However, I'm not sure what I should write in the cost matrix file
> > > > > if I don't want Weka to classify/return 1 when it should be 0.
> > > > > (I have two classes, 0 and 1.)
> > > >
> > > > It is impossible (for us) to tell you "the" correct answer. The setting
> > > > of the cost matrix is subjective and depends on the situation. Here your
> > > > event of interest is 0 and you want to minimise false negatives, so you
> > > > want to create a cost matrix that penalises misclassifications. For
> > > > example, you may have:
> > > > \begin{tabular}{|r|rr|r|}\hline
> > > > & \multicolumn{3}{c|}{\textbf{Predicted}} \\ \hline
> > > > \textbf{Actual} && \textbf{0} & \textbf{1} \\ \hline
> > > > \textbf{0} && 0 & 5 \\ \hline
> > > > \textbf{1} && 1 & 0 \\ \hline
> > > > \end{tabular}
> > > >
> > > > In the above cost matrix (typeset in LaTeX form), the "cost" of
> > > > misclassifying a 0 as 1 is FIVE times as bad as misclassifying a 1 as
> > > > 0. You can modify this 5:1 ratio to reduce the misclassification rate.
> > > >
> > > > Do remember that you should not try to achieve a 0% misclassification
> > > > rate, as that will grossly overfit the model....
> > > >
> > > > Cheers,
> > > >
> > > > Kevin
> > > >
> > > > ----------------------------------------------------------------------
> > > > Ko-Kang Kevin Wang
> > > > Master of Science Student
> > > > Statistics Lab and SLC Tutor
> > > > The University of Auckland
> > > > New Zealand
> > > > Homepage: http://www.stat.auckland.ac.nz/~kwan022
> > >
> > >
> > > _______________________________________________
> > > Wekalist mailing list
> > > Wekalist@list.scms.waikato.ac.nz
> > > https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist