I am Balaji from Anna University, Chennai, Tamil Nadu, India, where I am doing an M.E. in Software Engineering. I am now doing my final-year project on cost-sensitive classification: I am going to turn the ID3 algorithm into a cost-sensitive classifier, and in my approach I will consider some costs other than misclassification cost.
I used Weka's cost-sensitive classifier for my project and ran into the following problems. Could you please provide me details on the following?
1. When I use the cost-sensitive classifier with ID3 as the base classifier, it shows the same result as ID3 alone. What is the reason for that?
2. The cost matrix asks for constant costs, but they are not reflected in the base classifier in any way. How should the costs in the cost matrix be assigned? Where do those magic numbers come from, and is there a method behind them?
3. How does the cost given through the cost matrix actually affect the base ID3 classifier?
Thanking you sir,
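For what it's worth, the usual mechanism behind cost-sensitive prediction (Weka's CostSensitiveClassifier supports a minimum-expected-cost mode, if I recall correctly) can be sketched like this. The class below is illustrative, not Weka code:

```java
// Minimal sketch (not Weka's implementation): choose the class that
// minimizes expected misclassification cost, given class probabilities
// from a base classifier and a cost matrix where cost[i][j] is the cost
// of predicting class j when the true class is i.
public class MinExpectedCost {

    public static int minCostClass(double[] probs, double[][] cost) {
        int best = 0;
        double bestCost = Double.MAX_VALUE;
        for (int j = 0; j < probs.length; j++) {     // candidate predicted class j
            double expected = 0.0;
            for (int i = 0; i < probs.length; i++) { // possible true class i
                expected += probs[i] * cost[i][j];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.7, 0.3};   // base classifier favours class 0
        double[][] cost = {{0, 1},     // cheap to misclassify a true class 0
                           {5, 0}};    // expensive to misclassify a true class 1
        // Expected cost of predicting 0: 0.7*0 + 0.3*5 = 1.5
        // Expected cost of predicting 1: 0.7*1 + 0.3*0 = 0.7
        System.out.println(minCostClass(probs, cost)); // prints 1
    }
}
```

Note that if the base classifier outputs hard 0/1 "probabilities" (as an unpruned ID3 often does on its training data), the minimum-expected-cost choice collapses to the plain prediction, which may be one reason the results look identical.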
I am attempting to use CFS on a dataset of
approximately 250 instances and 10,000 features. Using
the default settings of the BestFirst search, the
algorithm requires more than 1GB memory.
Is there a way to estimate how much memory will be required?
There is an option to limit the amount of memory used
in the BestFirst code.
"-S Size of lookup cache for evaluated subsets.
Expressed as a multiple of the number of attributes in
the data set. (default = 1)"
It is not clear to me what this means. Can someone
explain? If I have 10,000 attributes and -S=1, what is
the size of the cache? Are values of -S < 1 valid?
I built a model with the J48 algorithm. Now I want to assign each instance to its corresponding terminal node, and then store the new dataset in a file (.arff, .csv, or .txt).
For example, the resulting dataset should look like this:
1, 21.2, m,n,N1
I tried to understand what happens when I use "Visualize Classification Error" (in the GUI and in the source code) and then save the result to an .arff file. It nearly does what I want, but I don't know how to add a node identifier instead of the predicted class.
I'm not sure whether this first approach is the right way to solve the described problem.
Are there any other approaches or solutions?
Can anyone give me some tips to carry my approach further?
It's important and somewhat urgent for me, so every idea is welcome.
Thanks in advance.
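The leaf-assignment step itself is simple in principle; here is a toy sketch (not the J48 API, and the class, method, and leaf names below are made up for illustration): route each instance down the tree and append the identifier of the terminal node it reaches.

```java
// Toy sketch (not Weka's J48): traverse a hand-coded decision tree
// and return the identifier of the terminal node an instance lands in.
public class LeafAssigner {

    // A tiny tree: if x < 10 -> leaf "N1"; else if s equals "m" -> "N2"; else "N3".
    public static String leafFor(double x, String s) {
        if (x < 10) return "N1";
        return s.equals("m") ? "N2" : "N3";
    }

    public static void main(String[] args) {
        // An instance (1, 21.2, m) gets its leaf appended to the row:
        System.out.println("1, 21.2, m, " + leafFor(21.2, "m")); // prints 1, 21.2, m, N2
    }
}
```

Writing each original row plus its leaf label to a .csv or .arff file then gives the dataset shown above.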
Thank you for your reply. But I think I am using a pretty new version of Weka,
3.4. Also, because I want to run my code for my experiment, I wonder if
there is any possible way to fix that problem?
I'm a new Weka user. In the Preprocess panel there are no good options for
completing missing data; the only option available replaces missing values with
the mode or mean. Are there other ways to complete missing values in Weka?
>> I compared the results of SMO with LIBSVM
>> (http://www.csie.ntu.edu.tw/~cjlin/libsvm/), which is a simplification
>> of both SMO (Platt) and SVMLIGHT (Joachims), using the RBF kernel with
>> the same c, G, and tolerance parameters, and 10-fold cross-validation.
>> With LIBSVM I got 81.44% accuracy and with SMO I got 67.14%. Why are
>> they so different?
>There can be multiple reasons for this (note that I don't know anything
>a) SMO normalizes the attributes (but you can turn that off)
I normalized the attributes for LIBSVM as well!
>b) The dataset is small and if you repeat the cross-validation with a
>different random number seed you get a very different result (try
>changing the seed). It is very unlikely that both LIBSVM and Weka
>happen to shuffle the data so that the cross-validation folds are identical.
The dataset contains 4000 instances. I agree that the cross-validation folds
may not be the same, but on average they should produce more or less the same
results, not a 15% difference. A dumb question: what is a seed and how does it
affect the data selection for training?
>c) There is a bug somewhere in Weka's SMO (seems unlikely).
>> as the value of C increases, the algorithm should try to classify the
>> data more accurately.
> This is incorrect (if you are referring to cross-validation). As you
> increase C (and allow the algorithm to fit the training data more
> closely) the accuracy on the TRAINING DATA normally goes up. However,
> this might lead to overfitting, which would mean the cross-validated
> error would go up.
No, I am talking about the C parameter in the SVM algorithm. If I understood
correctly, the C parameter tells you how tolerant to outliers the SVM is: the
bigger the C, the "harder" the margin becomes. That means for C -> infinity the
algorithm tries to classify every training point correctly.
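That reading matches the usual soft-margin objective, 0.5*||w||^2 + C * (sum of hinge losses). A toy illustration (not an SVM implementation; the weight and data below are arbitrary): for large C the training-error term dominates the objective, which is exactly what makes the margin "harder".

```java
// Toy illustration of the soft-margin objective for a fixed 1-D weight w:
// objective(w) = 0.5*w^2 + C * sum_i max(0, 1 - y_i * w * x_i)
public class SoftMarginObjective {

    public static double hinge(double y, double score) {
        return Math.max(0.0, 1.0 - y * score);
    }

    public static double objective(double w, double C, double[] x, double[] y) {
        double loss = 0.0;
        for (int i = 0; i < x.length; i++) loss += hinge(y[i], w * x[i]);
        return 0.5 * w * w + C * loss;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, -1.5};
        double[] y = {1.0, 1.0, -1.0};
        double w = 0.5; // a weight that leaves some margin violations
        System.out.println(objective(w, 0.1, x, y));   // small C: regularizer dominates
        System.out.println(objective(w, 100.0, x, y)); // large C: hinge losses dominate
    }
}
```

With small C the optimizer prefers a small ||w|| even at the cost of training errors; with large C it is pushed to eliminate the errors, and cross-validated accuracy can then drop through overfitting, as noted above.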
Center for High Performance Computing    Tel: (+49) 351 4633 1945
Technical University Dresden
Zellescher Weg 12
On Oct 16, 2004, at 12:20 PM, wekalist-request(a)list.scms.waikato.ac.nz
You are using a pretty old version of Weka, and I don't have the code
for that version lying around at the moment, but my guess is that
Evaluation used to randomize the data outside the crossValidateModel()
routine. So the data probably isn't being shuffled in your piece of code.
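Regarding the seed question elsewhere in this thread: a seed just fixes the pseudo-random permutation used to shuffle the data into folds. A minimal plain-Java sketch (illustrative, not Weka code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Shuffling with the same seed is reproducible; a different seed gives a
// different order, and hence different cross-validation folds.
public class SeededShuffle {

    public static List<Integer> shuffled(int n, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed)); // seed fixes the permutation
        return idx;
    }

    public static void main(String[] args) {
        System.out.println(shuffled(100, 1).equals(shuffled(100, 1))); // true: same seed, same folds
        System.out.println(shuffled(100, 1).equals(shuffled(100, 2))); // false (almost surely): different folds
    }
}
```

So if one code path shuffles before folding and another does not, the two runs see different folds and can report noticeably different accuracies.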
> From: Ling Zhuang <lzhu(a)deakin.edu.au>
> Date: October 15, 2004 6:14:53 PM GMT+13:00
> To: wekalist(a)list.scms.waikato.ac.nz
> Subject: [Wekalist] Problem when running j48
> I am running the J48 algorithm on a text data set. I happened to try
> two ways to run the program. One is simply
> java weka.classifiers.trees.j48.J48 -t *.arff. The other is that I
> wrote some code in another Java program, experiment.java:
> Classifier j48 = new weka.classifiers.trees.j48.J48();
> String trainName = Utils.getOption('t', args);
> Instances trainInstances = new Instances(new FileReader(trainName));
> int classIndex = -1;
> if (classIndex != -1)
>     trainInstances.setClassIndex(classIndex - 1);
> else
>     trainInstances.setClassIndex(trainInstances.numAttributes() - 1);
> Evaluation evaluation_j48 = new Evaluation(trainInstances);
> int numFold = 10;
> String fold = Utils.getOption('x', args);
> if (fold.length() > 0)
>     numFold = Integer.parseInt(fold);
> evaluation_j48.crossValidateModel(j48, trainInstances, numFold);
> System.out.println(evaluation_j48.toSummaryString());
> And then I ran java experiment -t *.arff. I found that the two gave
> different classification accuracies. Why does this happen?
> Thank you in advance!
On Oct 16, 2004, at 12:20 PM, wekalist-request(a)list.scms.waikato.ac.nz
> Of course the two SVMs will show different results. The optimization
> procedures are different, and because of this you get different results.
No, that's not right. The optimization procedures may be different but
they should arrive at the same solution (there is only one correct
solution in the case of SVMs).
I think I didn't put my question properly. I have 3900 instances and two
classes: class A has 950 instances and the rest belong to class B. I used C4.5,
Naive Bayes, KNN, and SVM for classification. Because of class B's domination,
most of the time these methods classify all instances as class B (i.e.,
roughly a 74.5% success rate). So, in the case of SVM, I increased the weight
of class A to 4 times that of class B, and then I got a success rate of 85%! I
want to do the same for C4.5, Naive Bayes, and KNN, but I could not figure
out how to do it in Weka. If anyone knows, please let me know. I
want to set the ratio of weights depending on the ratio of training instances.
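One common recipe for choosing such weights is to make each class's weight inversely proportional to its frequency. The sketch below is plain arithmetic, not a specific Weka API; within Weka itself, wrapping the base classifier in the CostSensitiveClassifier meta-classifier is one way to get a similar effect for C4.5, Naive Bayes, or KNN.

```java
// Weight each class inversely to its frequency:
// weight(c) = totalInstances / (numClasses * countOfClassC)
// For a 950-vs-2950 split, the minority class gets roughly 3.1x the
// weight of the majority class.
public class ClassWeights {

    public static double weight(int total, int numClasses, int classCount) {
        return (double) total / (numClasses * classCount);
    }

    public static void main(String[] args) {
        int total = 3900, a = 950, b = 2950;
        System.out.println(weight(total, 2, a)); // ~2.05 for minority class A
        System.out.println(weight(total, 2, b)); // ~0.66 for majority class B
    }
}
```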
Zitat von Euan Adie <euan.adie(a)ed.ac.uk>:
> Hi Samantha,
> It depends which classification algorithm you are using, but the
> simplest way to do this with classifiers that output a score would be to
> change the threshold. If the classification algorithm you want to use
> doesn't have any sort of confidence measure then the best thing for you
> to do would be to search Google as I think it's a pretty open
> question... you might be able to bias the training set in one direction
> or another, for example, but this has drawbacks.
> With the threshold method - you can do it manually (by looking at the
> scores output for each item in a test set) or automatically, using the
> ThresholdSelector class.
> i.e. in Weka explorer, load in your training data then go to the
> Classify tab. Select Meta -> ThresholdSelector in the "Classifier"
> panel. Click on the text box next to the "Choose" button to select which
> classifier you want to use and which class you're most interested in
> then go ahead and click Start. See the ThresholdSelector help text for
> more details...
> Hope that helps,
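The manual-threshold idea above can be sketched in a few lines (illustrative only; the class name and the "A"/"B" labels are made up, and ThresholdSelector automates the search): classify as the class of interest only when its score exceeds a chosen threshold.

```java
// Manual threshold on a classifier's score for the class of interest.
// Raising the threshold favours precision on that class; lowering it
// favours recall.
public class ThresholdClassify {

    public static String classify(double score, double threshold) {
        return score >= threshold ? "A" : "B";
    }

    public static void main(String[] args) {
        double score = 0.35;                       // e.g. P(class A) from the base classifier
        System.out.println(classify(score, 0.5));  // default 0.5 threshold -> B
        System.out.println(classify(score, 0.2));  // lower threshold -> A (more recall, less precision)
    }
}
```

For the precision-oriented goal in the original question, one would raise the threshold until the precision on a held-out set is acceptable.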
> kottha(a)zhr.tu-dresden.de wrote:
> > Hi!
> > In Weka, how can I give more weight to the precision of a class in classification?
> > I do not need to recall all the instances!
> > Thank you.
> > regards,
> > Samatha
I am using the Relief and Info Gain attribute evaluators for attribute
selection for the first time. I have 250 attributes, and these methods order
them according to their importance. So now the question is: what is the cutoff
value? I mean, once they are ordered by these methods, how many of them (what
percentage?) should I keep for training? Is there any criterion or
explanation for this?
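There is no universal cutoff; two common heuristics are to keep the top k attributes, with k chosen by cross-validated accuracy, or to drop attributes whose score is near zero. A sketch of the top-k step (the scores below are hypothetical, and this is not a Weka API):

```java
import java.util.Arrays;
import java.util.Comparator;

// Sort attribute indices by ranker score (descending) and keep the top k.
public class TopKAttributes {

    public static Integer[] topK(double[] scores, int k) {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return Arrays.copyOfRange(idx, 0, k);
    }

    public static void main(String[] args) {
        double[] infoGain = {0.02, 0.41, 0.00, 0.17};           // hypothetical ranker output
        System.out.println(Arrays.toString(topK(infoGain, 2))); // prints [1, 3]
    }
}
```

In practice one would evaluate several values of k (say 10, 25, 50, ...) with cross-validation on the training data and keep the smallest k that does not hurt accuracy.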