OK, I am brand new to WEKA. I have a .csv file that I converted
into an .arff file using WEKA. I am trying to run the Discretize filter
on the data, but I keep getting the following error:
Problem filtering instances:
Class (colour) needs to be set for supervised filter.
I am unsure what this means and any help would be greatly appreciated. Thanks.
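The message means that the supervised Discretize filter (weka.filters.supervised.attribute.Discretize) uses the class labels to choose its cut points. It therefore only runs once a class attribute has been set. In the Explorer's Preprocess panel, the class should be chosen in the class drop-down list shown above the attribute histogram, which also controls the colouring. In Java code, the equivalent is `data.setClassIndex(...)`.

If no class is wanted, the unsupervised Discretize filter (weka.filters.unsupervised.attribute.Discretize) needs none. The standalone sketch below illustrates equal-width binning, the kind of class-free discretization that filter performs by default. It is plain Java rather than Weka's code, and the temperature values are made up.

```java
import java.util.Arrays;

public class EqualWidthBinning {
    /** Maps each value to a bin index 0..numBins-1 using equal-width intervals over [min, max]. */
    public static int[] discretize(double[] values, int numBins) {
        if (numBins < 1) {
            throw new IllegalArgumentException("numBins must be at least 1");
        }
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double width = (max - min) / numBins;
        int[] bins = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column (width 0) goes entirely into bin 0.
            int bin = (width == 0.0) ? 0 : (int) ((values[i] - min) / width);
            // The maximum lies exactly on the upper edge; keep it in the last bin.
            bins[i] = Math.min(bin, numBins - 1);
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] temperatures = {12.0, 15.5, 21.0, 30.0, 18.2};
        System.out.println(Arrays.toString(discretize(temperatures, 3))); // [0, 0, 1, 2, 1]
    }
}
```

Supervised discretization instead places cut points where the class distribution changes. That is why the supervised filter cannot run without a class.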
My questions are in the last part of the mail...
> I am very new to Weka. I am using the Windows interface version. I am trying to build a forest fire risk model, first based on the observed data. The class attribute of my data set is called "fire occurrence", and the values in this column are of 4 types:
> - no fire occurrence
> - fire occurrence: low, medium, or high
> I want to make a sample that contains the same number of cases with fire occurrence as with no fire occurrence. For that, I want to tell Weka to make 2 random subsamples:
> - one based on the no-fire-occurrence cases
> - one based on the fire-occurrence cases
> and once these 2 subsamples are done, join them together to get the sample I want.
> Could you please tell me which commands to use, and in which order?
That involves a lot of manual steps... You'd have to separate the fire
and non-fire instances with the RemoveWithValues filter (package
weka.filters.unsupervised.instance) into two files. Then you'd resample
one of the files to obtain the same number of instances as in the other
one (weka.filters.supervised.instance.Resample). And finally, you'd
merge those two files again (in the SimpleCLI):
java weka.core.Instances merge file1 file2 > mergedfile
Note: "weka.core.Instances merge ..." is only available in 3.5.6 and later.
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ +64 (7) 838-4466 Ext. 5174
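The split / resample / merge procedure described above can be illustrated outside Weka. Below is a minimal standalone sketch operating on plain string rows. It is not Weka's API, and the label "none" for the no-fire class, the column layout, and the seed are hypothetical.

Unlike the Resample filter, which may sample with replacement depending on its settings, this sketch undersamples the larger group without replacement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class BalanceByUndersampling {
    /**
     * Splits rows into "negative" (label equals negativeLabel) and "positive" (any other label),
     * randomly keeps as many rows of the larger group as the smaller group has, and merges both.
     */
    public static List<String[]> balance(List<String[]> rows, int labelCol,
                                         String negativeLabel, long seed) {
        List<String[]> negatives = new ArrayList<>();
        List<String[]> positives = new ArrayList<>();
        for (String[] row : rows) {
            if (row[labelCol].equals(negativeLabel)) {
                negatives.add(row);
            } else {
                positives.add(row);
            }
        }
        List<String[]> larger = negatives.size() >= positives.size() ? negatives : positives;
        List<String[]> smaller = (larger == negatives) ? positives : negatives;

        // Random subsample of the larger group, drawn without replacement.
        Random rnd = new Random(seed);
        List<String[]> shuffled = new ArrayList<>(larger);
        Collections.shuffle(shuffled, rnd);

        // Merge the two groups and shuffle so the classes are interleaved.
        List<String[]> merged = new ArrayList<>(smaller);
        merged.addAll(shuffled.subList(0, smaller.size()));
        Collections.shuffle(merged, rnd);
        return merged;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            rows.add(new String[] {"site" + i, "none"});
        }
        rows.add(new String[] {"site6", "low"});
        rows.add(new String[] {"site7", "high"});
        System.out.println(balance(rows, 1, "none", 42L).size()); // 2 fire + 2 no-fire rows
    }
}
```

All fire levels (low, medium, high) are treated as one group here, matching the two-subsample plan in the question.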
Thank you so much, Peter!
I did these steps manually, and once I had built the random subset of data, I tried to make a decision tree using the J48 algorithm. I did this several times for different subsets of data, obtaining different trees (models), one for each subset, with different errors. The next thing I would like to do is bagging and boosting of these different models, but I don't know how to tell Weka to do this. The format in which Weka saves the models makes it impossible to load them when I start the Explorer again...
I already tried bagging my data, trying to maintain randomization in the samples and a weighted presence of the different classes, by:
1) loading the complete data set (40,000 rows)
-> Bagging (for the bag size I tried to enter 100/40 000, meaning I wanted sample sizes of 100 cases, but the program doesn't allow it)
-> filter: SpreadSubsample (so that all classes are represented in each subsample, as I wanted)
I tried this on the complete training set (40,000 cases), and the results were very bad.
So I tried bagging again using the sample of one of the small trees I had obtained manually at the beginning, but again the results were very bad...
I would be very grateful if you could tell me how to do the bagging...
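Bagging does not need the separately saved models. weka.classifiers.meta.Bagging takes J48 as its base classifier and builds the bootstrap samples itself. AdaBoostM1 in the same package does boosting in the same way.

Bagging's bag size is given as a percentage of the training set rather than as a fraction, so 100 cases out of 40,000 corresponds to 0.25%. Whether a fractional percentage is accepted may depend on the Weka version; that part is an assumption.

The standalone sketch below shows the two ingredients of bagging, bootstrap sampling and majority voting, in plain Java. It is not Weka's implementation, and the class labels are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class BaggingSketch {
    /** Draws a bootstrap sample (indices, with replacement) whose size is bagSizePercent of n. */
    public static int[] bootstrapIndices(int n, double bagSizePercent, Random rnd) {
        int size = (int) Math.round(n * bagSizePercent / 100.0);
        int[] idx = new int[size];
        for (int i = 0; i < size; i++) {
            idx[i] = rnd.nextInt(n);
        }
        return idx;
    }

    /** Combines the predictions of several models by majority vote (ties go to the label seen first). */
    public static String majorityVote(List<String> predictions) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String p : predictions) {
            counts.merge(p, 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // 100 cases out of 40,000 is a bag size of 100 * 100 / 40000 = 0.25 percent.
        double percent = 100.0 * 100 / 40000;
        int[] bag = bootstrapIndices(40000, percent, new Random(1));
        System.out.println(bag.length); // 100
        System.out.println(majorityVote(Arrays.asList("low", "none", "low"))); // low
    }
}
```

In Weka, each bag would train its own J48 tree, and the trees' predictions would be combined as in `majorityVote` (Weka's Bagging averages class-probability estimates, which has a similar effect).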
Hi, I need to know how I can add a new algorithm with a GUI (graphical
user interface) to the Weka project. I don't know where the main class is or
how to add a button, a label, a new panel, ... Is it Swing?
Thanks for everything
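Weka's GUI is written in Swing, and its entry point is weka.gui.GUIChooser. As far as I know, though, a new algorithm does not need hand-written Swing code. The Explorer's GenericObjectEditor generates the options dialog from the algorithm's JavaBean-style getter/setter pairs.

The standalone sketch below illustrates only that introspection idea, using the JDK's java.beans package. `MyAlgorithm` and its two options are hypothetical, and this is not Weka's code.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanPropertiesSketch {
    /** Hypothetical algorithm whose options are exposed as getter/setter pairs. */
    public static class MyAlgorithm {
        private int numIterations = 10;
        private double learningRate = 0.1;

        public int getNumIterations() { return numIterations; }
        public void setNumIterations(int numIterations) { this.numIterations = numIterations; }
        public double getLearningRate() { return learningRate; }
        public void setLearningRate(double learningRate) { this.learningRate = learningRate; }
    }

    /** Lists the read/write properties a bean-driven GUI could create input fields for. */
    public static List<String> editableProperties(Class<?> beanClass) {
        try {
            // Stop at Object so that the inherited getClass() is not reported as a property.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName() + " : " + pd.getPropertyType().getSimpleName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        editableProperties(MyAlgorithm.class).forEach(System.out::println);
    }
}
```

A GUI built this way can show one input field per listed property without any code specific to the algorithm.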
Hi. It seems that the bigger the cross, the higher the true (not predicted) class value.
best, Harri S
Quoting Xue Li <me.lixue(a)gmail.com>:
> Dear all,
> After building an SVMreg model on the training data, I right-click on the
> model and select "Visualize classifier errors".
> Could someone tell me what the size of the cross on each data point
> means? Is it some kind of error measure? The bigger the cross, the larger
> some kind of error?
> Thanks a lot!
> Xue, Li
> Bioinformatics and Computational Biology program @ ISU
> Ames, IA 50010
Best wishes
> I am very new to WEKA and have the following problem.
> I want to create a Java program that performs a (WEKA) SimpleKMeans
> clustering from a CSV file.
> I had no problem reading the CSV file and was able to create the
> requested number of clusters
> using the following code snippet:
> CSVLoader loader = new CSVLoader();
> loader.setSource(new File(csvfilefilename));
> Instances data = loader.getDataSet();
> SimpleKMeans cl = new SimpleKMeans();
> From here on, I do not know how to proceed to save the clusters to a file.
Do you want to save the output of the clusterer or the clusterer itself?
In the first case, check out the Wiki article "Use Weka in your Java code",
in particular the following section:
- Clustering instances
URL for Wiki article:
> Can someone help, ideally by providing a code snippet or a reference to an
> example Java file?
Code snippets are listed in the article mentioned above.
> Another question:
> The first two attributes (of every data row) in my CSV file are identifiers,
> i.e. those two attributes should not be considered for clustering. How do I
> tell the clusterer to ignore those attributes?
Use the Remove filter in conjunction with the FilteredClusterer (available
since version 3.5.4) and choose those two attributes for removal, then the
base clusterer never encounters those.
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
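Within Weka, the steps above are the Remove filter plus FilteredClusterer, and the clustering section of the Wiki article. The overall flow can also be illustrated without Weka:

- drop the first two (ID) columns, as Remove on attributes 1-2 would;
- run k-means on the rest;
- write each row's IDs together with its cluster index to a CSV file.

The sketch below is plain Java, not Weka's SimpleKMeans. It seeds the centroids with the first k rows, whereas SimpleKMeans picks them randomly, so assignments can differ. It assumes k is at most the number of rows, and the output file name `clusters.csv` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class KMeansIgnoringIds {
    /** Parses the numeric columns from firstFeatureCol onwards; earlier (ID) columns are skipped. */
    public static double[][] features(String[][] rows, int firstFeatureCol) {
        double[][] x = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            x[i] = new double[rows[i].length - firstFeatureCol];
            for (int j = firstFeatureCol; j < rows[i].length; j++) {
                x[i][j - firstFeatureCol] = Double.parseDouble(rows[i][j]);
            }
        }
        return x;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return sum;
    }

    /** Lloyd's k-means; the first k rows seed the centroids so the result is deterministic. */
    public static int[] kMeans(double[][] x, int k, int maxIter) {
        double[][] centroids = new double[k][];
        for (int m = 0; m < k; m++) centroids[m] = x[m].clone();
        int[] assignment = new int[x.length];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each row goes to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < x.length; i++) {
                int best = 0;
                for (int m = 1; m < k; m++) {
                    if (squaredDistance(x[i], centroids[m]) < squaredDistance(x[i], centroids[best])) best = m;
                }
                if (best != assignment[i]) { assignment[i] = best; changed = true; }
            }
            if (!changed && iter > 0) break; // converged
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][x[0].length];
            int[] counts = new int[k];
            for (int i = 0; i < x.length; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < x[i].length; j++) sums[assignment[i]][j] += x[i][j];
            }
            for (int m = 0; m < k; m++) {
                if (counts[m] == 0) continue; // an empty cluster keeps its old centroid
                for (int j = 0; j < sums[m].length; j++) centroids[m][j] = sums[m][j] / counts[m];
            }
        }
        return assignment;
    }

    /** One CSV line per row: the two ID columns followed by the assigned cluster index. */
    public static List<String> toCsvLines(String[][] rows, int[] assignment) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            lines.add(rows[i][0] + "," + rows[i][1] + "," + assignment[i]);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String[][] rows = {
            {"a", "1", "0.0", "0.0"}, {"b", "2", "10.0", "10.0"},
            {"c", "3", "0.5", "0.2"}, {"d", "4", "9.8", "10.1"}};
        int[] clusters = kMeans(features(rows, 2), 2, 100);
        Files.write(Paths.get("clusters.csv"), toCsvLines(rows, clusters));
    }
}
```

The ID columns never reach the clustering step but are kept for the output, which is the same effect the FilteredClusterer approach gives.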
I have downloaded and installed Weka 3.5.6 and run it successfully (e.g., SMO).
Now I want to use the LibSVM classifier in Weka, but somehow I get a "libsvm not
in classpath" error.
I have read the post
and followed the solutions in it, but
still could not solve the problem.
Below is my RunWeka.ini file. Am I doing anything wrong, or are there other ways I can
# Contains the commands for running Weka either with a command prompt
# ("cmd_console") or without the command prompt ("cmd_default").
# One can also define custom commands, which can be used with the Weka
# launcher "RunWeka.class". E.g., to run the launcher with a setup called
# "custom1", you only need to specify a key "cmd_custom1" which contains the
# command specification.
# Author FracPete (fracpete at waikato dot ac dot nz)
# Version $Revision: 1.1 $
# setups (prefixed with "cmd_")
cmd_default=javaw -Xmx#maxheap# -classpath "%CLASSPATH%;#wekajar#"
cmd_console=cmd.exe /K start cmd.exe /K "java -Xmx#maxheap# -classpath
cmd_explorer=javaw -Xmx#maxheap# -classpath "%CLASSPATH%;#wekajar#"
# placeholders ("#bla#" in command gets replaced with content of key "bla")
# Note: "#wekajar#" gets replaced by the launcher class, since that jar can
# be provided as parameter
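In the setups above, the classpath is `"%CLASSPATH%;#wekajar#"`, so libsvm.jar has to be reachable either through the CLASSPATH environment variable or by being added to that string. A quick way to check what the JVM actually sees is a tiny diagnostic class started with the same `-classpath` value.

This sketch assumes that the LIBSVM Java library's main class is `libsvm.svm`, which is the class Weka's LibSVM wrapper should look for.

```java
import java.io.File;

public class ClasspathCheck {
    /** Returns true if the named class can be loaded from the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Print every classpath entry the JVM was actually started with.
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            System.out.println(entry);
        }
        System.out.println("libsvm.svm found: " + isOnClasspath("libsvm.svm"));
    }
}
```

If libsvm.jar does not appear in the printed entries, the launcher command never received it, regardless of where the jar lives on disk.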
After building an SVMreg model on the training data, I right-click on the
model and select "Visualize classifier errors".
Could someone tell me what the size of the cross on each data point
means? Is it some kind of error measure? The bigger the cross, the larger some kind of error?
Thanks a lot!
Bioinformatics and Computational Biology program @ ISU
Ames, IA 50010
I have an agricultural dataset consisting of both numeric and categorical
data. I want to visualize them together. How should I proceed?
Center for Data-Engineering
I have released this classifier, implemented in Java, as open source
under the GNU license.
It loads data from the same ARFF format as WEKA, and you might be interested in
checking its performance on the data sets you are already familiar with.
Best regards, Giorgio Corani
JNCC2 is the Java implementation of the Naive Credal Classifier 2. NCC2
constitutes an extension of Naive Bayes towards imprecise
probabilities; it is designed to return robust classification, even on
small and/or incomplete data sets. A peculiar feature of NCC2 is that it
returns set-valued (or imprecise) classifications (i.e., more than one
class) when faced with doubtful instances. Imprecise classifications are
valuable as they clearly highlight doubtful instances, preventing
over-confident use of the issued judgments; however, they still convey
an informative content, dropping unlikely classes. This could be
appealing in contexts in which the classification outcome is especially
sensitive, as for instance in the medical domain.
Extensive empirical investigation shows that NCC2 returns imprecise
judgments on instances whose classification is truly doubtful; in fact,
Naive Bayes achieves a much higher classification accuracy on the
instances precisely classified by NCC2 than on those imprecisely
classified by NCC2. We say that NCC2 isolates areas of ignorance, i.e.,
subsets of instances over which the accuracy of Naive Bayes sharply drops.
JNCC2 is open source; it is released under the terms of the GNU GPL
license; it is hence freely available together with manual, sources and
For more information, visit http://www.idsia.ch/~giorgio/jncc2.html.
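To make the idea of a set-valued (imprecise) classification concrete, here is a deliberately simplified stand-in. As far as I know, NCC2 itself decides with a credal-dominance criterion over a set of priors; the sketch below instead uses interval dominance, a simpler and different rule.

Under interval dominance, a class is dropped only if some other class's lower probability bound exceeds its upper bound. The class names and probability intervals are hypothetical, and this is not JNCC2's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IntervalDominance {
    /**
     * Returns the classes that are not interval-dominated. bounds maps each class to
     * {lowerProbability, upperProbability}; class c is dropped only if some other class
     * has a lower bound strictly above c's upper bound.
     */
    public static List<String> undominated(Map<String, double[]> bounds) {
        double maxLower = Double.NEGATIVE_INFINITY;
        for (double[] b : bounds.values()) {
            maxLower = Math.max(maxLower, b[0]);
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, double[]> e : bounds.entrySet()) {
            // Since lower <= upper, a class can never be dominated by its own lower bound.
            if (e.getValue()[1] >= maxLower) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, double[]> clear = new LinkedHashMap<>();
        clear.put("sick", new double[] {0.55, 0.70});
        clear.put("healthy", new double[] {0.25, 0.40});
        clear.put("other", new double[] {0.02, 0.10});
        System.out.println(undominated(clear)); // [sick]

        Map<String, double[]> doubtful = new LinkedHashMap<>();
        doubtful.put("sick", new double[] {0.35, 0.60});
        doubtful.put("healthy", new double[] {0.30, 0.55});
        doubtful.put("other", new double[] {0.02, 0.10});
        System.out.println(undominated(doubtful)); // [sick, healthy]
    }
}
```

In the doubtful case, more than one class survives and the unlikely class is still dropped. This mirrors the behaviour the announcement describes.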
I am very new to WEKA and have the following problem.
I want to create a Java program that performs a (WEKA) SimpleKMeans
clustering from a CSV file.
I had no problem reading the CSV file and was able to create the requested
number of clusters
using the following code snippet:
CSVLoader loader = new CSVLoader();
loader.setSource(new File(csvfilefilename));
Instances data = loader.getDataSet();
SimpleKMeans cl = new SimpleKMeans();
From here on, I do not know how to proceed to save the clusters to a file.
Can someone help, ideally by providing a code snippet or a reference to an
example Java file?
The first two attributes (of every data row) in my CSV file are identifiers,
i.e. those two attributes should not be considered for clustering. How do I
tell the clusterer to ignore those attributes?
Any help is highly appreciated.