Now I have a question about the "-p" option; I'm a little confused about
the meaning of the probability we obtain by using it. For
example, we use:
java weka.classifiers.trees.j48.J48 -t labor.arff -T labor.arff -p 0
0 bad 0.9676977463896519 good
1 good 0.9676977463896519 good
2 good 0.74827996031097 good
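To make the question concrete: if (an assumption on my part, not something confirmed by the output itself) the number is the probability the classifier assigns to the class in the last column, i.e. the maximum of its per-class distribution, the mapping could be sketched in plain Python:

```python
# Sketch, assuming the number printed by -p is the classifier's estimated
# probability for the predicted class (the last column of the output).
# The distribution below is hypothetical, echoing the first output line.
distribution = {"bad": 0.0323, "good": 0.9677}  # assumed P(class | instance 0)

predicted = max(distribution, key=distribution.get)  # class with highest probability
confidence = distribution[predicted]                 # the value -p would then print
print(predicted, confidence)
```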
How should we interpret a probability like "0.9676977463896519"?
Does it mean the probability of obtaining the predicted class ("bad"),
or the probability that the first instance was predicted to have a
Thanks very much for any response!
I'm trying to use the Multilayer Perceptron NN for classification of a data set
that consists of between 270 and 4000 attributes and 400 instances. I have to use
30-fold cross-validation. When I run it with just 270 attributes and 30
instances, it takes 4 hours to get results, and when I increased the number
of instances to just 45, it ran for 13 hours with no results and I had to stop
it. I'm allocating 512 MB of memory. The rest of the settings are default (500
epochs, etc.). My question is: how can I make this run faster, without changing the
number of folds of cross-validation, while still getting reasonable results?
Do I have to run it on a machine with more memory? I tried reducing the number of
epochs, which did improve the time a little, but it also hurt the accuracy. Any
help would be greatly appreciated.
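For rough intuition about where the time goes, here is a back-of-envelope sketch (an assumed proportionality, not MultilayerPerceptron's actual cost model): backpropagation work grows roughly with folds × epochs × training instances per fold × number of network weights, so adding instances alone should scale the time roughly linearly.

```python
# Back-of-envelope cost sketch (an assumption, not Weka's exact cost model):
# work ~ folds * epochs * instances_per_fold * weights.
def relative_cost(folds, epochs, instances, attributes, hidden, classes=2):
    weights = attributes * hidden + hidden * classes  # one hidden layer, rough count
    train_per_fold = instances * (folds - 1) // folds  # training split per fold
    return folds * epochs * train_per_fold * weights

# Weka's default hidden-layer size 'a' is (attributes + classes) / 2.
hidden = (270 + 2) // 2
base = relative_cost(30, 500, 30, 270, hidden)
bigger = relative_cost(30, 500, 45, 270, hidden)
print(bigger / base)  # ~1.48x: instances alone only scale the cost linearly
```

Under this model, a jump from 4 hours to over 13 suggests something beyond raw arithmetic (e.g. memory pressure and swapping), which is worth checking before buying more epochs back.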
In Weka, how can I give more weight to the precision of a class in classification? It
does not need to recall all the instances!
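One way to picture the trade-off being asked about (a toy plain-Python sketch, not a Weka feature): raising the decision threshold on the predicted class probability makes fewer positive predictions, typically trading recall for precision. The probabilities and labels below are made up for illustration.

```python
# Toy sketch: a stricter decision threshold favours precision over recall.
probs_and_labels = [(0.95, 1), (0.90, 1), (0.60, 0), (0.55, 1), (0.30, 0)]

def precision_recall(threshold):
    pred_pos = [(p, y) for p, y in probs_and_labels if p >= threshold]
    tp = sum(y for _, y in pred_pos)              # true positives
    fp = len(pred_pos) - tp                       # false positives
    fn = sum(y for p, y in probs_and_labels if p < threshold)  # missed positives
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall(0.5))  # lower threshold: higher recall
print(precision_recall(0.8))  # higher threshold: higher precision
```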
Center for High Performance Computing TeL: (+49) 351 4633 1945
Technical University Dresden
Zellescher Weg 12
I found the solution; it really seems to be a bug in version 3.4.3.
To correct my first post: I use the newest version, 3.4.3.
Weka plots these incorrect dots only if I use 16-bit color depth.
I changed my color depth to 32-bit, and the Zoom In visualization
function now works fine.
I use BayesNet in Weka for classification. I have a variable to predict (Reusability) and other variables. When I apply the algorithm, the graph I obtain indicates that my variable to predict (Reusability) is a parent of the other variables, but in my understanding it should be the child (effect), not the cause, of the others...
What I am trying to do is predict reusability using a Bayesian network. I have a data set of 84 instances and 7 variables (plus the variable to predict); attached is the graph obtained after applying the RepeatedHillClimber algorithm.
Can someone help me and tell me whether it is normal for my variable (Reusability) to be a parent of other variables??
Thanks a lot,
Note: I can't send the graph as an attachment because it is more than 40K.
I opened the Iris example within the Weka Explorer and clicked on the
Visualize tab. Up to that point the visualization is correct, but if I click
on a field to enlarge the visualization, Weka plots some dots into the
grey area around the black field and shows a really different dot
pattern: some dots are below the origin (0,0), some are on the axes, and so on.
Could this be a bug? I use version 3.4 with Windows XP and Java 1.4.
I'm trying to use the Weka KnowledgeFlow interface for building classification
models and applying them for scoring. I already understood how to create a
"data flow", including splitting a data set into training and test data.
However, I still could not find how to connect an unclassified data set for
scoring to the flow. I'm used to SAS Enterprise Miner, where they
distinguish not only between test and training data, but also have "score
data" connectors. Maybe Weka follows a different paradigm?
Many thanks in advance for any hint on how to do scoring in Weka.
sun lina wrote:
> hello, recently I have used the -p option in order to obtain the
> predictions on some test set using the J48 and NaiveBayes classifiers.
> (for example, "java weka.classifiers.bayes.NaiveBayes -t
> .\data\car.arff -T .\data\car.arff -p 0")
I notice that you use the same file for training and testing.
You should use a different file for testing, since the resubstitution error
is not a good predictor for the performance (or accuracy) of a
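The point about resubstitution error can be illustrated with a toy sketch (plain Python, nothing to do with Weka itself): a "classifier" that simply memorizes its training data scores perfectly when evaluated on that same data, which says nothing about how it generalizes.

```python
# Toy illustration: a memorizing classifier gets perfect accuracy on its
# own training data (resubstitution), which is why a separate test set
# (or cross-validation) is needed to estimate real performance.
train = [((1.0, 2.0), "bad"), ((2.0, 1.0), "good"), ((3.0, 3.0), "good")]

table = {x: y for x, y in train}           # memorize every training instance
classify = lambda x: table.get(x, "good")  # fall back to the majority class

resub_acc = sum(classify(x) == y for x, y in train) / len(train)
print(resub_acc)  # always 1.0 for the memorizer
```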
> To some extent most of the probabilities will be close to 1, and it
> seems that most algorithms err on the side of being overly 'sure' about their predictions.
> I also did some tests on other data sets like iris.arff and sick.arff
> with the same results.
> Has anyone encountered similar questions? And if so, could you give
> an example of how to interpret the result of the -p option, or are
> there technical problems specific to Weka with this option?
> Any response would be appreciated!
> Wekalist mailing list