(For anyone interested in a simple math problem regarding classification:)
How much more (or less) training does 10-fold cross-validation on the training
set (10CV) entail compared to a single training run on the full training set?
(a) 0.9 times, since 10CV builds 10 classifiers, each trained on 90% of the
training set and tested on the remaining 10%
(b) 9 times as much, since 10 classifiers each trained on 90% of the training
set amounts to 10 x 0.9 = 9 full-set training runs
(c) between 1 and 9 times, since even though the arguments in (a) and (b) are
basically true, the training instances in different folds mostly overlap (each
training instance is involved in 9 of the 10 classifiers), so it cannot be
said that the replication of a training instance N times across CV folds
contributes each time as if it were a new instance. (If you choose (d),
please specify more precisely.)
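A quick back-of-the-envelope check of options (a) and (b), assuming training cost scales linearly with the number of instances (a simplification that ignores the overlap argument in (c); the set size of 1000 is made up):

```java
public class CvCost {
    public static void main(String[] args) {
        int n = 1000;   // hypothetical training-set size
        int folds = 10;
        // Each of the 10 fold models trains on 90% of the data.
        int perFold = n - n / folds;    // 900 instances per fold model
        int cvTotal = folds * perFold;  // 10 x 900 = 9000 instance-trainings
        // Ratio versus a single build on the full set.
        System.out.println((double) cvTotal / n); // prints 9.0
    }
}
```

Under the linear-cost assumption this comes out to exactly 9 times the work of a single full-set build, independent of n.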
To me, (b) and (c) (more than 1 times) make more sense, since CV is known to
tend to produce better classification accuracy than an actual full-training-set
/ separate-test-set setup.
What do you think? Thanks!
Best wishes
Hi, I'm working with parallel processing of algorithms, running on a grid.
I need to parallelize a Weka algorithm and run it on the grid to obtain results.
Can anyone help me?
Could anyone send me a parallel version of any Weka algorithm?
My English is not very good -- I am Brazilian. :D
I'm working with TextDirectoryToArff and I'm having a problem because the
ARFF file contains String attributes, and I need only numeric attributes. Is
there any way to convert an ARFF file with String attributes into one with
only numeric attributes?
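For text data from TextDirectoryToArff, the usual route is Weka's StringToWordVector filter, which replaces String attributes with numeric word-frequency attributes. As a plain-Java illustration of the simpler idea behind such conversions (this is a sketch, not Weka code), distinct string values can be mapped to numeric indices:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: assign each distinct string value a numeric index,
// the simplest string-to-numeric scheme (hypothetical example values).
public class StringIndexer {
    public static void main(String[] args) {
        String[] values = {"red", "blue", "red", "green"};
        Map<String, Integer> index = new LinkedHashMap<>();
        int[] numeric = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // First occurrence of a value gets the next free index.
            numeric[i] = index.computeIfAbsent(values[i], k -> index.size());
        }
        System.out.println(Arrays.toString(numeric)); // prints [0, 1, 0, 2]
    }
}
```

Note that for free text (as opposed to a small set of repeated values) a word-vector representation like StringToWordVector's is more appropriate than a single index per document.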
Essentially, I have 7 columns in a database that I'm trying to use to make predictions.
I wrote a Ruby script that outputs the data to an .arff file -- luckily, this
is a simple format :)
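For anyone else generating ARFF from a script: the format really is just a text header plus comma-separated rows. A minimal sketch of what such a script emits (relation and attribute names here are made up):

```java
// Builds a minimal ARFF document as a string (illustrative names only).
public class ArffSketch {
    public static void main(String[] args) {
        StringBuilder arff = new StringBuilder();
        arff.append("@relation sales\n\n");
        // Nominal attributes list their values; numeric ones say "numeric".
        arff.append("@attribute region {north,south}\n");
        arff.append("@attribute amount numeric\n");
        arff.append("@attribute outcome {yes,no}\n\n"); // class attribute last
        arff.append("@data\n");
        arff.append("north,12.5,yes\n");
        arff.append("south,3.0,no\n");
        System.out.print(arff);
    }
}
```

By convention (and by Weka's default in the Explorer) the last attribute is treated as the class.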
I got everything installed and the program will run.
It is my understanding that you can't have any numeric data, so I used some
Discretize filters to handle that.
My 'class' attribute is the last one listed in the file header, so it is picked
up as the class by default.
When discretizing with the range first-last, the last attribute didn't get
discretized for whatever reason -- does the class/'actual' attribute have to be
discrete? If so, I can get the filter to actually work on it by choosing
something else as a temporary class attribute.
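For what it's worth, Weka's unsupervised Discretize filter defaults to equal-width binning; the underlying idea can be sketched in plain Java (illustration only, not the filter itself; the values and bin count are made up):

```java
public class EqualWidthBins {
    public static void main(String[] args) {
        // Equal-width binning: split [min, max] into equal-sized intervals.
        double[] values = {1.0, 2.5, 4.0, 7.5, 10.0};
        int bins = 2;
        double min = 1.0, max = 10.0;
        double width = (max - min) / bins;  // 4.5
        for (double v : values) {
            // Clamp so the maximum value falls in the last bin.
            int bin = Math.min(bins - 1, (int) ((v - min) / width));
            System.out.println(v + " -> bin " + bin);
        }
    }
}
```

Skipping the class attribute is typical behavior for such filters, which would explain why the last attribute was left untouched.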
Now that I can get the file loaded and everything discretized, I'm unsure how
to pick a good algorithm - there are many choices, each suitable for a
different situation, but I am no statistics expert and I don't have the time
(wish I did) to become one.
Ultimately this would ideally be used in some code running on the fly -- will
Weka output an equation I can use, or do I need to keep some sort of service
running through which I feed the "question" data that needs a prediction?
I took AP stats in high school, but being many years out now, I could use
some help.
Should I be using a different software package? Is there any FOSS software
that would be more suitable?
I searched through 490 'older' threads here on the board but didn't find
anything -- a search for prediction/predictive doesn't really help me.
Sent from the WEKA mailing list archive at Nabble.com.
We are searching for a dataset that represents a retail customer transaction
database. The dataset should be suitable for association rule mining,
clustering, and classification. If anyone has an idea about such datasets,
please tell us.
Thank you in advance.
> java weka.clusterers.MakeDensityBasedClusterer -t data\sample1.arff -x 5 -W weka.clusterers.XMeans
> This should work.
> I can't say what's wrong without seeing the Exception/Error message.
> I've just tried it (UCI dataset "bolts") and it works.
> NB: XMeans only works with numeric attributes!
Here is the Exception/Error message:
C:\Program Files\Weka-3-5>java weka.clusterers.MakeDensityBasedClusterer -t \sample1.arff -x 5 -W weka.clusterers.XMeans
java.lang.Exception: ClusterEvaluation: File not found.
at weka.clusterers.Clusterer.runClusterer(Unknown Source)
at weka.clusterers.MakeDensityBasedClusterer.main(Unknown Source)
I am new to using Weka.
Here is one simple question: what are the test options in Classify used for,
such as "supplied test set"?
PS: I am using the Weka graphical interface.
Looking forward to your reply.
I am a Ph.D. student who has recently been playing with Weka for clustering.
There is a strange thing. Each time I classify with and without the filter,
the correct prediction rate for the unfiltered data is always higher than for
the filtered data, which in my mind should be the opposite.
Could you please explain why this happens?
Below are the steps I follow to classify in Weka:
1-1. preprocess->open file ## load the file
2-1. classify->use training set
3-1. classify->choose->SMO
4-1. classify->start ## note the accuracy
1-2. preprocess->open file ## load the same file
2-2. selectAttributes->choose the attribute evaluator and search method, with "use full training set"
3-2. selectAttributes->start
4-2. preprocess->choose the AttributeSelection filter, select the same evaluator and search method, and apply it
5-2. classify->use training set
6-2. classify->choose->SMO
7-2. classify->start ## note the accuracy
The correct prediction rate at step 4-1 is always higher than at step 7-2.
But usually, after filtering, the correct prediction rate should rise, right?
I am really confused about this.
Looking forward to your reply!