I'm curious about the formula used by Weka to calculate the root mean
squared error. I know this has been asked on the maillist before, but,
pardon my ignorance, I'm still confused. I would have guessed the formula
was sqrt(sum((xi-yi)^2) / n) where x is a vector of calculated values and y
is a vector of actual values and n is the number of observations. But when I
use that formula to manually calculate RMSE, I get a different answer than
what Weka generates. Below, Weka shows a value of .1527, but I calculate
.16667. For the below example, I define an output class as a nominal type
with possible values of 0 and 1.
Correctly Classified Instances 175 97.2222 %
Incorrectly Classified Instances 5 2.7778 %
Kappa statistic 0.943
Mean absolute error 0.0466
Root mean squared error 0.1527
Relative absolute error 9.627 %
Root relative squared error 31.0329 %
Total Number of Instances 180
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure Class
0.986 0.038 0.948 0.986 0.967 0
0.962 0.014 0.99 0.962 0.976 1
=== Confusion Matrix ===
a b <-- classified as
73 1 | a = 0
4 102 | b = 1
I've been told that "The sum is taken over all class values, as well as
over all instances," so my formula is probably wrong in that it isn't
adding in some extra squared errors, but I don't really know what that
means. The Weka Explorer User Guide, page 7, says "By default, the class
is taken to be the last attribute in the data." To me that implies that
by default there is only one class attribute.
Bottom line - if someone could show me the formula or, more importantly,
show me explicitly how the formula is used to generate the above RMSE
value of .1527, I would be forever grateful.
>Weka is not the typical training program for image classification.
>There are several problems which are not so easy to solve with Weka:
Yes, but there are Java libraries for preprocessing images such as
ImageJ, which are as fast as dedicated C routines. I recently built
an industrial image classification system with these, plus JNI to
get camera images from a linux-uvc camera - because people won't
write kernel drivers in Java - yet. ;-)
I don't think that you will get good results just classifying the
pixel data. Even in handwritten digit recognition, you need to
scale, brightness-equalize, downsample and deslant your digits
before running them through SVM. For this, a lot of basic image
classification code is needed (first/second moments, downsampling
and equalization routines, ...) and you may prefer testing
keypoint vectors (e.g. by SIFT) or output from edge detectors (such
as the Canny edge detector) as features.
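To give a concrete (if simplified) flavour of one such basic routine - a minimal sketch of 2x2 box-filter downsampling on a grayscale image held as a plain double array. In practice a library such as ImageJ provides this; the code below only illustrates the idea:

```java
public class Downsample {

    /** Halve each dimension by averaging non-overlapping 2x2 blocks. */
    static double[][] downsample2x(double[][] img) {
        int rows = img.length / 2, cols = img[0].length / 2;
        double[][] out = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[r][c] = (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
                           + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A 4x4 toy "image" made of four uniform 2x2 blocks.
        double[][] img = {
            {0, 0, 4, 4},
            {0, 0, 4, 4},
            {8, 8, 2, 2},
            {8, 8, 2, 2},
        };
        double[][] small = downsample2x(img);
        System.out.println(small[0][0] + " " + small[0][1]);  // prints 0.0 4.0
        System.out.println(small[1][0] + " " + small[1][1]);  // prints 8.0 2.0
    }
}
```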
Also, I'd rather people _not_ contribute to the myth that Java is
slower than C - properly programmed and optimized, Java is in the
same league, minor differences notwithstanding. There are optimized
SVM implementations (such as svm_perf) which are not yet available
in Java, but these could easily be ported and would be equally fast.
In fact, I usually have more problems running C learning algorithms
than Java ones on large datasets, because many of them still use
32-bit pointers and need patching to utilize more than 4GB of memory.
BTW, I have a recent paper on using WEKA in image mining for static
analysis of mobile phone camera images to reconstruct go board
positions: see http://alex.seewald.at, Publications, for details.
Dr. Alexander K. Seewald +43(664)1106886
Information wants to be free;
Information also wants to be expensive (S.Brant)
--------------- alex.seewald.at ----------------
Hi, I am trying to modify the Cobweb clustering algorithm. I am
extending it to accept uncertain data, in which the nominal attribute
type will be a matrix. How can I do this and extend the nominal
attribute type?
Since attribute selection may take quite long on some training sets,
is there a procedure to store the attribute selection computed on the
training set and apply the same selection to the test set without
having to process the training set again?
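One common approach (a sketch only - it assumes the standard weka.filters.supervised.attribute.AttributeSelection batch filter; the evaluator, search method, and file names below are just example choices) is to wrap the selection in a filter, fit it once on the training set, apply the same fitted filter to the test set, and serialize it for later runs:

```java
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

public class ReuseAttributeSelection {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train.arff");  // hypothetical files
        Instances test = DataSource.read("test.arff");
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        AttributeSelection filter = new AttributeSelection();
        filter.setEvaluator(new CfsSubsetEval());  // example evaluator
        filter.setSearch(new BestFirst());         // example search method
        filter.setInputFormat(train);              // fit on the training set

        Instances newTrain = Filter.useFilter(train, filter);
        Instances newTest = Filter.useFilter(test, filter);  // same selection

        // persist the fitted filter so the training set never has to be
        // processed again
        SerializationHelper.write("attsel.filter", filter);
    }
}
```

In a later run the filter can be restored with SerializationHelper.read and applied directly to new test data.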
I needed to see the detailed accuracy and confusion matrix for
updateable classifiers in KnowledgeFlow. If anyone else needs this, the
following should save you some time. Note that this is for version
3.5.7, so the line numbers may differ in other versions.
I modified the acceptClassifier method in:
Change line 117 to:
Add the following after line 211:
results += "\n" +
I have a weighted dataset and hope to use the J48 classifier to do
classification with it. In that dataset, weighted instances are
assigned to all classes, with one weight per class. I tried to find the
docs on instance weights for J48 but failed. There is only one page,
found here:
After reading the description of weights(Instance), I am still not sure
about it. Are there any more details or a short code sample? And what
format should ARFF files with these instance weights (weights, plural,
not a single weight) be in?
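For what it's worth, recent Weka versions (the developer line, if I remember correctly) accept a single per-instance weight in an ARFF file, given in curly braces at the end of a data row. Note this is one weight per instance, not one per class; as far as I know, plain ARFF cannot express per-class weights (XRFF also carries a single weight attribute per instance). A hypothetical fragment:

```
@relation weighted-example

@attribute temp  numeric
@attribute class {0,1}

@data
20.5, 0, {2}
18.1, 1, {0.5}
```

Rows without a {w} suffix default to weight 1.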
Reading the Wekadoc, I saw that models are always trained on the whole training set. My question is: if I train a model with the "Cross-validation" option with 10 folds, is it trained on the whole training set or on 10 "sub"-sets?
The same question applies to the other test options.
Thanking you in advance