Hello to all,
I have a question regarding the most appropriate accuracy metrics available
in Weka for regression problems. My paper was rejected recently, and one of
the reasons was that the metrics used were inappropriate. I used the
correlation coefficient (CC), MAE, RMSE and RRSE, but the reviewers still
raised objections. Are there other options available in Weka? I would be
grateful if someone could suggest a better alternative, if there is one.
Date: Sun, 22 Sep 2019 13:55:00 -0700 (MST)
From: mcbenly <bandehali(a)gmail.com>
Subject: [Wekalist] Kappa metric for multi-class classification?
I am having difficulty choosing best performance measure for my multi-class
There are four classes in my dataset, and the data is imbalanced.
Personally I preferred using weighted f-measure and AUROC for binary
classification. But I guess I can't use AUROC for multi-class
classification. Not sure weighted f-measure alone would be good for
I read in a few research papers that, for multi-class problems, one should
use F-measure with micro or macro averaging, and micro if the data is imbalanced.
But as far as I understand micro f-measure averaging is same as
I was wondering if I could use "classification accuracy + Kappa statistic"
as my *main performance measure*. Would this be the right combination?
Or do you have any other suggestion?
In those circumstances F1 is not a good choice, and chance-corrected kappa measures are more appropriate, and can be directly applied to multiclass data. You can also macroaverage, weighting by the bias to a particular prediction (the proportion of the time that class label is predicted) - it is not appropriate to weight by the prevalence (the proportion of the time the real class occurs). Accuracy is also easily biased and is misleading to the extent that bias doesn’t match prevalence. To the extent you have a per-class or per-instance cost you can use that, but otherwise a chance-corrected measure is best.
The Cohen Kappa included in Weka (a chance-corrected version of Accuracy) is a reasonable but not a good choice since, like F1, it does not behave well if prediction bias fails to match prevalence for each class. I include a link to a paper on this below.
What is appropriate is the multiclass form of Kappa called Informedness, which is chance-corrected in the sense that it gives the probability of an informed decision (viz. not chance). Again I include links.
The binary form of this is Peirce (1884)’s I, Youden (1950)’s J, Flach (2003)’s deskewed WRAcc, and what is known in Psychology as DeltaP'. It corresponds to the distance above the chance line in the ROC curve, viz. tpr-fpr, which is what is maximized when choosing the standard operating point in ROC. It macroaverages over predictions as described above to estimate the multiclass form of Informedness (and the short ECAI and long JMLT papers show how the Bookmaker estimate recovers the underlying probability with which a Monte Carlo simulation makes an informed decision rather than a guess).
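As a rough illustration of the description above (not the code from the Weka fork or the Matlab scripts), here is a minimal Python sketch that computes tpr-fpr per class in one-vs-rest fashion and then averages the per-class values weighted by prediction bias. The function names and the confusion-matrix layout (rows = actual class, columns = predicted class) are assumptions made for this example only.

```python
from typing import Sequence


def class_informedness(cm: Sequence[Sequence[int]], k: int) -> float:
    """One-vs-rest informedness (tpr - fpr) for class k.

    Assumed layout: cm[i][j] counts instances whose actual class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # actual k, predicted as something else
    fp = sum(row[k] for row in cm) - tp   # actual something else, predicted k
    tn = total - tp - fn - fp
    # Guard against classes that never occur (or always occur) in the data.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr - fpr


def bookmaker_informedness(cm: Sequence[Sequence[int]]) -> float:
    """Bias-weighted average of the per-class informedness values.

    The weight for class k is its prediction bias: the proportion of all
    instances for which label k was predicted (column sum / total).
    Assumes the matrix contains at least one instance.
    """
    total = sum(sum(row) for row in cm)
    result = 0.0
    for k in range(len(cm)):
        bias = sum(row[k] for row in cm) / total
        result += bias * class_informedness(cm, k)
    return result


# A classifier that always predicts the majority class has 90% accuracy
# on this imbalanced two-class data, but zero informedness.
always_majority = [[90, 0],
                   [10, 0]]
print(bookmaker_informedness(always_majority))
```

In the binary case both classes have the same tpr-fpr, so the weighting makes no difference there; it only matters with three or more classes.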
This is a hobbyhorse of mine… I originally modelled informedness in terms of gambling on your predictions (hence the multiclass measure is also known as Bookmaker, Bookmaker Informedness or Bookmaker Probability), and that makes it clear why you should weight classes by their bias - the appropriate weight across horses is how much you bet on each horse. I have written extensively on this, including providing Matlab scripts, an Excel calculator and a version of Weka that provides it as an alternate evaluation measure (in Explorer and Experimenter as well as Adaboost, which turns it into Adabook). I include a selection below (but e.g. exclude ones about visualizations, including the relation to ROC and AUC - there’s also a paper about why you should never use F-score, and one that focuses on multiclass visualizations - both available on arXiv).
My Weka fork with Informedness: https://www.dropbox.com/s/artzz1l3vozb6c4/weka.jar?dl=0
2013 ICINCO Paper+Poster - Adabook & Multibook
2012 EACL Paper+Poster - The Problem with Kappa
2011 JMLT - Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation
2008 ECAI Paper+Poster+Talk - Evaluation Evaluation
2003 ICCS Paper+Poster - Recall and Precision vs the Bookmaker (38)
1998 CoNLL Paper - The Present use of Statistics in evaluation of NLP parsers
You also mentioned liking AUROC. It is important to understand what this actually measures!
ROC AUC gives the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one. It represents a balance between finding a specific operating point (Certainty = (Informedness+1)/2 is then the area under a three-point curve) and how much room there is for distributional variance (Consistency = AUC-Certainty, the area between the multipoint curve or convex hull and the three-point curve). This is discussed in my ROC ConCert paper - I’ve added a link to this.
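The two quantities above can be checked numerically. The following is only a sketch under the definitions given in this message: the function names are made up for the example, and the three-point curve is assumed to run through (0,0), (fpr,tpr) and (1,1).

```python
def three_point_auc(tpr: float, fpr: float) -> float:
    """Trapezoidal area under the curve (0,0) -> (fpr,tpr) -> (1,1)."""
    left = fpr * tpr / 2.0                    # triangle up to the operating point
    right = (1.0 - fpr) * (tpr + 1.0) / 2.0   # trapezoid from there to (1,1)
    return left + right


def certainty(informedness: float) -> float:
    """Certainty = (Informedness + 1) / 2."""
    return (informedness + 1.0) / 2.0


def consistency(auc: float, informedness: float) -> float:
    """Consistency = AUC - Certainty, for a full multipoint AUC value."""
    return auc - certainty(informedness)


# The area under the three-point curve agrees with Certainty at that point.
tpr, fpr = 0.8, 0.4
print(three_point_auc(tpr, fpr), certainty(tpr - fpr))
```

The AUC passed to `consistency` would come from the full ranking (e.g. as reported by Weka); the value 0.85 used when trying this out is purely hypothetical.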
2012 ROC ConCert
Prof. David M W Powers, Ph.D. http://flinders.edu.au/people/David.Powers
Professor of Computer Science & Cognitive Science, TON2.10
South Australia Research Director, ARC ITRH Digital Enhanced Living Hub
College of Science and Engineering (Phone: 08-8201 3663)
Flinders University, Tonsley, South Australia 5042 (Fax: +61-8-8201 3626)
GPO Box 2100 Adelaide SA 5001 (Mobile/Viber: 0414-824-307)
Does anyone have experience with importing FHIR data into
Weka? Or can you point me to a useful resource/project dealing with
FHIR data in Weka?
Thanks a lot,
Nightly snapshots of WEKA are now available as .deb packages for Debian-based Linux distributions (e.g., Ubuntu):
They can be downloaded just like the other WEKA snapshot files (e.g., using a web browser) and installed with an appropriate GUI-based software manager (e.g., the one for Ubuntu) or corresponding command-line tools. The packages also come with two command-line scripts that make it easy to start up the WEKA GUI and run WEKA from a command-line interface.
This new, convenient way to install WEKA on Debian-based Linux distributions is thanks to the efforts of Peter Reutemann, who has also written a blog article on how to use the packages:
PS: This also makes it particularly easy to put together a Docker container that has Weka.
For example, suppose we have results from SVR such as 0.1, 0.34, 0.21, 0.53,
etc. and from random forest such as 0.42, 0.01, 0.22, 0.87, etc. Can we
perform a t-test on this data? I read that to perform a t-test, our data should be
My second question: for this type of data, where we look for a
significant difference between these two algorithms, which type of Bayesian
test is available? In simple words, what are the Bayesian alternatives to
the Wilcoxon test or the t-test? Various Bayesian tests are available, such
as the Bayesian independent test, the Bayesian paired test, etc.
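To make the paired comparison concrete, here is a minimal sketch of the textbook paired t statistic, using only the four values per algorithm quoted in the question (the "etc." means the real data is longer). It assumes the values are paired, e.g. the i-th SVR result and the i-th random forest result come from the same fold or dataset. It is not the corrected variant that is often recommended for repeated cross-validation results, and it does not compute a p-value; that would need the t distribution with the returned degrees of freedom.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0.0:
        raise ValueError("all paired differences are identical")
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Only the values given in the question; assumed to be paired by position.
svr = [0.1, 0.34, 0.21, 0.53]
rf = [0.42, 0.01, 0.22, 0.87]
t, df = paired_t_statistic(svr, rf)
print(f"t = {t:.4f}, df = {df}")
```

The paired t-test assumes that the differences (not the raw values) are roughly normally distributed; when that is doubtful, a rank-based test on the same differences is the usual alternative.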
I have a question about the StringToWordVector filter. I am using this filter
in a FilteredClassifier context.
I need the list of the words that have been actually used in the
For this reason, I set the following parameters:
Since I set MinTermFreq to 5, I expected to find only words with a frequency
of 5 or more in the dictionary file (which I called eCare_WordList_5-10000).
Instead this file contains words with a frequency lower than 5, eg.
How is that?
Thanks in advance for your answer.
I'm working with Naive Bayes right now because I know that this algorithm
supports giving weights to attributes, but I would like to know if there are
other algorithms that support it too.
Specifically, *can I give weights to attributes using Random Tree?*
Thanks in advance!
Sent from: https://weka.8497.n7.nabble.com/