Hi Sani, Uday, George
First, a couple of general rules of thumb:
1. Never change your original data (but you can still use preprocessing to deal with conflicts, missing values, etc. - see the sketch after these two rules).
2. Use a noise-insensitive learning algorithm (such as KNN, ANN, soft-margin SVM, Random Forests) and don't overtrain.
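To make rule 1 concrete, here is a minimal sketch in the Weka Java API of filtering a copy of the data rather than editing the original file. The file name and the choice of ReplaceMissingValues are only placeholders for whatever preprocessing you actually need:

// Minimal sketch: preprocess a copy of the data, leaving the original ARFF untouched.
// "mydata.arff" is a placeholder; swap ReplaceMissingValues for your own preprocessing.
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class PreprocessCopy {
    public static void main(String[] args) throws Exception {
        Instances raw = new DataSource("mydata.arff").getDataSet();
        raw.setClassIndex(raw.numAttributes() - 1);

        ReplaceMissingValues rmv = new ReplaceMissingValues();
        rmv.setInputFormat(raw);
        Instances cleaned = Filter.useFilter(raw, rmv); // returns a new Instances; 'raw' is untouched
        System.out.println("Cleaned copy has " + cleaned.numInstances() + " instances");
    }
}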
There are several possibilities in relation to the contradictory cases:
1. This is LABEL ERROR or NOISE, so KNN with K>1, ANN, etc. will handle it if the noise in any local neighbourhood is <50%.
2. This is ATTRIBUTE NOISE, most commonly measurement noise, e.g. measuring to the nearest mm at ±0.5 mm s.d.
3. This is CORRECT but the attributes underspecify the concept, e.g. sometimes two Iris species measure the same.
The algorithms I've mentioned normally do some kind of thresholding or voting, but you can also use them in a form
that provides a credibility score (or make them estimate a probability). These also lend themselves to resilient boosting.
However, AdaBoost and other convex learners can have issues with noise; decision-switching boosters are more resilient.
In Weka, LogitBoost does some smoothing and can estimate probabilities, so it is likely to give more resilient results.
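If it helps, here is a rough sketch in the Weka Java API of getting class probability estimates rather than hard votes, e.g. from LogitBoost. The file name is a placeholder and the class is assumed to be the last attribute:

// Sketch: class probability estimates from LogitBoost (or any Weka classifier).
// Assumes "train.arff" exists and the class is the last attribute.
import weka.classifiers.meta.LogitBoost;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ProbabilityEstimates {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("train.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        LogitBoost lb = new LogitBoost();   // default base learner is DecisionStump
        lb.setNumIterations(50);            // number of boosting iterations (tune via cross-validation)
        lb.buildClassifier(data);

        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);
            double[] dist = lb.distributionForInstance(inst);  // one probability per class
            System.out.println("Instance " + i + " P(classes) = " + java.util.Arrays.toString(dist));
        }
    }
}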
Generally, for both types of noise, label noise and attribute noise, more examples are needed to reduce error clumping.
Having exactly two cases with the same attributes and different classes means the algorithms effectively ignore them.
The real problem is that sometimes you will have K matching errors with similar attributes, so you need a bigger K for KNN.
This is particularly the case for case 1, label noise; for cases 2 and 3, more cases reduce the s.e. of the boundary.
But even though you can find parameter values/means more accurately, cases 2 and 3 have a real variance or margin.
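As a concrete example of the "bigger K" point, here is a sketch with Weka's IBk (the file name is a placeholder; IBk can also pick K for you by hold-one-out cross-validation up to the given maximum):

// Sketch: IBk (KNN) with a larger K so isolated label errors get out-voted.
// Assumes "train.arff" with the class as the last attribute.
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BiggerK {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("train.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        IBk knn = new IBk();
        knn.setKNN(11);              // K > 1 so a single mislabelled neighbour can't decide the vote
        knn.setCrossValidate(true);  // let IBk choose K in 1..11 by hold-one-out cross-validation

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(knn, data, 10, new java.util.Random(1));
        System.out.println(eval.toSummaryString());
    }
}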
In case 2 the variance ostensibly relates to a normal noise model, whilst case 3 relates to a normal distribution model.
A single dataset can suffer from all three ambiguity cases - e.g. data derived from speech (acoustic) or brain (EEG) signals.
In that case an experiment has subjects who can make errors (1), suffers measurement noise, environmental noise and artefacts (2),
and is also flying in the dark (3), trying to use signals that don't actually include all the information needed for a decision.
On 02/04/2013, at 8:34 AM, <wekalist-request(a)list.scms.waikato.ac.nz> wrote:
Date: Mon, 1 Apr 2013 14:15:26 -0400
From: Uday Kamath <kamathuday(a)gmail.com>
Subject: Re: [Wekalist] How to handle Contradictory instances in
To: "Weka machine learning workbench list."
Content-Type: text/plain; charset="iso-8859-1"
Use a K-Nearest Neighbour classifier: remove the class of the contradictory
instances and use them as test data. The result may give you an
indication of which is the true class, generally speaking.
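For example, a rough sketch of that idea in the Weka Java API (the file names, the choice of K, and keeping the contradictory pairs in a separate ARFF with the same header are assumptions for illustration):

// Sketch: train KNN on the rest of the data, then see which class it predicts for the
// contradictory instances (with their own class values blanked out).
// "rest.arff" / "contradictory.arff" are assumed file names sharing the same attribute header.
import weka.classifiers.lazy.IBk;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ResolveContradictions {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("rest.arff").getDataSet();
        Instances contra = new DataSource("contradictory.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);
        contra.setClassIndex(contra.numAttributes() - 1);

        IBk knn = new IBk();
        knn.setKNN(5);
        knn.buildClassifier(train);

        for (int i = 0; i < contra.numInstances(); i++) {
            Instance inst = contra.instance(i);
            inst.setClassMissing();                 // treat it as unlabelled test data
            double pred = knn.classifyInstance(inst);
            System.out.println("Instance " + i + " -> " + contra.classAttribute().value((int) pred));
        }
    }
}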
On Mon, Apr 1, 2013 at 1:53 PM, GDombi <GDombi(a)chm.uri.edu> wrote:
It is very hard to train on data with contradictory cases.
You will have to drop one since they both can't be true.
The question is which one to drop.
You could do some exploratory data analysis, find other similar cases,
and keep the case that is most like the majority of similar cases.
For example: if one case is female, age 20-25, with BP 120 and outcome
true, then the contradictory case is female, age 20-25, with BP 120 and
outcome false. Which one do you drop?
If you look at other female cases, ages 20-25, with BP in the range of
110-130, what is the most common outcome? Suppose it is false.
Then drop the case above that says true and somehow mark it as an
outlier in your data set.
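In the Weka Java API that check might look something like this rough sketch (the file name, attribute indices, and the 110-130 BP window are hypothetical placeholders; adapt them to your own data):

// Sketch of the "majority of similar cases" idea: count outcomes among cases whose
// attributes fall in a chosen window. Attribute indices (0=sex, 1=age, 2=BP) and the
// values used are hypothetical placeholders.
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MajorityOfSimilar {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("patients.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        int[] classCounts = new int[data.numClasses()];
        for (int i = 0; i < data.numInstances(); i++) {
            Instance inst = data.instance(i);
            boolean similar = inst.stringValue(0).equals("female")
                    && inst.value(1) >= 20 && inst.value(1) <= 25
                    && inst.value(2) >= 110 && inst.value(2) <= 130;
            if (similar && !inst.classIsMissing()) {
                classCounts[(int) inst.classValue()]++;   // tally the outcome of each similar case
            }
        }
        for (int c = 0; c < classCounts.length; c++) {
            System.out.println(data.classAttribute().value(c) + ": " + classCounts[c]);
        }
    }
}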
Bye for now,
On Mon, 2013-04-01 at 04:03 -0700, Sani Zimit wrote:
Please, what is the most common approach to handling contradictory
instances in a data set?
After conducting CFS feature selection on the data sets, I have
some instances with the same value for each attribute but different
class categories.
I am unable to import ADTree into my Java code. I have
"import weka.classifiers.trees.ADTree;" but Eclipse says it is unable to
resolve this. I am able to import J48 and DecisionStump. Do you know how
to fix this?
Princeton Class of 2013
I'm a new user of Weka. I work in the field of recognition of mathematical
symbols. I have a database of 43 classes of symbols, with 20 instances
in each class. I want to test the classification algorithms implemented in
Weka to choose the best one.
First I started testing classification with a small database of 6 classes
with 20 instances in each one, and I got promising results, with precision on
the order of 0.9. When I enlarge my database the precision decreases. My
question is: is it acceptable to have these results:
60% correctly classified instances
0.6 precision
0.6 recall
A Research Fellow position is available for 2-3 years at the School of
Computer Engineering (http://www.ntu.edu.sg/SCE) and Nanyang Business
School (http://www.nbs.ntu.edu.sg), Nanyang Technological University,
Singapore, starting in early 2013. You will be conducting research in the
area of machine learning. Specifically, you will be designing effective
classification and transfer learning methods, and applying them to
real-world problems such as detection of frauds in firms. The salary is
Requirements include: (1) PhD degree in CS or a related discipline; (2)
Sufficient experience in machine learning related research, and experience
in transfer learning is an asset; (3) Excellent English writing skills.
If interested, please send a detailed CV to: zhangj AT ntu.edu.sg