Yes, what you have is a multi-class problem with three classes (not a
multi-label problem), and most classification algorithms in WEKA are
designed to deal with multi-class problems directly.
I don't have any really good suggestions. Getting the neutral class right
is difficult; I suppose even humans may disagree in some cases.
You could treat the class as ordinal, and use the OrdinalClassClassifier in
conjunction with a base learner that produces good probability estimates.
When you do this, perhaps also try calibrating the probability estimates
by additionally wrapping your base learner in Stacking, with
ClassificationViaRegression+IsotonicRegression as the meta learner.
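In code form, that suggestion looks roughly like the following. This is a
minimal sketch, untested, assuming the ordinalClassClassifier package is
installed; the Logistic base learner is only a placeholder, not a
recommendation:

import weka.classifiers.Classifier;
import weka.classifiers.functions.IsotonicRegression;
import weka.classifiers.functions.Logistic;
import weka.classifiers.meta.ClassificationViaRegression;
import weka.classifiers.meta.OrdinalClassClassifier;
import weka.classifiers.meta.Stacking;

public class OrdinalSentimentSketch {
    public static Classifier build() {
        // Calibrating meta learner: classification via regression, with
        // isotonic regression as the underlying regression scheme.
        ClassificationViaRegression meta = new ClassificationViaRegression();
        meta.setClassifier(new IsotonicRegression());

        // Stack the (placeholder) base learner under the calibrating meta learner.
        Stacking stacking = new Stacking();
        stacking.setClassifiers(new Classifier[] { new Logistic() });
        stacking.setMetaClassifier(meta);

        // Treat the sentiment class as ordinal (Negative < Neutral < Positive).
        OrdinalClassClassifier occ = new OrdinalClassClassifier();
        occ.setClassifier(stacking);
        return occ;
    }
}

Note that for the ordinal treatment to make sense, the class values should be
declared in their natural order (e.g. Negative, Neutral, Positive) in the ARFF
header.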
---------- Forwarded message ----------
> From: alex1491 <alex149(a)live.co.uk>
> To: wekalist(a)list.waikato.ac.nz
> Date: Fri, 12 Jan 2018 08:58:25 -0700 (MST)
> Subject: Multi-class text classification in Weka
> I have a classified dataset (~10k) containing various sized strings, each of
> which has been labelled according to its perceived sentiment. The three
> labels are 'Positive', 'Negative' and 'Neutral'. 'Neutral' has been labelled
> so because it contains both positive and negative sentiment. I recognise
> this essentially makes this problem a multi-label problem, and as such I was
> wondering how best to tackle it using Weka. I have attempted using some
> additional packages (including the deep learning Dl4jMlpClassifier and
> J48Consolidated) but have had little luck in achieving above ~75% accuracy
> in classification. In general the problem has been the algorithm's
> inability to determine whether a string is neutral, and it is as such
> mislabelled depending on what the algorithm determines to be its 'most
> probable' class. I want to remove this 'most probable' factor and replace it
> with a sort of sigmoid: 1, -1, or anywhere at all in between. Any insight
> would be greatly appreciated. Thanks
> Sent from: http://weka.8497.n7.nabble.com/
First of all... Happy New Year :-)
I'm having some problems using the weka knowledgeflow environment.
In the attached Word file you can see the flow I prepared.
It simply gets two distinct ARFF files as inputs: a training set and a test
set. Then a classifier is trained (cost-sensitive classifier + bagging +
J48), and what I'm interested in is having not only the predicted class,
but also the prediction probabilities for each test set instance.
Here comes the problem: after the model is created, the flow is
interrupted, as you can see from the attached errorLog.txt file.
The strange thing is that the flow actually works when you start it the
first time, but if you start it a second time, it gets interrupted and
never works again.
I'm using Weka 3.8.1 and I have Java 1.8_u151 installed on my PC (with
64-bit Windows 7).
Could you please help me?
Thanks a lot.
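In the meantime, the per-instance prediction probabilities can also be
obtained programmatically. A minimal sketch, untested, with hypothetical file
names; the cost-sensitive wrapper from the flow is omitted here for brevity
(it would additionally need its cost matrix set):

import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FlowProbabilitiesSketch {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train.arff"); // hypothetical path
        Instances test = DataSource.read("test.arff");   // hypothetical path
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // Bagged J48, as in the flow described above.
        Bagging bagging = new Bagging();
        bagging.setClassifier(new J48());
        bagging.buildClassifier(train);

        // distributionForInstance returns the full class probability
        // distribution, not just the single most probable label.
        for (int i = 0; i < test.numInstances(); i++) {
            double[] dist = bagging.distributionForInstance(test.instance(i));
            System.out.println(java.util.Arrays.toString(dist));
        }
    }
}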
I don't know if my question is stupid or not, but I still haven't got it
clear. I would appreciate anyone's help. I am doing a comparison study
between training and testing data.
If I resample my data into training and testing sets (so I will have 2
files), can I apply the following to each set separately?
- MultiFilter selection in the Preprocess panel
- Classify in the Classify panel using cross-validation
Will this work, and will it bias the results?
Thank you and sorry for the inconvenience.
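For reference, the usual way to avoid bias from supervised preprocessing
under cross-validation is to embed the MultiFilter in a FilteredClassifier,
so that the filter is fitted on each training fold only. A minimal sketch;
the filter list and the J48 base learner are placeholders:

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.Filter;
import weka.filters.MultiFilter;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class SafeCrossValidationSketch {
    public static FilteredClassifier build() {
        // Placeholder filter chain; put the actual filters here.
        MultiFilter mf = new MultiFilter();
        mf.setFilters(new Filter[] { new ReplaceMissingValues() });

        // The FilteredClassifier refits the filter on each training fold,
        // so cross-validation estimates are not biased by preprocessing.
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(mf);
        fc.setClassifier(new J48()); // placeholder learner
        return fc;
    }
}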
I have a question, if anyone can help please.
I am working on a project and I have a huge number of attributes (around 5000). Would it be fine to filter the data to reduce the number of attributes to a smaller subset (selecting the most related or top-ranked genes, say 2500 attributes) in the Preprocess panel, and thereafter classify the results with the FilteredClassifier in the Classify panel?
The aim of the first-stage filtering is to reduce the number of attributes and to help in building a comparison study: once I have produced another subset of attributes (2500 attributes), I can filter again and classify. I hope it's clear.
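A minimal sketch of that two-stage idea with the selection done inside the
FilteredClassifier, so the ranking is recomputed on each training fold; the
InfoGain evaluator and the J48 learner are assumptions here, since the
message does not name them:

import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.supervised.attribute.AttributeSelection;

public class TopRankedGenesSketch {
    public static FilteredClassifier build() {
        // Rank attributes and keep the top 2500, as in the question.
        Ranker search = new Ranker();
        search.setNumToSelect(2500);

        AttributeSelection selection = new AttributeSelection();
        selection.setEvaluator(new InfoGainAttributeEval()); // assumed evaluator
        selection.setSearch(search);

        // Doing the selection inside the classifier avoids leaking
        // information from the test folds during evaluation.
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(selection);
        fc.setClassifier(new J48()); // placeholder learner
        return fc;
    }
}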
In KnowledgeFlow, when I use WrapperSubsetEval embedded in
weka.classifiers.meta.FilteredClassifier, does the best attribute subset
come from the evaluation (during the nested cross-validation) on the
balanced sample (as I suppose)? And is the model with the selected
attributes always trained on the balanced sample and then applied to the
original sample?
Thanks for the attention.
Sent from: http://weka.8497.n7.nabble.com/
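For concreteness, a hedged sketch of the arrangement being asked about:
WrapperSubsetEval inside an AttributeSelection filter, wrapped by a
FilteredClassifier so that the subset search is repeated on each training
fold of the outer cross-validation. NaiveBayes stands in for the actual
learner:

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.filters.supervised.attribute.AttributeSelection;

public class WrapperSelectionSketch {
    public static FilteredClassifier build() {
        // Wrapper evaluation with its own (nested) cross-validation.
        WrapperSubsetEval eval = new WrapperSubsetEval();
        eval.setClassifier(new NaiveBayes()); // placeholder learner
        eval.setFolds(5);

        AttributeSelection selection = new AttributeSelection();
        selection.setEvaluator(eval);
        selection.setSearch(new BestFirst());

        // The filter (and hence the subset search) is refit on every
        // training fold; the resulting model is then applied to the
        // corresponding test fold.
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(selection);
        fc.setClassifier(new NaiveBayes()); // placeholder learner
        return fc;
    }
}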
I was advised to use the lasso or elastic net method to reduce the
number of attributes for a possible classification accuracy improvement.
I am now using WEKA 3.8.
Can anyone kindly help me identify the associated classifier?
I am wondering whether the attribute selection approaches applied in WEKA
can overcome the problem of multicollinearity.
I have two questions:
1- Do the selected features suffer from multicollinearity?
2- Is there any way to detect multicollinearity in WEKA?
I hope one of the WEKA developers can help me.
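As a hedged pointer rather than a definitive answer: CFS (CfsSubsetEval) is
one evaluator in WEKA whose merit function explicitly penalizes attributes
that are correlated with one another, so it tends to reduce redundancy among
the selected features. A minimal sketch with a hypothetical file name:

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GreedyStepwise;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CfsSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff"); // hypothetical path
        data.setClassIndex(data.numAttributes() - 1);

        // CFS prefers attributes that correlate with the class but not
        // with each other, which is one way to curb redundant features.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());
        selector.setSearch(new GreedyStepwise());
        selector.SelectAttributes(data);

        // Indices of the retained, less-redundant attributes.
        System.out.println(java.util.Arrays.toString(selector.selectedAttributes()));
    }
}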