Hi all, I was wondering whether it is possible to do forecasting in Weka
using string or text data instead of numeric data. For instance: on a
certain date you will receive a "red notification".
Thanks in advance.
Sent from: http://weka.8497.n7.nabble.com/
Yes, they are: the labels are the same and ordered in the same way.
> From: wekalist-bounces(a)list.waikato.ac.nz [mailto:email@example.com] On Behalf Of Mark Hall
> Sent: January-08-14 1:19 AM
> To: Weka machine learning workbench list.
> Subject: Re: [Wekalist] Difference between WEKA instance predictions and confusion matrix results?
> Are the headers of your training and testing ARFF files identical (i.e., is the order in which the class labels are declared the same in both)?
I am having trouble understanding the following "predicted margin" (which
takes both negative and positive values):
- What does it mean when the predicted margin is negative?
- What does it mean when the predicted margin is positive?
Thanks in advance.
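For reference, Weka's margin curve documentation defines the margin as the probability predicted for the actual class minus the highest probability predicted for any other class. Under that definition, a positive margin means the actual class received the highest probability, and a negative margin means some other class was ranked above it. A minimal sketch in plain Scala (no Weka calls; `MarginSketch`, `margin` and the example distributions are hypothetical names and values chosen for illustration):

```scala
object MarginSketch {
  // Margin for one instance: probability assigned to the actual class
  // minus the highest probability assigned to any other class.
  def margin(distribution: Array[Double], actualClass: Int): Double = {
    require(distribution.length >= 2, "need at least two classes")
    val others = distribution.indices
      .filter(i => i != actualClass)
      .map(i => distribution(i))
    distribution(actualClass) - others.max
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical class distributions for a two-class problem.
    println(margin(Array(0.9, 0.1), actualClass = 0)) // positive: actual class ranked first
    println(margin(Array(0.3, 0.7), actualClass = 0)) // negative: another class ranked first
  }
}
```

The closer the margin is to +1 or -1, the more confident the model was in a correct or incorrect prediction respectively.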
I'm trying to train a classifier that classifies text as belonging to one
of two classes.
I have a training set that I'm able to open in Explorer and verify is
correct. I'm now trying to build and use the classifier programmatically.
However, although the classifier is built and I do get a distribution, I
always receive the same probabilities / scores, no matter what input I give.
Here's my code below. It is Scala, but it's straightforward enough that you
should be able to see what it's doing:
//Building the classifier:
val instances = new Instances(new DataSource("/my/dataset.arff").getDataSet)
val filter = new StringToWordVector
filter.setAttributeIndicesArray( (0 to 2).toArray )
val classifier = new FilteredClassifier
//Attempting to use the classifier:
val atts = new util.ArrayList[Attribute]
atts.add(new Attribute("sentence", true))
atts.add(new Attribute("parts_of_speech", true))
atts.add(new Attribute("dependency_graph", true))
val unlabeledInstances = new Instances("unlabeled", atts, 1)
unlabeledInstances.setClassIndex( 3 )
val instance = new DenseInstance(4)
val distrib =
No matter what input I give, the output of distrib is always:
My nominal class attribute has 2 classes, which is why it prints two values.
But no matter the input, the output is always the above.
Any ideas why this is, and what's going on? Would greatly appreciate any
I have a speech dataset that is divided into three subsets. There are approximately 90 attributes and the target is a numerical correlation value. I want to rank the attributes and have used the following:
Evaluator: weka.attributeSelection.WrapperSubsetEval -B weka.classifiers.functions.SMOreg -F 5 -T 0.01 -R 1 -E CORR-COEFF -- -C 0.0302 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007"
Search: weka.attributeSelection.GreedyStepwise -R -T -1.7976931348623157E308 -N -1 -num-slots 1
When I run the attribute selection on each of the three speech subsets I get three very different ranked lists. I would have expected the rankings for the three subsets to be similar given that they are taken from the same overall speech dataset. Can anyone suggest possible reasons as to why the rankings are so different for each of the three speech subsets?
Also, is it possible when doing the ranking to output the correlation for each attribute individually? I would like to see the correlation for the individual attributes.
Regards and thanks,
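
On the second question, one way to inspect individual attributes is to compute the Pearson correlation between each attribute column and the target directly. Note that this is a different quantity from the wrapper's CORR-COEFF score, which measures how well SMOreg predictions from an attribute subset correlate with the target. The sketch below is plain Scala with no Weka calls; `AttributeCorrelation`, `pearson` and `rank` are hypothetical names, and the data is assumed to be already loaded into numeric arrays:

```scala
object AttributeCorrelation {
  // Pearson correlation between one attribute column and the target.
  def pearson(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length && x.length > 1, "columns must match and have 2+ rows")
    val meanX = x.sum / x.length
    val meanY = y.sum / y.length
    val cov = x.zip(y).map { case (a, b) => (a - meanX) * (b - meanY) }.sum
    val varX = x.map(a => (a - meanX) * (a - meanX)).sum
    val varY = y.map(b => (b - meanY) * (b - meanY)).sum
    // A constant column has no defined correlation; report 0.0 in this sketch.
    if (varX == 0.0 || varY == 0.0) 0.0 else cov / math.sqrt(varX * varY)
  }

  // Rank attribute names by absolute correlation with the target, highest first.
  def rank(columns: Map[String, Array[Double]], target: Array[Double]): Seq[(String, Double)] =
    columns.toSeq
      .map { case (name, values) => name -> pearson(values, target) }
      .sortBy { case (_, r) => -math.abs(r) }
}
```

Within Weka itself, the CorrelationAttributeEval evaluator combined with the Ranker search should give a comparable per-attribute listing. Running either on each subset separately would also show whether the individual correlations are stable across the three subsets even when the wrapper rankings are not.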
I am trying to implement an ensemble of classifiers in WEKA via repeated
downsampling of the majority class in my data set (I have imbalanced data).
One approach I have tried is applying the RandomCommittee metaclassifier to
the FilteredClassifier metaclassifier with IBk as the base classifier,
according to the following scheme:
weka.classifiers.meta.RandomCommittee -S 1 -num-slots 1 -I 100 -W
weka.classifiers.meta.FilteredClassifier -- -F
"weka.filters.supervised.instance.SpreadSubsample -M 1.0 -X 0.0 -S 1" -W
weka.classifiers.lazy.IBk -- -K 1 -W 0 -A
"weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance
When I do this, I receive the following error:
Error: " Problem evaluating classifier. Base learner must implement
Apparently, WEKA is unable to pass the random seed from RandomCommittee to
the filter, SpreadSubsample. Can you suggest a workaround, please?
Thanks very much for your help!
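
To make the intended scheme concrete, here is a minimal sketch of repeated majority-class downsampling in plain Scala, independent of Weka. It only shows the sampling step: each ensemble member keeps all minority-class rows and draws its own random subset of majority-class rows using a member-specific seed. `DownsamplingSketch`, `balancedSubsets` and the integer label encoding are assumptions made for illustration, not part of any Weka API:

```scala
import scala.util.Random

object DownsamplingSketch {
  // For each ensemble member, keep every minority-class index and draw an
  // equally sized random subset of the majority-class indices, using a
  // different seed per member.
  def balancedSubsets(labels: Array[Int], members: Int, baseSeed: Long): Seq[Array[Int]] = {
    val byClass = labels.indices.groupBy(i => labels(i))
    require(byClass.size == 2, "sketch assumes exactly two classes")
    val bySize = byClass.values.toSeq.sortBy(_.size)
    val minority = bySize(0)
    val majority = bySize(1)
    (0 until members).map { m =>
      val rng = new Random(baseSeed + m) // one seed per ensemble member
      val sampled = rng.shuffle(majority.toList).take(minority.size)
      (minority ++ sampled).sorted.toArray
    }
  }
}
```

Each returned index array would then be used to train one base classifier, and the members' predicted distributions averaged at prediction time.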