Yesterday Peter posted how to merge two values of an attribute.
Now I wonder how to change the name of the resulting value (preferably
in the Explorer)?
E.g. Suppose my classes had the name class_1 and class_2. After merging
them the final value/class has the name class_1_class_2. However, I want
it to have the name class_3.
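The renaming asked about above boils down to replacing one label in the attribute's list of allowed values and in every instance that carries it. A minimal, library-independent sketch follows; it does not use the Weka API, and the helper name `rename_label` and the extra label `class_4` are hypothetical, introduced only for illustration.

```python
def rename_label(values, labels, old, new):
    """Return copies of a data column and its label list with `old` renamed to `new`."""
    if old not in labels:
        raise ValueError(f"unknown label: {old!r}")
    if new in labels:
        raise ValueError(f"label already exists: {new!r}")
    # Rename in the declaration (list of allowed values) ...
    new_labels = [new if lab == old else lab for lab in labels]
    # ... and in every instance that uses the old label.
    new_values = [new if v == old else v for v in values]
    return new_values, new_labels


# Hypothetical data: 'class_4' is made up; 'class_1_class_2' is the merged value.
labels = ["class_1_class_2", "class_4"]
column = ["class_1_class_2", "class_4", "class_1_class_2"]

column, labels = rename_label(column, labels, "class_1_class_2", "class_3")
print(labels)
print(column)
```

As far as I know, Weka's Java API offers `Instances.renameAttributeValue` for the same purpose, but it is worth checking the Javadoc of the version in use.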
Sorry if people are aware of this, and also for repeating something I
said in a previous post, but I thought it was worth emphasising and I have
done a little more investigating.
I have run the EM algorithm on a PC laptop with an Intel chip (Pentium M
1.6GHz) running Windows and on an Apple iBook G4 with a PowerPC
processor (about 1.4GHz) running OSX 1.4. Both computers have Java 1.6.
I have done 3 runs, all on the same data. Two runs used different seeds
(42 and 43, the same on each computer) with the number of clusters set to
-1, i.e. EM selects the number of clusters automatically. The third run
used a seed of 100 on both machines with the number of clusters set to 3.
In all three cases different results were obtained on the PC and the
iBook. The other parameters for EM were defaults. The data was my own
data set. Note that I did not normalise the data, e.g. to the range -1 to
+1 or to zero mean and unit SD.
I have also done a run of the X-means algorithm on both computers with
the same data set and identical results were obtained from each
computer. The parameter values were:
Bin value: 1.0
Cutoff factor: 0.5
Max iterations: 1000
Max k means: 1000
Max number of children: 1000
Max number of clusters: 1000
Min number of clusters: 1
Use KDtree: false
Metric used was normalised Euclidean.
I have not had a chance to do a wider investigation, e.g. checking
whether the other clustering algorithms produce different output, or
whether EM produces different output on standard data sets, e.g. from the
UCI repository.
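For reference, the two rescalings mentioned in the EM report above (range -1 to +1, and zero mean with unit SD) can be sketched as below. This is only an illustration of the formulas under the assumption of a single numeric attribute. It is not what Weka does internally, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def rescale_range(xs, lo=-1.0, hi=1.0):
    """Linearly map the values so that min(xs) -> lo and max(xs) -> hi."""
    x_min, x_max = min(xs), max(xs)
    if x_max == x_min:
        # A constant attribute carries no information; put it mid-range.
        return [(lo + hi) / 2.0 for _ in xs]
    span = x_max - x_min
    return [lo + (hi - lo) * (x - x_min) / span for x in xs]


def standardise(xs):
    """Shift and scale the values to zero mean and unit (population) SD."""
    mu = mean(xs)
    sd = pstdev(xs)
    if sd == 0:
        return [0.0 for _ in xs]
    return [(x - mu) / sd for x in xs]


data = [2.0, 4.0, 6.0, 8.0]  # made-up values for one attribute
scaled = rescale_range(data)
z = standardise(data)
print(scaled)
print(z)
```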
Apologies in advance for posting a question that I'm sure is obvious to
experienced Weka users, but I'm wondering if one can improve the
predictive power of a model by extracting the misclassified instances
and training a new model on them. More precisely, I applied the J48
classifier to a set of 70,000 instances (11 attributes) and was
able to correctly predict about 82% of the cases. I'm wondering whether,
if I take the remaining 18% and try a different classifier, I can improve
the overall result. If so, can this sort of thing be implemented in the
How do Naive Bayes and J48 handle missing values in Weka?
Is there a general default for handling missing data for all algorithms,
or is it algorithm-specific?
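Whatever a particular classifier does internally, one algorithm-independent option is to impute missing values before training: the mean for numeric attributes and the most frequent value for nominal ones. The sketch below illustrates only that idea and makes no claim about what Naive Bayes or J48 do in Weka. The function names are hypothetical. The `?` marker is how ARFF files denote a missing value.

```python
from collections import Counter

MISSING = "?"  # ARFF marks a missing value with a question mark


def impute_numeric(column):
    """Replace missing entries of a numeric column with the mean of the rest."""
    present = [float(v) for v in column if v != MISSING]
    if not present:
        raise ValueError("column has no observed values")
    fill = sum(present) / len(present)
    return [fill if v == MISSING else float(v) for v in column]


def impute_nominal(column):
    """Replace missing entries of a nominal column with the most frequent label."""
    present = [v for v in column if v != MISSING]
    if not present:
        raise ValueError("column has no observed values")
    fill = Counter(present).most_common(1)[0][0]
    return [fill if v == MISSING else v for v in column]


# Made-up columns for illustration.
numeric = impute_numeric(["1.0", "?", "3.0"])
nominal = impute_nominal(["a", "?", "b", "a"])
print(numeric)
print(nominal)
```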
> Am I correct in assuming that the analysis you describe needs to be
> conducted at the command line and not in the Explorer?
Doesn't matter. This setup can be used in the Explorer, Experimenter,
KnowledgeFlow or commandline.
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ +64 (7) 838-4466 Ext. 5174
I am new to research in text classification.
I am using Weka to preprocess the Reuters dataset, and I am stuck.
My problem is this:
I took a single file from Reuters, copied the data between <reuters> and
</reuters>, tokenized it, stemmed it, and saved it as a bag of words in
CSV format. When I loaded it into Weka 3-5-7, it showed an out-of-memory
error. From some of the posts I learned that I can increase the heap
size, and I have done so.
The problem now is that I cannot remove stopwords from the data I
supplied.
Should I change the StringToWordVector function?
Hoping to get a reply soon.
I was wondering what's the best way to split a data set into training
(67%) and test set (33%) so that the original class distribution is
preserved and that there's no overlap between training and test set.
So far I've used the TrainingSetMaker and TestSetMaker in the
KnowledgeFlow, but I'd be interested in how to do this via the API.
Thanks a lot for your help.
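The split asked about above (67% training, 33% test, class distribution preserved, no overlap) can be sketched without any library as follows. This is an illustration of the logic only, not the Weka API. The function name, the default seed and the example labels are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.67, seed=1):
    """Return (train_idx, test_idx): disjoint index lists that keep
    roughly the same class proportions as `labels`."""
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        # Cut each class separately so its share is preserved in both parts.
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


# Made-up labels: 60 'yes' and 30 'no' instances.
labels = ["yes"] * 60 + ["no"] * 30
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Because every index is assigned to exactly one of the two lists, the parts cannot overlap. In Weka's Java API a similar effect is, as far as I know, usually obtained by randomizing the data, calling `Instances.stratify(3)`, and then taking `trainCV(3, 0)` and `testCV(3, 0)`. Please check the Javadoc for your version.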
I would like to know if it is possible to add features to an existing
I would like to compute some features in advance since their computation
is quite expensive. Later on I want to add these features to other
features, where the instances are in the same order.
Is there an easy way, or do I need to parse the ARFF file myself?
Thanks in advance,
Dipl.-Bioinf Sebastian Briesemeister
Eberhard Karls University Tübingen
Wilhelm Schickard Institut for Computer Science
Division for Simulation of Biological Systems
Room C304, Sand 14, D-72076 Tübingen
phone: +49 (0) 7071 29 70437
fax : +49 (0) 7071 29 5152
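Regarding the question above about adding precomputed features to other features with the instances in the same order: conceptually this is a column-wise join of two tables with equal row counts. A minimal sketch, assuming both feature sets are already loaded as a header plus a list of rows, is given below. The function name and the example attributes are hypothetical, and the code does not parse ARFF.

```python
def merge_columns(header_a, rows_a, header_b, rows_b):
    """Append the attributes of B to those of A, row by row.
    Assumes row i of A and row i of B describe the same instance."""
    if len(rows_a) != len(rows_b):
        raise ValueError("both feature sets must have the same number of instances")
    clash = set(header_a) & set(header_b)
    if clash:
        raise ValueError(f"duplicate attribute names: {sorted(clash)}")
    header = list(header_a) + list(header_b)
    rows = [list(a) + list(b) for a, b in zip(rows_a, rows_b)]
    return header, rows


# Made-up example: cheap features plus one expensive, precomputed feature.
cheap_header = ["length", "gc_content"]
cheap_rows = [[120, 0.41], [95, 0.55]]
costly_header = ["fold_energy"]
costly_rows = [[-12.3], [-8.7]]

header, rows = merge_columns(cheap_header, cheap_rows, costly_header, costly_rows)
print(header)
print(rows)
```

Weka's Java API has, to my knowledge, a static `Instances.mergeInstances(first, second)` that performs this kind of side-by-side merge on two datasets of equal size. The Javadoc for the version in use should confirm the details.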