I'm trying to use a clusterer (SimpleKMeans or DBScan) in a Java program:
- Three attributes: Latitude, Longitude, Date;
- Attributes used for Clustering: Latitude, Longitude;
- Ignored Attributes: Date;
- Results Expected: #Cluster, Lat, Long, Date
I use the "Ignore Attributes" button in the GUI, but how can I do this
through the Java API?
Thanks a lot,
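One way to do this through the API (a sketch, not a verified answer; the file name and cluster count are placeholders) is to wrap the clusterer in weka.clusterers.FilteredClusterer together with a Remove filter that drops the Date attribute. The original instances keep all attributes, so cluster, Lat, Long, and Date can still be reported together:

```java
import weka.clusterers.FilteredClusterer;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.Remove;

public class ClusterIgnoreDate {
    public static void main(String[] args) throws Exception {
        // Hypothetical data file with attributes: Latitude, Longitude, Date
        Instances data = DataSource.read("points.arff");

        Remove remove = new Remove();
        remove.setAttributeIndices("3");   // drop the Date attribute (1-based index)

        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(4);          // placeholder value

        FilteredClusterer fc = new FilteredClusterer();
        fc.setFilter(remove);
        fc.setClusterer(kmeans);
        fc.buildClusterer(data);

        // clusterInstance() applies the filter internally, so the
        // unfiltered instances (including Date) can be passed in here.
        for (int i = 0; i < data.numInstances(); i++) {
            int cluster = fc.clusterInstance(data.instance(i));
            System.out.println(cluster + ", " + data.instance(i));
        }
    }
}
```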
I perform text classification using NaiveBayes on data processed by
StringToWordVector. I wanted to keep them (or the FilteredClassifier)
serialized (in a file or in a database) between sessions. Unfortunately
I'm unable to serialize a filter (StringToWordVector). I get (tested
with different data):
Exception in thread "main" java.io.NotSerializableException:
For the test I use a simple stream combination (which seems to be ok):
ObjectOutputStream oos = new ObjectOutputStream(
Is there a problem with StringToWordVector serialization, or am I doing
something wrong in my code?
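For what it's worth, the round-trip pattern itself is plain java.io; the exception usually points at a non-serializable member inside the filter, and upgrading to a newer Weka release (which also ships the weka.core.SerializationHelper convenience class) often fixes it. A minimal, Weka-free sketch of the pattern, using an ArrayList as a stand-in for the trained filter:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

public class SerializeDemo {
    // Write any Serializable object (e.g. a trained filter or a
    // FilteredClassifier) to a byte array; a FileOutputStream works
    // the same way if you want to store it in a file.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Read the object back; the caller chooses the expected type.
    @SuppressWarnings("unchecked")
    static <T> T fromBytes(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("spam", "ham"));
        ArrayList<String> restored = fromBytes(toBytes(original));
        System.out.println(restored); // prints [spam, ham]
    }
}
```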
Why is boosting an SVM (applying AdaBoostM1 to SMO in Weka) not a good idea, and why is it not as effective as boosting a decision tree? (I know that an SVM is a strong classifier whereas a DT is not.)
Could anyone please explain this to me or point me to good references? (I have searched the net, and most of the references I found talk about boosting SVMs by ensembling them.)
I would like to use grid search and cross-validation to find the best values of the C and gamma parameters for the RBF kernel in LIBSVM, using the WEKA Explorer. Can someone tell me how I could do this?
I see that there are GridSearch and CVParameterSelection. Which one should I use, and how?
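Whichever tool you pick, the core idea both implement is a cross-validated sweep over a (usually log-scale) grid of (C, gamma) pairs, keeping the pair with the best score. As a concept sketch only (not Weka code), here is that loop with a hypothetical stand-in scoring function; the grid bounds follow the ranges commonly recommended for LIBSVM (C in 2^-5..2^15, gamma in 2^-15..2^3):

```java
public class GridSearchSketch {
    // Hypothetical scoring function standing in for the cross-validated
    // accuracy of an RBF-kernel SVM at a given (C, gamma). In Weka, this
    // evaluation is what GridSearch/CVParameterSelection perform for you.
    static double score(double c, double gamma) {
        // Toy surface that peaks at C = 8, gamma = 0.125 (illustrative only).
        return -Math.pow(Math.log(c / 8.0), 2) - Math.pow(Math.log(gamma / 0.125), 2);
    }

    // Sweep a log-scale grid, stepping the exponent by 2 for a coarse pass.
    static double[] findBest() {
        double bestC = 0, bestGamma = 0, bestScore = Double.NEGATIVE_INFINITY;
        for (int i = -5; i <= 15; i += 2) {
            for (int j = -15; j <= 3; j += 2) {
                double c = Math.pow(2, i), gamma = Math.pow(2, j);
                double s = score(c, gamma);
                if (s > bestScore) {
                    bestScore = s;
                    bestC = c;
                    bestGamma = gamma;
                }
            }
        }
        return new double[] { bestC, bestGamma };
    }

    public static void main(String[] args) {
        double[] best = findBest();
        System.out.println("best C = " + best[0] + ", gamma = " + best[1]);
    }
}
```

A coarse pass like this is typically followed by a finer grid around the best point found.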
Quoting qusai shambour <qusai_79(a)yahoo.com>:
> Dear all,
> I'm interested in understanding the bias-variance trade-off of different
> ensemble classifiers (e.g. bagging, boosting, stacking, voting). I have
> 15000 records as a training dataset and 5000 records as a test dataset.
> I know that BVDecompose.java can calculate these values, but:
> 1. What is the command to run, for example, stacking (with linear
> regression as the meta classifier and boosted J48 + NB as the base
> classifiers)? It worked for me with a single classifier!
Just separate the structural elements you want to add (Stacking -> LinReg ->
Boosting) with the "--" delimiter and it should work.
At least this works for me:
java weka.classifiers.BVDecompose -T 75 -t ~/data/UCI/iris.arff -W
weka.classifiers.meta.Bagging -- -W weka.classifiers.trees.J48
> 2. Can I add my test dataset to the command, and how?
It seems this is yet another of the methods in Weka that does not allow you
to provide a separate test file. You could always merge the train and test
files, and I suspect you would get the same effect. However, I cannot confirm
how this Weka method picks training instances: the first ones, as specified
by the -T parameter, or by random selection. In the end, though, this may not
matter much for analysing bias and variance.
> 3. My training dataset contains 15000 records, so what should the optimal
> size of the training pool be?
I guess the generally accepted rule would be to use 2/3 for training and 1/3
for testing, but it seems (for a reason I do not know) that in BVDecompose
you can only set the training pool to half of the instances provided in the
training file. With this constraint I would use the maximum allowed (-T 7500
for your training file, or -T 10000 for the merged file), or as much as you
estimate a valid bias-variance analysis of the model would require, since
variance tends to diminish with more training data and bias could probably
be determined from a subset of your training file.
best, Harri S
Best wishes
I don't think it makes sense to have an updatable StringToWordVector
filter. If you add a new token (attribute) to the dataset, you need
to reprocess all previous mails and retrain the classifier on the
whole previous dataset - unless WEKA comes up with an attribute-updatable
classifier, that is. ;-) Best is to run the StringToWordVector filter
on the training data and run it unchanged (batch mode, -r and -s) on the
test data, e.g. from the command line like this:
java weka.filters.unsupervised.attribute.StringToWordVector -i train.arff
-o train-wv.arff -b -r test.arff -s test-wv.arff
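The same batch trick is available through the Java API via Filter.useFilter(): calling setInputFormat() on the training set fixes the dictionary, and a second useFilter() call on the same filter object applies it unchanged to the test data. A sketch (file names are placeholders):

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class BatchFilterSketch {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train.arff");
        Instances test  = DataSource.read("test.arff");

        StringToWordVector stwv = new StringToWordVector();
        stwv.setInputFormat(train);   // dictionary is built from training data only

        Instances trainWv = Filter.useFilter(train, stwv);
        Instances testWv  = Filter.useFilter(test, stwv);   // same dictionary reused

        System.out.println(trainWv.numAttributes() + " attributes after filtering");
    }
}
```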
You can serialize, store and load this pretrained StringToWordVector
without problems - I've done this before for BioMinT. Still, don't
build the StringToWordVector dictionary on your test data, as this will
positively bias your results. Use independent data that was not used to
test the classifier (using the same data for training the classifier and
building the word vector is ok).
Dr. Alexander K. Seewald +43(664)1106886
Information wants to be free;
Information also wants to be expensive (S.Brant)
--------------- alex.seewald.at ----------------
I use Weka for text classification (in fact, spam message
classification) and it works ok. Due to a quite large training set size,
it's not very handy to rebuild the classifier on every new message (or
even once per n messages). To convert text input into numeric vectors I
use the StringToWordVector filter, which isn't updateable. I didn't find
any other filter which is updateable and has similar functionality.
I would like to ask: is it possible to perform updateable text
classification? (Or should I rather drop that idea and change (somehow)
the program's assumptions to use only a classifier built from scratch?)
Thanks for your answers
I have just installed Weka 3.5.6 on my computer and would like to use LIBSVM as a classifier. Following the note, "A wrapper class for the libsvm tools (the libsvm classes, typically the jar file, need to be in the classpath to use this classifier)", I put libsvm.jar in the classpath. Please see the attached screenshot. However, I am still getting the error message "libsvm classes not in CLASSPATH".
Please let me know how I can fix this problem.
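One common cause (an assumption, since I can't see your setup): the jar must be on the classpath of the JVM that actually runs Weka. If Weka is started with `java -jar weka.jar`, the -jar option makes Java ignore both the CLASSPATH environment variable and -classpath. Starting Weka explicitly with both jars on the classpath usually fixes it (paths are examples):

```shell
# Windows (adjust paths to your installation):
java -classpath "C:\Program Files\Weka-3-5\weka.jar;C:\libsvm\libsvm.jar" weka.gui.GUIChooser

# Linux/Mac:
java -classpath /path/to/weka.jar:/path/to/libsvm.jar weka.gui.GUIChooser
```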
M. Fatih Akay