>Weka is not the typical training program for image classification.
>There are several problems which are not so easy to solve with Weka:
Yes, but there are Java libraries for preprocessing images such as
ImageJ, which are as fast as dedicated C routines. I recently built
an industrial image classification system with these, plus JNI to
get camera images from a linux-uvc camera - because people won't
write kernel drivers in Java - yet. ;-)
I don't think that you will get good results just classifying the
pixel data. Even in handwritten digit recognition, you need to
scale, brightness-equalize, downsample and deslant your digits
before running them through SVM. For this, a lot of basic image
classification code is needed (first/second moments, downsampling
and equalization routines, ...) and you may prefer testing
keypoint vectors (e.g. by SIFT) or output from edge detectors (such
as the Canny edge detector) as features.
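To give a feel for what such basic routines look like, here is a minimal stdlib-only Java sketch (no ImageJ) of two of the steps mentioned above, assuming an 8-bit grayscale image stored as an int[][]; the class and method names are purely illustrative:

```java
public class Preprocess {
    // 2x2 average pooling: halves the width and height of a grayscale image.
    static int[][] downsample(int[][] img) {
        int h = img.length / 2, w = img[0].length / 2;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = (img[2*y][2*x] + img[2*y][2*x+1]
                           + img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4;
        return out;
    }

    // Linear brightness equalization: stretch the pixel range to [0, 255].
    static int[][] equalize(int[][] img) {
        int min = 255, max = 0;
        for (int[] row : img)
            for (int p : row) { min = Math.min(min, p); max = Math.max(max, p); }
        int range = Math.max(max - min, 1);
        int[][] out = new int[img.length][img[0].length];
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[0].length; x++)
                out[y][x] = (img[y][x] - min) * 255 / range;
        return out;
    }

    public static void main(String[] args) {
        int[][] img = {{0, 4, 8, 12}, {0, 4, 8, 12}, {0, 4, 8, 12}, {0, 4, 8, 12}};
        int[][] small = downsample(img);  // -> {{2, 10}, {2, 10}}
        int[][] eq = equalize(small);     // -> {{0, 255}, {0, 255}}
        System.out.println(java.util.Arrays.deepToString(eq));
    }
}
```

Deslanting and keypoint extraction are considerably more involved; this only shows the flavor of the pixel-level plumbing you end up writing.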
Also, I'd rather people _not_ contribute to the myth that Java is
slower than C - if properly programmed and optimized, Java is in
the same league, minor differences notwithstanding. There are
optimized SVM versions (such as svm_perf) which are not yet
available in Java, but these could easily be ported and would be
equally fast. In practice I have more problems running C learning
algorithms than Java learning algorithms on large datasets, because
most C implementations still use 32-bit pointers and need patching
to utilize more than 4 GB of memory.
BTW I've a recent paper on using WEKA in image mining for static
analysis of mobile phone camera images to reconstruct go board
positions: see http://alex.seewald.at, Publications, for details.
Dr. Alexander K. Seewald +43(664)1106886
Information wants to be free;
Information also wants to be expensive (Stewart Brand)
--------------- alex.seewald.at ----------------
I have a huge dataset (it can contain 350,000 to 2,000,000 records) on which it is not possible to use WEKA directly, so I need to extract a representative sample. To avoid problems with nominal attributes I need to sample it while keeping all possible values for each nominal attribute.
Do you know any way to do that? Is there a WEKA tool or any third-party tool available for this, or is the only way to write an application for it?
Thank you again
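As far as I know, Weka's supervised Resample filter stratifies only on the class attribute, not on every nominal attribute, so a small custom sampler is one option. A stdlib-only Java sketch of the idea, assuming records are String arrays and nominalCols holds the indices of the nominal columns (names are illustrative):

```java
import java.util.*;

public class CoverageSample {
    // Draw a sample of roughly `target` records that keeps at least one
    // record for every value seen in each nominal column, then fills the
    // remainder with a random draw.
    static List<String[]> sample(List<String[]> data, int[] nominalCols,
                                 int target, long seed) {
        List<String[]> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();      // "col=value" keys already covered
        List<String[]> rest = new ArrayList<>();
        for (String[] rec : data) {
            boolean needed = false;
            for (int c : nominalCols)
                if (seen.add(c + "=" + rec[c])) needed = true;
            if (needed) out.add(rec); else rest.add(rec);
        }
        Collections.shuffle(rest, new Random(seed));  // top up at random
        for (String[] rec : rest) {
            if (out.size() >= target) break;
            out.add(rec);
        }
        return out;
    }
}
```

One pass over 2,000,000 records is cheap; only the resulting sample needs to fit in the memory you give Weka.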
I have a doubt as to how good n-fold cross-validation is (where n is the number of instances, i.e. leave-one-out). I ran both 10-fold and n-fold on my data, and n-fold gave better results; however, when I removed the weakly correlated attributes from the data, 10-fold gave better results than n-fold.
I did the same with many attribute subsets, but the results for n-fold were erratic. Can someone explain which I should use?
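One intuition for why n-fold (leave-one-out) behaves erratically: when k equals n, every test fold holds exactly one instance, so each fold's accuracy is either 0% or 100%, and small changes to the data can flip many folds at once. A minimal sketch of the fold-assignment step makes the k == n case visible (illustrative code, not Weka's implementation):

```java
import java.util.*;

public class Folds {
    // Assign each of n instances to one of k cross-validation folds.
    // With k == n, every fold holds a single test instance (leave-one-out).
    static int[] foldOf(int n, int k, long seed) {
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Collections.shuffle(Arrays.asList(idx), new Random(seed));
        int[] fold = new int[n];
        for (int i = 0; i < n; i++) fold[idx[i]] = i % k;
        return fold;
    }
}
```

With 10-fold you average over ten reasonably sized test sets, which usually gives a more stable estimate; leave-one-out has low bias but high variance.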
1) Is there any way I can view the tree generated in KnowledgeFlow using RacedIncrementalLogitBoost?
2) Am I correct in understanding that I cannot save the model if I use KnowledgeFlow, but that once it has run I can pipe in another ARFF dataset for further classification within the same session?
3) I have a very large dataset and not enough memory. Is incremental classification the only option I have?
Thank you in advance.
I haven't been able to make MetaCost work either. It could be that the cost
matrix values are hard to set right (does it normalize the given values?).
For example, I've tried using the base classifier's confusion matrix as
the cost matrix, but to no avail. Another candidate is the number of
iterations, which in other iterative classifiers (e.g. boosting or neural
networks) is known to be difficult to estimate. In MetaCost you also need
to consider the bag sample size (I've found that 100 does not always work best).
Base classifier selection is crucial for any iterating algorithm. If
you select a strong, already heavily optimized classifier like SVM, or
a heavily biased one like NB (probably Logistic too), you will tend to get
the same result before and after applying costs. Prefer decision trees
(in the trees/rules packages).
Also consider boosting (meta.AdaBoost), which performs the same basic
operation of penalizing misclassifications, but does so on individual
misclassified instances rather than on all instances of the classes
specified in the cost matrix.
(On the other hand, cost-sensitive classifiers allow you to set different
penalties for the FP and FN of a particular class.)
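The underlying decision rule is worth keeping in mind when debugging cost matrices: a cost-sensitive classifier predicts the class with the lowest expected cost under the base classifier's probability estimates, which also means a cost matrix that is just a scalar multiple of another yields identical predictions. A minimal stdlib-only Java sketch of that rule (an illustration of the principle, not Weka's actual code):

```java
public class MinExpectedCost {
    // Pick the class whose expected misclassification cost is lowest.
    // cost[actual][predicted]; probs[i] = P(class i | x) from the base classifier.
    static int predict(double[] probs, double[][] cost) {
        int best = 0;
        double bestCost = Double.MAX_VALUE;
        for (int j = 0; j < probs.length; j++) {
            double c = 0;
            for (int i = 0; i < probs.length; i++) c += probs[i] * cost[i][j];
            if (c < bestCost) { bestCost = c; best = j; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.7, 0.3};          // base classifier favors class 0
        double[][] cost = {{0, 1}, {10, 0}};  // missing class 1 costs 10x more
        // expected cost of predicting 0 is 3.0, of predicting 1 is 0.7
        System.out.println(predict(probs, cost));  // -> 1
    }
}
```

This also shows why a heavily biased base classifier with probabilities pinned near 0 or 1 barely reacts to the cost matrix: the argmin rarely changes.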
Quoting Fernando Cela Diaz <fcela(a)sloan.mit.edu>:
> Use weka.classifiers.meta.CostSensitiveClassifier with the classifier you
> want to use.
> -----Original Message-----
> From: "Sean Little" <seanlitt(a)gmail.com>
> To: wekalist(a)list.scms.waikato.ac.nz
> Sent: 11/28/2007 9:08 PM
> Subject: [Wekalist] cost function in WEKA
> To anyone who might be able to help,
> This is my first posting to this group, so I apologize in advance if there
> is something I am doing that is not according to protocol.
> I am using weka for a class project. It is very important for my problem
> that I incorporate a cost function. False negatives are very much more
> costly than false positives. I tried to change the "cost sensitive
> evaluation" matrix in weka, but there are no apparent changes in the
> results. I am trying to use the logistic classification algorithm.
> Is there something that I am doing wrong?
> When the logistic algorithm starts, it prints the matrix that I entered,
> but the results are exactly the same as when I don't enter a cost.
> Sean Little
> Wekalist mailing list
yst. terv | Best wishes
Search for "use weka in your java code" in the weka wiki -- it provides a very good tutorial.
From: "Soledad Zubiri" <szubiri(a)gmail.com>
Sent: 11/28/2007 12:44 PM
Subject: [Wekalist] KNN
Hi, I'm Soledad. I am looking for any example of classifying with kNN (or
naive Bayes would also help). If anyone has seen anything in Java code I
would be grateful.
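Weka's kNN implementation is weka.classifiers.lazy.IBk (the wiki tutorial mentioned above shows how to call Weka classifiers from Java). To illustrate the algorithm itself, here is a minimal stdlib-only Java sketch, independent of Weka:

```java
import java.util.*;

public class Knn {
    // Classify a query point by majority vote among its k nearest
    // training points, using Euclidean distance.
    static int classify(double[][] X, int[] y, double[] q, int k) {
        Integer[] idx = new Integer[X.length];
        for (int i = 0; i < X.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist(X[i], q)));
        Map<Integer, Integer> votes = new HashMap<>();
        for (int i = 0; i < k; i++) votes.merge(y[idx[i]], 1, Integer::sum);
        return Collections.max(votes.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        double[][] X = {{0, 0}, {0, 1}, {5, 5}, {5, 6}};
        int[] y = {0, 0, 1, 1};
        System.out.println(classify(X, y, new double[]{0.2, 0.3}, 3)); // -> 0
    }
}
```

For real work, prefer IBk: it handles nominal attributes, missing values, and distance weighting for you.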
Recently I have been doing an experiment using Weka to cluster web user sessions.
Now I need a data set consisting of user sessions taken from a website's
weblogs. The format of the MSNBC data set satisfies my requirement, but it only
contains 17 distinct pageviews (categories), which is a very small number.
Is there any other data set that is similar to MSNBC but with many more
pageviews?
Thank you very much for your help.
I am using Weka to run simple K-Means clustering on Windows XP.
My dataset contains about 2000 instances and each instance has 5000
attributes. I want to group these instances into 200 clusters, but when
I run simple K-Means, after Weka runs for several minutes it
ends with an OutOfMemory error. It also gave some information about the memory:
Initial JVM size: 4.9 MB
Total memory used: 127.1 MB
Max memory used: 127.1 MB
So how can I give Weka more memory to run the clustering algorithm?
By the way, how can I configure Weka so that simple K-Means outputs
the instances belonging to each cluster?
Thank you very much!
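Regarding the memory question: the usual fix is to raise the JVM's maximum heap with the -Xmx option when launching Weka, rather than relying on the ~127 MB default shown above. A sketch, assuming weka.jar is in the current directory (the path and a sensible heap size will vary with your install):

```shell
# Launch Weka with a 1 GB maximum heap instead of the default
java -Xmx1024m -jar weka.jar
```

If you start Weka from a shortcut or script instead, the same -Xmx flag goes wherever that script invokes java.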