On Mon, 17 Mar 2003 Ehtesham.Haque(a)asu.edu wrote:
1) If the number of clusters is not specified, the algorithm generates
the number by itself - is that correct?
Which algorithm are you referring to? COBWEB will select the number of
clusters (and build a hierarchy), while EM and k-means require the
specification of the desired number of clusters. Apparently the Weka
implementation of EM (in Weka 3-2, at least) will use cross-validation to
select the number of clusters if it is not specified (I don't know if
2) "Random number seed" - could you elaborate on its role?
The random number seed sets the starting state of the random number
generator. This allows you to replicate a "random" run (specifying the
same seed will give you the same series of random numbers).
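To make the seed's role concrete, here is a minimal sketch using Java's standard java.util.Random rather than Weka itself. The class name SeedDemo and the method draw are made up for illustration. Two generators built from the same seed should produce the same sequence:

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo class; not part of Weka.
class SeedDemo {
    // Draw `count` pseudo-random integers in [0, 100) from a generator
    // that is freshly created from `seed`.
    static int[] draw(long seed, int count) {
        Random rng = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = rng.nextInt(100);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] first = draw(42L, 5);
        int[] second = draw(42L, 5);
        // Same seed, same series of "random" numbers.
        System.out.println(Arrays.toString(first));
        System.out.println(Arrays.toString(second));
        System.out.println("identical: " + Arrays.equals(first, second));
    }
}
```

A clusterer that starts from a random initialization behaves the same way: fixing the seed fixes the starting point, so repeated runs on the same data can be compared.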
3) The method 'buildClusterer' - could you give more detail of its
responsibilities? When the documentation says "Generates a clusterer," does it
mean it builds a single cluster or clusters from the training data set?
It looks like this builds the set of clusters (or hierarchy) from the
training data set. (Caveat: I didn't write this code, so I'm just looking
at the documentation.)
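As a rough illustration of what "building the set of clusters" means, here is a toy one-dimensional k-means in plain Java. This is a sketch under my own assumptions, not Weka's code: the class ToyKMeans and the method nearestCluster are invented, and buildClusterer only borrows the name of the method you asked about. The number of clusters and the seed are both supplied by the caller, and one call builds every cluster at once:

```java
import java.util.Random;

// Hypothetical toy 1-D k-means, for illustration only; this is NOT Weka's code.
class ToyKMeans {
    private final int numClusters; // must be chosen by the caller up front
    private final long seed;       // controls which points become the initial centers
    private double[] centers;      // filled in by buildClusterer

    ToyKMeans(int numClusters, long seed) {
        this.numClusters = numClusters;
        this.seed = seed;
    }

    // Builds the whole set of numClusters clusters from the training data.
    void buildClusterer(double[] data) {
        if (numClusters < 1 || data.length < numClusters) {
            throw new IllegalArgumentException("need at least numClusters data points");
        }
        Random rng = new Random(seed);
        // Partial Fisher-Yates shuffle: pick numClusters distinct points as initial centers.
        double[] pool = data.clone();
        centers = new double[numClusters];
        for (int i = 0; i < numClusters; i++) {
            int j = i + rng.nextInt(pool.length - i);
            double tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
            centers[i] = pool[i];
        }
        // Alternate assignment and center update until nothing moves (capped at 100 passes).
        for (int pass = 0; pass < 100; pass++) {
            double[] sum = new double[numClusters];
            int[] count = new int[numClusters];
            for (double x : data) {
                int c = nearestCluster(x);
                sum[c] += x;
                count[c]++;
            }
            boolean changed = false;
            for (int c = 0; c < numClusters; c++) {
                if (count[c] == 0) {
                    continue; // an empty cluster keeps its old center
                }
                double updated = sum[c] / count[c];
                if (updated != centers[c]) {
                    centers[c] = updated;
                    changed = true;
                }
            }
            if (!changed) {
                break;
            }
        }
    }

    // Index of the cluster whose center is closest to x (call buildClusterer first).
    int nearestCluster(double x) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (Math.abs(x - centers[c]) < Math.abs(x - centers[best])) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 1.2, 0.8, 10.0, 10.2, 9.8};
        ToyKMeans km = new ToyKMeans(2, 1L);
        km.buildClusterer(data);
        for (double x : data) {
            System.out.println(x + " -> cluster " + km.nearestCluster(x));
        }
    }
}
```

Which group ends up as cluster 0 and which as cluster 1 depends on the seed; the grouping itself should not change for data this well separated.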
If you haven't already, I recommend reading a high-level overview of
clustering before diving in; although having the methods provided as a
library makes it tempting to use them as black boxes, it will be easier to
interpret the results if you're familiar with the algorithms. If you're
already a clustering expert, ignore this paragraph :)
Kiri L. Wagstaff, Ph.D. (kiri.wagstaff(a)jhuapl.edu) \|/
Science Applications Group, Space Department -O-
The Johns Hopkins University Applied Physics Lab /|\