SimpleKMeans applies the same EuclideanDistance and ManhattanDistance
classes as the nearest-neighbour learner IBk.
For two numeric attribute values x and y, the difference x - y is used in
the distance calculation. For two nominal attribute values x and y, 0 is
used when the two values are the same, and 1 is used when they are
different.
This only makes sense when numeric attributes have been rescaled
("normalized") to the [0,1] interval. EuclideanDistance and
ManhattanDistance both do this by default.
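
To make the combination concrete, here is a minimal Python sketch of the idea. It is not Weka's actual code: the names normalize and mixed_distance, the example attributes, and the min/max ranges are all made up for illustration, and missing values are not handled.

```python
import math

def normalize(value, lo, hi):
    """Rescale a numeric value to [0, 1] using the attribute's min and max."""
    if hi == lo:
        return 0.0  # constant attribute: contributes nothing to the distance
    return (value - lo) / (hi - lo)

def mixed_distance(a, b, nominal, ranges, euclidean=True):
    """Distance between two instances with numeric and nominal attributes.

    nominal[i] is True for a nominal attribute; ranges[i] holds the
    (min, max) pair of a numeric attribute (ignored for nominal ones).
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if nominal[i]:
            diff = 0.0 if x == y else 1.0  # 0 if same, 1 if different
        else:
            lo, hi = ranges[i]
            diff = normalize(x, lo, hi) - normalize(y, lo, hi)
        # Each attribute contributes one term to the same sum.
        total += diff * diff if euclidean else abs(diff)
    return math.sqrt(total) if euclidean else total

# Hypothetical data: one numeric attribute (age) and one nominal (colour).
nominal = [False, True]
ranges = [(20.0, 60.0), None]
centroid = (40.0, "red")

d_same = mixed_distance((30.0, "red"), centroid, nominal, ranges)
d_diff = mixed_distance((30.0, "blue"), centroid, nominal, ranges)
d_manhattan = mixed_distance((30.0, "blue"), centroid, nominal, ranges,
                             euclidean=False)
```

Because the numeric difference is at most 1 in absolute value after rescaling, a nominal mismatch weighs the same as the largest possible numeric difference, which is why the two kinds of attribute can share one sum.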
On 25/03/2014 12:54, Mark Polczynski wrote:
I understand how to use Euclidean distance to assign
clusters for numerical attributes, and how to use the number of correct
binary values to assign instances to clusters for nominal attributes,
but how does simple k-means assign instances to clusters when instances
have both numeric and nominal attributes? The number of “hits” on the
nominal attribute is a measure of closeness to a nominal cluster center,
except that the larger this value the closer an instance is to the
cluster center, but how would this measure be combined with a Euclidean
distance for numerical attributes to find the closest cluster?
Wekalist mailing list