next
up
previous
contents
index
Next: Evaluation of clustering
Up: Problem statement
Previous: A note on terminology.
Contents
Index
A difficult issue in clustering is determining the
number of clusters or
cardinality of a clustering,
which we denote by $K$. Often $K$ is nothing more than a good guess based
on experience or domain knowledge. But for $K$-means, we will
also introduce a heuristic method for choosing $K$ and an
attempt to incorporate the selection of $K$ into the
objective function. Sometimes the application puts
constraints on the range of $K$. For example, the
Scatter-Gather interface in Figure
16.3 could not
display more than about $K=10$ clusters per layer because of
the size and resolution of computer monitors in the early 1990s.
Since our goal is to optimize an objective function,
clustering is essentially a search problem. The brute force
solution would be to enumerate all possible clusterings and
pick the best. However,
there are exponentially many
partitions, so this approach is not
feasible.[*] For this reason, most flat
clustering algorithms refine an initial partitioning
iteratively. If the search starts at an unfavorable initial
point, we may miss the global optimum. Finding a good
starting point is therefore another important problem we
have to solve in flat clustering.
next
up
previous
contents
index
Next: Evaluation of clustering
Up: Problem statement
Previous: A note on terminology.
Contents
Index
© 2008 Cambridge University Press
This is an automatically generated page. In case of formatting errors you may want to look at the PDF edition of the book.
2009年04月07日