[Home] HierarchicalClusterDetection

MeatballWiki | RecentChanges | Random Page | Indices | Categories

I'm sorry if this sounds obtuse. I'm having a beer inspired idea, so I just want to write it down. -- SunirShah

Supposition.

Assume that you have a planar graph, and each node has a weight.

: Problem 1. How do you weight the graph?

You can detect clusters (based on those weights) in that graph by calculating the gradient of the surface dependent on the weight as the variable. More simply, any node with an adjacent neight of a higher weight would simply disqualify itself as being a cluster root. This naive algorithm will find all the local maxima in the weighted graph.

However, irregularities in the slope will create weak local maxima that aren't relevant. So, instead, what we do is build a graph of the selected maxima from the previous stage and repeat the algorithm.

: Problem 2. How do you construct the next graph stage?

Repeated applications of this algorithm will build layers of maxima. From the layers, we can build a tree. Each non-maxima in a given stage becomes a child of the closest maxima of that given stage.

: Problem 3. Should we give preference to the centre of the maxima subgraph?

: Problem 4. Given a weighting function that isn't totally-ordering, we may end up with multiple maxima of equal weight at some stage. What do we do then?

This will dynamically generate a cluster hierarchy from the graph which can be used to aid navigation, say through a TableOfContents or a TreeMap.

: Problem 5. Noise in the weight function may make the naive maxima/minima distinction fuzzy. Or say the weight difference is 0.62 to 0.61. The error in the values may be larger than the difference. Would taking the second-order del of the hyperplane be sufficient to detect overall trends in the slope? But then again, what is the dimensionality of the hyperplane? (I suppose the minimal degree it takes to create a (hyper)planar graph?)

Problem 1. How do you weight the graph?

There are several ways to weight the graph. The weighting choice is really a reflection of what information you want to present. For instance, if one merely wants to limit the number of page fetches to get around the site, using average distances would be sufficient ala ShortestPathPages or even AccumulatedRandomWalk. On the other hand, to more accurately reflect the community's feelings, one could use the VisitorWeight (as a DynamicValue, naturally). If one wanted to find hot spots of discussion, one could use the AuthorWeight as well.

Problem 2.

Presumedly, at each cluster level, one would want to reduce the dimensionality of the hyperplane by (at least) one. However, this is too difficult to consider. The naive implementation might be to connect to your nearest neighbour, but if you weight backlinks equally, your nearest neighbour's nearest neighbour is you. Speaking of which,

: Problem 6. Should you weight BackLinks as equal as ForwardLinks?

Really, your neighbouring peak is the neighbour of your valley. I guess each maximum could find its neighbouring minima, and then find its neighbour maxima through the minima.

: Problem 3. Should we give preference to the centre of the maxima subgraph?

: Problem 4. Given a weighting function that isn't totally-ordering, we may end up with multiple maxima of equal weight at some stage. What do we do then?

I don't think this is really necessary, unless you get to Problem 4. and you really need a root. On the other hand, if you use a TreeMap, this doesn't matter. You can make a metaroot that contains all the remaining maxima and then just render from there.

Problem 5. Noise in the weight function.

This a big problem if you want accurate results, but we really just want usable results, so it might not mean much in the grand scheme of things. Another option is to round the weight values to the appropriate number of significant digits as an intermediate phase.

Problem 6. Should you weight BackLinks as equal as ForwardLinks?

Probably not. But it would be even nicer to use the VisitingLink weights.

CategoryGraphTheory

[Home] HierarchicalClusterDetection

Discussion