Showing posts with label complex networks. Show all posts
15 January 2009
The synonym-following game
An interesting random fact: form a graph whose vertices are the words that the dictionary says have at least one synonym, and whose edges are synonym pairs. The resulting graph has a giant component. In particular, if "the dictionary" is Merriam-Webster's, there are 23,279 words that have at least one synonym, and the graph has a component of size 22,311, roughly 96% of them. It also has a clustering coefficient of 0.7. The clustering coefficient is the probability that if we pick a vertex (word) u uniformly at random, and then pick two distinct neighbors (synonyms) v and w of u uniformly at random, then v and w are neighbors (synonyms). It's not surprising that this is high for the dictionary network: two synonyms of the same word tend to mean nearly the same thing, so they are likely to be listed as synonyms of each other. This seems consistent with synonyms being words that are "near" each other in some "semantic space".
I'm also kind of curious whether the results differ across dictionaries. A dictionary that's less aggressive in declaring things "synonyms" might not show this behavior, and in particular I suspect there's a critical point at which small changes in aggressiveness lead to large changes in the size of the giant component. So if you've ever played that game of following synonyms in a dictionary and ending up at words that seem to have nothing to do with where you started, the giant component is why.
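For the curious, here's a minimal Python sketch of the two quantities above, assuming the dictionary has already been boiled down to a list of synonym pairs. The word list is made up for illustration (it is not Merriam-Webster data), words with fewer than two synonyms are simply skipped when averaging the clustering coefficient (my assumption, since the definition needs two neighbors to pick), and `thin` is a hypothetical stand-in for a "less aggressive" dictionary that keeps each synonym pair independently with probability p.

```python
from collections import defaultdict, deque
from itertools import combinations
import random


def build_graph(pairs):
    """Undirected synonym graph as an adjacency map: word -> set of synonyms."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-pairs
            adj[a].add(b)
            adj[b].add(a)
    return adj


def largest_component_size(adj):
    """Number of words in the largest connected component, via breadth-first search."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def clustering_coefficient(adj):
    """Pick a word u uniformly, then two distinct synonyms v, w of u uniformly:
    the probability that v and w are themselves synonyms. Assumption: words
    with fewer than two synonyms are left out of the average."""
    local = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        linked = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        local.append(linked / (k * (k - 1) / 2))
    return sum(local) / len(local) if local else 0.0


def thin(pairs, p, seed=0):
    """Hypothetical 'less aggressive dictionary': keep each pair with probability p."""
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < p]


# Made-up toy data, not taken from any real dictionary.
toy_pairs = [
    ("big", "large"), ("big", "huge"), ("large", "huge"),
    ("huge", "vast"), ("vast", "immense"), ("huge", "immense"),
    ("happy", "glad"), ("glad", "cheerful"),
]

adj = build_graph(toy_pairs)
print(largest_component_size(adj))           # 5
print(f"{clustering_coefficient(adj):.3f}")  # 0.722
for p in (0.25, 0.5, 0.75):
    print(p, largest_component_size(build_graph(thin(toy_pairs, p))))
```

A toy graph this small won't show anything like a sharp critical point; probing that guess would mean running the `thin` sweep over the real synonym list for many values of p.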
I'm paraphrasing this from "Statistical mechanics of complex networks", by Reka Albert and Albert-Laszlo Barabasi (cond-mat/0106096); apparently it comes from an unpublished manuscript of Yook, Jeong, and Barabasi. (The article, from 2001, called it a "preprint", but I can't find anything by that set of authors that fits the description. Also, does anybody else find the habit of not including titles of articles in citations supremely annoying? There are actually two "preprints" by that three-author set cited in this article, both from 2001; they are distinguished only as "2001a" and "2001b".) If you can point me to this paper (or to a similar study done by someone else), I'll appreciate it and will thank you publicly.