Showing posts with label Flajolet. Show all posts
Showing posts with label Flajolet. Show all posts
23 March 2011
03 January 2009
Things I saw at ANALCO '09
1. a picture of a Rubik's cube subwoofer. (They're only making 200, so I could never afford one.) This was actually in the magazine in my hotel room, so it's really just a coincidence.
2. An actual copy of Flajolet and Sedgewick's book Analytic Combinatorics. (Cambridge University Press had a display table.) I was not the only person who didn't believe it was actually a book. (In fact, Sedgewick seemed a bit surprised.) It appears to be very nice as a physical object, and in particular is surprisingly un-brick-like for an 800-page book.
3. A bunch of talks, many of which were interesting. I may have more commentary on the ones I found particularly interesting once I read the associated papers.
2. An actual copy of Flajolet and Sedgewick's book Analytic Combinatorics. (Cambridge University Press had a display table.) I was not the only person who didn't believe it was actually a book. (In fact, Sedgewick seemed a bit surprised.) It appears to be very nice as a physical object, and in particular is surprisingly un-brick-like for an 800-page book.
3. A bunch of talks, many of which were interesting. I may have more commentary on the ones I found particularly interesting once I read the associated papers.
Labels:
conferences,
Flajolet,
Rubik's cube,
Sedgewick
31 December 2008
Best-seller encore impatiemment attendu
From Robert Sedgewick's web site: slides for a talk entitled Impatiemment Attendu, given at a conference in honor of Phillipe Flajolet's 60th birthday. The gist of this talk appears to have been something like "The book will be out soon, after about thirty years of gestation." (That's Analytic Combinatorics, in case you're wondering.)
I mention it because apparently, at the beginning of this month in Paris, this post I made in September was projected on a big screen in front of a bunch of important people. I am of course amused. (The title "impatiemment attendu" is not mine, though; I took it from a paper by Nicolas Pouyanne. I suppose the English translation "impatiently awaited" is mine, but this was not a translation that required some huge inspiration.)
At the time, Amazon said that the book would be out on December 31. It's December 31. It's not out yet, as far as I know. I'll be at ANALCO '09 on Saturday. A slide in the presentation says that the book will be available at SODA (ANALCO takes place the day before); maybe the book will be there?
I mention it because apparently, at the beginning of this month in Paris, this post I made in September was projected on a big screen in front of a bunch of important people. I am of course amused. (The title "impatiemment attendu" is not mine, though; I took it from a paper by Nicolas Pouyanne. I suppose the English translation "impatiently awaited" is mine, but this was not a translation that required some huge inspiration.)
At the time, Amazon said that the book would be out on December 31. It's December 31. It's not out yet, as far as I know. I'll be at ANALCO '09 on Saturday. A slide in the presentation says that the book will be available at SODA (ANALCO takes place the day before); maybe the book will be there?
26 February 2008
Combinatorial quotients
Consider a combinatorial class (in the sense of Flajolet and Sedgewick) or combinatorial species (in the sense of Bergeron, Labelle, and Leroux). There are certain generating functions associated with combinatorial classes/species (I'm conflating the two for the moment because the difference between the two theories aren't important here), and the operations of sum and product have nice "combinatorial" interpretations -- addition of generating functions corresponds to disjoint unions of classes/species and multiplication of generating functions what I'll call here "combinatorial product", i. e. for two species F and G, an FG-structure on the set [n] is a pair which consists of an F-structure on some subset S of [n] and a G-structure on [n]-S. Then the generating function of the FG-structures is the product of the generating function of the F-structures and that of the G-structures.
Other natural operations on functions have combinatorial interpretations: composition of functions, taking exponentials or logarithms, differentiation, integration, etc. Even subtraction has an interpretation, although one has to be careful: this is the "virtual species". (A virtual species is a pair of ordinary species (F, G), written as F-G, where (F, G) = (H, K) if F + K = G + H. Replacing species with integers and addition with multiplication, this looks like the construction of the rational numbers from the integers.)
But is there a natural quotient operation? If there is, I can't find a mention of it in Flajolet and Sedgewick, and Marko Petkovsek (who I was just talking about this with) tells me it's not in Bergeron et al. either; this is one of the sources he's using for our current topics course in combinatorics. Let F be a species with 1 structure on the set of size 0, and let Fn be its restriction to size n. Let F+ be its restriction to all positive sizes. Then
where 1 is the multiplicative identity species (hence the name); it has one structure on a set of size 0. We can of course write this as
and we collect the "positive" and "negative" terms to get
So the inverse of the species F exists, and it's a virtual species in which the positive part is even-length tuples of nonempty F-structures, and the negative part is odd-length tuples of nonempty F-structures. This doesn't quite seem like a "combinatorially natural" operation -- but on the other hand I was able to describe it in a single sentence, so it seems plausible that it could be a natural thing to occur somewhere.
Notice that this also proves that if
and F(z)G(z) = 1, f0 = 1, then all the gn are integers, since the generating function G(z) actually counts something! (Well, a virtual something -- but that's close enough. It won't surprise you to learn that for two species H and K, the generating function corresponding to the virtual species H-K is H(z) - K(z). G here is a virtual species, so the gn are (possibly negative) integers. Of course, there are other ways to prove this, but I think this is a cute "combinatorial proof".
Other natural operations on functions have combinatorial interpretations: composition of functions, taking exponentials or logarithms, differentiation, integration, etc. Even subtraction has an interpretation, although one has to be careful: this is the "virtual species". (A virtual species is a pair of ordinary species (F, G), written as F-G, where (F, G) = (H, K) if F + K = G + H. Replacing species with integers and addition with multiplication, this looks like the construction of the rational numbers from the integers.)
But is there a natural quotient operation? If there is, I can't find a mention of it in Flajolet and Sedgewick, and Marko Petkovsek (who I was just talking about this with) tells me it's not in Bergeron et al. either; this is one of the sources he's using for our current topics course in combinatorics. Let F be a species with 1 structure on the set of size 0, and let Fn be its restriction to size n. Let F+ be its restriction to all positive sizes. Then
F^{-1} = (1 + F_+)^{-1}
where 1 is the multiplicative identity species (hence the name); it has one structure on a set of size 0. We can of course write this as
1 - F_+ + F_+^2 - F_+^3 + F_+^4 - \cdots
and we collect the "positive" and "negative" terms to get
So the inverse of the species F exists, and it's a virtual species in which the positive part is even-length tuples of nonempty F-structures, and the negative part is odd-length tuples of nonempty F-structures. This doesn't quite seem like a "combinatorially natural" operation -- but on the other hand I was able to describe it in a single sentence, so it seems plausible that it could be a natural thing to occur somewhere.
Notice that this also proves that if
and F(z)G(z) = 1, f0 = 1, then all the gn are integers, since the generating function G(z) actually counts something! (Well, a virtual something -- but that's close enough. It won't surprise you to learn that for two species H and K, the generating function corresponding to the virtual species H-K is H(z) - K(z). G here is a virtual species, so the gn are (possibly negative) integers. Of course, there are other ways to prove this, but I think this is a cute "combinatorial proof".
Labels:
Bergeron,
combinatorics,
Flajolet,
generating functions,
Labelle,
Leroux,
Petkovsek,
Sedgewick
24 November 2007
A simple probability puzzle,
A simple probability puzzle, from Doubting Toshuo via Anarchaia.
Paraphrasing a bit: imagine we flip a fair coin until either THH or THT appear on three consecutive flips. Which is more likely to come up first? Alternatively, which has the smaller expected time to its first appearance? (These are different questions; the first is "which wins the race" and the second is "which has the faster average time". They don't necessarily have the same answer in general.)
The answer to both is THH. This is analyzed in Flajolet and Sedgewick's forthcoming book Analytic Combinatorics, section I.4.2. If you want to figure out either the probability that THH appears before THT or the distribution of the first time at which they appear, it's a basically routine problem in the combinatorics of words.
Most people say that they're equally likely to come up first, and that they have the same expected time to the first appearance. This is because they're confusing this problem with a closely related problem: which pattern appears more often in the long run? And both patterns have probability 1/8 of appearing in the positions n through n+2 for any n; the expected value of the number of appearances of THH, or of THT, which start by position n is in fact n/8. But patterns like THT, which can overlap themselves (the technical term is "autocorrelation"), tend to occur in clusters; for example, we can get three occurrences of THT in just seven coin flips: THTHTHT -- but to get three occurrences of THH takes at least nine coin flips. As Flajolet and Sedgewick write:
They use different patterns, but the same effect occurs for their patterns as for Toshuo's. There might be a halfway decent bar bet here; unfortunately most of the people I would make a bar bet with are my friends, and my friends know I'm a probabilist, so they'd be suspicious. Especially since I've written about this here.
Paraphrasing a bit: imagine we flip a fair coin until either THH or THT appear on three consecutive flips. Which is more likely to come up first? Alternatively, which has the smaller expected time to its first appearance? (These are different questions; the first is "which wins the race" and the second is "which has the faster average time". They don't necessarily have the same answer in general.)
The answer to both is THH. This is analyzed in Flajolet and Sedgewick's forthcoming book Analytic Combinatorics, section I.4.2. If you want to figure out either the probability that THH appears before THT or the distribution of the first time at which they appear, it's a basically routine problem in the combinatorics of words.
Most people say that they're equally likely to come up first, and that they have the same expected time to the first appearance. This is because they're confusing this problem with a closely related problem: which pattern appears more often in the long run? And both patterns have probability 1/8 of appearing in the positions n through n+2 for any n; the expected value of the number of appearances of THH, or of THT, which start by position n is in fact n/8. But patterns like THT, which can overlap themselves (the technical term is "autocorrelation"), tend to occur in clusters; for example, we can get three occurrences of THT in just seven coin flips: THTHTHT -- but to get three occurrences of THH takes at least nine coin flips. As Flajolet and Sedgewick write:
On the other hand, patterns of the same length have the same expected number of occurrences, which is puzzling. The catch is that, due to overlaps of [pattern 1] with itself, occurrences of [pattern 1] tend to occur in clusters, but, then, clusters tend to be separated by wider gaps than for [pattern 2]; eventually, there is no contradiction.
They use different patterns, but the same effect occurs for their patterns as for Toshuo's. There might be a halfway decent bar bet here; unfortunately most of the people I would make a bar bet with are my friends, and my friends know I'm a probabilist, so they'd be suspicious. Especially since I've written about this here.
03 September 2007
data mining, or, Shakespeare was a combinatorialist
A rant on the virtues of data mining (from Statistical Modeling, Causal Inference, and Social Science) and "Data Mining" = voodoo science (from The Geomblog).
Aleks at Statistical Modeling, etc. says that he does data mining, and that social scientists are not happy to see this and say things like "that's not science". (Incidentally, I tend to be skeptical of any discipline that has "science" in it's name. If you have to tell everyone you're a science, perhaps you're trying to hide something. It's like restaurants that serve "authentic Chinese cuisine".)
Now, I don't know a huge amount about data mining techniques. And it seems to me that it would be irresponsible to do the following:
This is silly because at the 95% confidence level, one expects that you'd get a hit one time out of twenty purely by chance. I sincerely hope that if the people doing this stuff do things analogous to what I just said, they use some much higher confidence level.
Data miners also seem to draw a distinction between exploratory and confirmatory data mining. They might use a technique like the one I just said (although more sophisticated) to find hypotheses that are worth looking at and studying in more detail. This, of course, is something we all do every day -- we look around and when things are interesting we look at them more closely. We cannot look at everything very closely because then our brains would explode.
The trick here, I suppose, is knowing how to distinguish signal from noise. My favorite example of the moment is Example I.11 of Flajolet and Sedgewick's Analytic Combinatorics (link goes to 12 MB PDF of the book). They tell us that the pattern "combinatorics" is hidden in the text of Hamlet, which begins as follows:
Who's there?
Nay, answer me: stand, and unfold yourself.
Long live the king!
Bernardo?
He.
You come most carefully upon your hour.
'Tis now struck twelve; get thee to bed, Francisco.
For this relief much thanks: 'tis bitter cold,
And I am sick at heart.
Have you had quiet guard?
Not a mouse stirring.
Well, good night.
If you do meet Horatio and Marcellus,
At this point we look at the bolded letters, jump up and down and say that Shakespeare was trying to tell us something, despite the fact that Shakespeare had never heard the word "combinatorics". But Shakespeare is also a Yankees fan (see the italicized letters) and went to Harvard (see the underlined letters), so we ought to be suspicious. It turns out that Hamlet contains 1.63 x 1039 instances of the word "combinatorics", whereas a random sequence of letters chosen uniformly at random from the English alphabet and of the same length as Hamlet contains on average 6.96 x 1037 such sequences, and a random sequence of letters chosen at random with the same distribution as normal English text contains on average 1.71 x 1039 such sequences. So all we can conclude, it seems, is that maybe Shakespeare was writing in the same language that the word "combinatorics" is taken from. (One could try to compute the standard deviation associated with that 1.71 x 1039 figure, but it's silly because English text is not created by reaching into a Scrabble bag over and over again.) Of course there are less frivolous examples -- one of them is looking at whether certain patterns that appear in the human genome occur more or less often than you'd expect by chance.
Aleks at Statistical Modeling, etc. says that he does data mining, and that social scientists are not happy to see this and say things like "that's not science". (Incidentally, I tend to be skeptical of any discipline that has "science" in it's name. If you have to tell everyone you're a science, perhaps you're trying to hide something. It's like restaurants that serve "authentic Chinese cuisine".)
Now, I don't know a huge amount about data mining techniques. And it seems to me that it would be irresponsible to do the following:
- Take a data set from which one could generate a large number different hypotheses.
- Start picking hypotheses at random.
- When you find a hypothesis that is true at the 95% confidence level, say "Eureka!" and publish.
This is silly because at the 95% confidence level, one expects that you'd get a hit one time out of twenty purely by chance. I sincerely hope that if the people doing this stuff do things analogous to what I just said, they use some much higher confidence level.
Data miners also seem to draw a distinction between exploratory and confirmatory data mining. They might use a technique like the one I just said (although more sophisticated) to find hypotheses that are worth looking at and studying in more detail. This, of course, is something we all do every day -- we look around and when things are interesting we look at them more closely. We cannot look at everything very closely because then our brains would explode.
The trick here, I suppose, is knowing how to distinguish signal from noise. My favorite example of the moment is Example I.11 of Flajolet and Sedgewick's Analytic Combinatorics (link goes to 12 MB PDF of the book). They tell us that the pattern "combinatorics" is hidden in the text of Hamlet, which begins as follows:
Who's there?
Nay, answer me: stand, and unfold yourself.
Long live the king!
Bernardo?
He.
You come most carefully upon your hour.
'Tis now struck twelve; get thee to bed, Francisco.
For this relief much thanks: 'tis bitter cold,
And I am sick at heart.
Have you had quiet guard?
Not a mouse stirring.
Well, good night.
If you do meet Horatio and Marcellus,
At this point we look at the bolded letters, jump up and down and say that Shakespeare was trying to tell us something, despite the fact that Shakespeare had never heard the word "combinatorics". But Shakespeare is also a Yankees fan (see the italicized letters) and went to Harvard (see the underlined letters), so we ought to be suspicious. It turns out that Hamlet contains 1.63 x 1039 instances of the word "combinatorics", whereas a random sequence of letters chosen uniformly at random from the English alphabet and of the same length as Hamlet contains on average 6.96 x 1037 such sequences, and a random sequence of letters chosen at random with the same distribution as normal English text contains on average 1.71 x 1039 such sequences. So all we can conclude, it seems, is that maybe Shakespeare was writing in the same language that the word "combinatorics" is taken from. (One could try to compute the standard deviation associated with that 1.71 x 1039 figure, but it's silly because English text is not created by reaching into a Scrabble bag over and over again.) Of course there are less frivolous examples -- one of them is looking at whether certain patterns that appear in the human genome occur more or less often than you'd expect by chance.
Labels:
combinatorics,
data mining,
Flajolet,
Sedgewick,
Shakespeare
30 August 2007
profiles of interconnection networks
Flajolet and Sedgewick's Analytic Combinatorics (link goes to 12 MB PDF of the book, which is still in progress) is a most interesting book. I took a course based on the first three chapters of the book last fall; the purpose of the class was basically to teach us how to translate combinatorial structures into generating functions, in a fairly routine way. Unfortunately, we didn't get to the part of the text where we could then tell things about the asymptotics of those combinatorial structures, which is a tremendously useful thing to do.
So I've been working through parts of the book lately. In example V.8 Flajolet and sedgewick talk about a problem considered by Lagarias, Odlyzko and Zagier in their paper On The Capacity of Disjointly Shared Networks (although not quite in the same language). Flajolet and sedgewick ask: "There are 2n points on a line, with n point-to-point connections between pairs of points. What is the probable behavior of the width of such an interconnection network?" The width is defined in the following way: imagine the connections as circular arcs between the points on a horizontal line; let a vertical line sweep from left to right; then the width is the maximum number of arcs encountered by such a line.
For example, if we have n = 6 (and thus 12 points), we might make the connections 9-12, 5-10, 1-2, 6-8, 3-11, 4-7; as the line sweeps between 6 and 7 there are four open connections (and never are there more), so the width is 4. The "profile" of this network is the sequence given by the number of open connections just after the line sweeps past 0, past 1, past 2, and so on (call these a0, a1, a2, ...); this is the sequence
0, 1, 0, 1, 2, 3, 4, 3, 2, 3, 2, 1, 0
and in general this sequence increases to somewhere around n/2 and then decreases. Flajolet and sedgewick says that Louchard proved this, but I haven't looked at that paper. What surprised me, when I saw this problem, is that the profile actually looks something like a parabola, as you can see from the simulations at the top of p. 311 in Flajolet and sedgewick. Louchard proved that the profile "conforms asymptotically to a deterministic parabola 2nx(1-x)... to which are superimposed random fluctuations of O(√n)". So, for example, in such a network with one thousand nodes (n = 500), the profile goes as high as 250 (when x = 1/2) before going back down to zero.
Why should this be?
Imagine that you're putting together a random interconnection network. The profile sequence clearly has the property that each member ak is one more or one less than the last one. It'll be one more if k is the first member of one of the pairs, and one less if it's the second member. What's the probability that k is the first (i. e. smaller) member of its pair? If k is small, it'll be large; if k is near 2n then it'll be small. In fact, if we know k is a member of some pair, the other member of the pair is equally likely to be any integer from 1 to 2n except k; thus the probability that k is the smaller member of its pair is approximately 1 - k/2n, and the probability that it's the larger member is approximately k/2n.
Thus ak = ak-1 + 1 with probability 1 - k/2n, and ak = ak-1 - 1 the rest of the time. The expected value of ak - ak-1 is approximately (1-k/2n) - (k/2n), or 1-k/n. So, for example, one-fourth of the way through the process, when k = n/2, we expect ak to increase by about one-half each time we increase k by one.
Integrating with respect to n, we get
at ≈ ∫0t (1-k/n) dk = t - t2/2n
which is just a rescaling of Louchard's parabola. Flajolet and sedgewick's book is full of examples of things like this, where some apparently random structure turns out to have a lot of order when you "zoom out".
So I've been working through parts of the book lately. In example V.8 Flajolet and sedgewick talk about a problem considered by Lagarias, Odlyzko and Zagier in their paper On The Capacity of Disjointly Shared Networks (although not quite in the same language). Flajolet and sedgewick ask: "There are 2n points on a line, with n point-to-point connections between pairs of points. What is the probable behavior of the width of such an interconnection network?" The width is defined in the following way: imagine the connections as circular arcs between the points on a horizontal line; let a vertical line sweep from left to right; then the width is the maximum number of arcs encountered by such a line.
For example, if we have n = 6 (and thus 12 points), we might make the connections 9-12, 5-10, 1-2, 6-8, 3-11, 4-7; as the line sweeps between 6 and 7 there are four open connections (and never are there more), so the width is 4. The "profile" of this network is the sequence given by the number of open connections just after the line sweeps past 0, past 1, past 2, and so on (call these a0, a1, a2, ...); this is the sequence
0, 1, 0, 1, 2, 3, 4, 3, 2, 3, 2, 1, 0
and in general this sequence increases to somewhere around n/2 and then decreases. Flajolet and sedgewick says that Louchard proved this, but I haven't looked at that paper. What surprised me, when I saw this problem, is that the profile actually looks something like a parabola, as you can see from the simulations at the top of p. 311 in Flajolet and sedgewick. Louchard proved that the profile "conforms asymptotically to a deterministic parabola 2nx(1-x)... to which are superimposed random fluctuations of O(√n)". So, for example, in such a network with one thousand nodes (n = 500), the profile goes as high as 250 (when x = 1/2) before going back down to zero.
Why should this be?
Imagine that you're putting together a random interconnection network. The profile sequence clearly has the property that each member ak is one more or one less than the last one. It'll be one more if k is the first member of one of the pairs, and one less if it's the second member. What's the probability that k is the first (i. e. smaller) member of its pair? If k is small, it'll be large; if k is near 2n then it'll be small. In fact, if we know k is a member of some pair, the other member of the pair is equally likely to be any integer from 1 to 2n except k; thus the probability that k is the smaller member of its pair is approximately 1 - k/2n, and the probability that it's the larger member is approximately k/2n.
Thus ak = ak-1 + 1 with probability 1 - k/2n, and ak = ak-1 - 1 the rest of the time. The expected value of ak - ak-1 is approximately (1-k/2n) - (k/2n), or 1-k/n. So, for example, one-fourth of the way through the process, when k = n/2, we expect ak to increase by about one-half each time we increase k by one.
Integrating with respect to n, we get
at ≈ ∫0t (1-k/n) dk = t - t2/2n
which is just a rescaling of Louchard's parabola. Flajolet and sedgewick's book is full of examples of things like this, where some apparently random structure turns out to have a lot of order when you "zoom out".
Labels:
combinatorics,
Flajolet,
generating functions,
Lagarias,
Louchard,
networks,
Odlyzko,
Sedgewick,
Zagier
Subscribe to:
Comments (Atom)