God Plays Dice: permutations


31 December 2009

A hack I'm disturbingly proud of, and its connection to some real math

I'm applying for jobs. Many jobs, because that's how academic job searches work these days. So I have a spreadsheet (in OpenOffice) to keep track of them.

Among the things that I track for each job, there is a column with 0, 1, or 2 in it. 0 means that I haven't submitted anything; 1 means I've submitted something, but not everything that was asked for; 2 means the application is complete. Averaging these numbers and dividing by 2 tells me what proportion of the search is complete.

But I also wanted to know how many 0s, 1s, and 2s there were. And as far as I know the built-in functions in OpenOffice won't do that.

What they will do, however, is this. I have a column consisting of 0s, 1s, 2s, and empty cells. By doing

COUNT(J8:J1000)
SUM(J8:J1000)
SUMPRODUCT(J8:J1000;J8:J1000)

I get the number of cells in that column which are nonempty; their sum; and the sum of their squares. Call these a, b, and c respectively. (The SUMPRODUCT function takes 2 arrays of the same shape and returns the sum of the products of corresponding cells. "8" is the row that contains the first job on the list, and "1000" is just a number that is comfortably more than the number of jobs I am applying for.) Let n₀, n₁, and n₂ be the number of entries which are 0, 1, and 2 respectively. Then I have

a = n₀ + n₁ + n₂
b = n₁ + 2n₂
c = n₁ + 4n₂

which is a three-by-three linear system, and can be solved for n₀, n₁, n₂, giving

n₀ = a - 3b/2 + c/2, n₁ = 2b-c, n₂ = (c-b)/2

and so I can recover the number of applications with status 0, 1, or 2 from this. From the sums of the 0th, 1st, and 2nd powers I can recover the distribution of the values themselves. (The actual code is slightly different, but of course equivalent, because I solved the system "by inspection" and never actually explicitly wrote it out until just now.)
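The same recovery can be sketched outside the spreadsheet. This is only an illustration: the function name `status_counts` and the sample column are made up, and the three sums stand in for COUNT, SUM, and SUMPRODUCT.

```python
def status_counts(values):
    """Recover (n0, n1, n2) from the count, sum, and sum of squares of a 0/1/2 column."""
    a = len(values)                  # COUNT: number of nonempty cells
    b = sum(values)                  # SUM: n1 + 2*n2
    c = sum(v * v for v in values)   # SUMPRODUCT of the column with itself: n1 + 4*n2
    n0 = a - (3 * b - c) // 2        # a - 3b/2 + c/2; 3b - c is always even
    n1 = 2 * b - c
    n2 = (c - b) // 2                # c - b = 2*n2 is always even
    return n0, n1, n2


# Hypothetical column: three 0s, two 1s, four 2s.
print(status_counts([0, 1, 2, 2, 1, 0, 0, 2, 2]))  # (3, 2, 4)
```

The integer divisions are exact because 3b - c = 2n₁ + 2n₂ and c - b = 2n₂.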

Believe it or not, I actually use this trick in a preprint, "The number of cycles of specified normalized length in permutations", to do some actual mathematics! There I find the expectation of X⁰, X¹, X², ..., X^k where X is a certain random variable known to take on the values 0, 1, ..., k, namely the number of cycles of length in the interval [γ n, δ n] in a permutation of [n] chosen uniformly at random, where γ and δ are constants. k is the greatest integer less than or equal to 1/γ; for example, if we're looking at cycles of length at least 0.15n in permutations of [n], there can't be more than six of them. This gives a linear system like the one above, whose solution is the probability that X takes on each value 0, 1, ..., k.
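For general k the system has a Vandermonde matrix with rows (0^j, 1^j, ..., k^j). Here is a minimal sketch of solving it exactly, assuming the moments are supplied as integers or fractions; the name `distribution_from_moments` is hypothetical and this is not the code from the preprint.

```python
from fractions import Fraction


def distribution_from_moments(moments):
    """Solve sum_i i**j * p_i = moments[j] for j = 0..k, where X takes values 0..k."""
    k = len(moments) - 1
    # Augmented matrix: row j is [0**j, 1**j, ..., k**j | moments[j]].
    rows = [[Fraction(i ** j) for i in range(k + 1)] + [Fraction(moments[j])]
            for j in range(k + 1)]
    for col in range(k + 1):
        # Swap a row with a nonzero entry in this column into the pivot position.
        pivot = next(r for r in range(col, k + 1) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so that the pivot entry is 1.
        rows[col] = [x / rows[col][col] for x in rows[col]]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(k + 1):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][k + 1] for i in range(k + 1)]


# With k = 2 this reproduces the spreadsheet formulas: a, b, c = 9, 10, 18.
print(distribution_from_moments([9, 10, 18]))
```

The matrix is invertible because the values 0, 1, ..., k are distinct, so a pivot always exists.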

Posted by Michael Lugo at 12:55 PM 6 comments:

Labels: combinatorics, linear algebra, permutations, probability

10 February 2008

Splitting permutations of [n] into two classes

The numbers of permutations of [n] with an even number of cycles and with an odd number of cycles are equal, for all n ≥ 2. (I assume this is a standard result, just because anyone who's stared at the Stirling numbers for long enough would think of it, but I don't recall hearing it before.)

Here's a proof which relies on bivariate generating functions. Note that a permutation can be viewed as a set of directed cycles. In the notation of combinatorial classes, we write this

\mathcal{P} = \mathrm{Set}(\mu \, \mathrm{Cyc}(\mathcal{Z}))

Now, there are (n-1)! cycles on an n-element set (order the n elements in n! ways, but then divide by n for the rotation) giving the generating function

z + 1! {z^2 \over 2!} + 2! {z^3 \over 3!} + 3! {z^4 \over 4!} + \cdots

which simplifies to

z + {z^2 \over 2} + {z^3 \over 3} + {z^4 \over 4} + \cdots

which is just log 1/(1-z).

The mark μ then becomes the variable u, and taking sets corresponds to exponentiation, so we have

P(z,u) = \exp( u \log (1-z)^{-1}) = (1-z)^{-u}

as an exponential generating function for permutations, where z marks size and u marks number of cycles. That is,

(1-z)^{-u} = \sum_{n,k} {1 \over n!} \left[ {n \atop k} \right] z^n u^k

where $\left[ {n \atop k} \right]$ is the number of permutations of [n] with k cycles.

Finally, let u = -1. Then we get

1-z = \sum_{n,k} {1 \over n!} \left[ {n \atop k} \right] z^n (-1)^k

and taking the coefficient of zⁿ on both sides for any n ≥ 2 and multiplying through by n!, we get

0 = \sum_{k} \left[ {n \atop k} \right] (-1)^k

But on the right-hand side, we get a contribution of +1 for each permutation of [n] with an even number of cycles, and -1 for each permutation with an odd number of cycles. Thus those two sets are equal in cardinality.
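The claim is easy to check by brute force for small n. This is a sketch, with permutations represented as tuples on {0, ..., n-1} and helper names (`cycle_count`, `parity_split`) invented for the illustration.

```python
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple mapping i -> perm[i]."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:   # walk around the cycle containing `start`
                seen[j] = True
                j = perm[j]
    return cycles


def parity_split(n):
    """Return (# permutations with an even number of cycles, # with an odd number)."""
    even = odd = 0
    for perm in permutations(range(n)):
        if cycle_count(perm) % 2 == 0:
            even += 1
        else:
            odd += 1
    return even, odd


for n in range(1, 8):
    print(n, parity_split(n))
```

For n = 1 the only permutation has one cycle, which is why the statement needs n ≥ 2.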

I feel like this is a standard result, but I hadn't seen this proof before coming up with it myself. There's a similar proof that doesn't require the bivariate generating function machinery; the generating function of the numbers $\left[ {n \atop k} \right]$ for fixed n is

\sum_k \left[ {n \atop k} \right] z^k = z(z+1)(z+2) \ldots (z+n-1)

and setting z = -1 here makes the factor (z+1) vanish, so the alternating sum is zero for n ≥ 2. And I may have even seen a bijective proof of this fact, which gives a bijection between permutations of [n] with an odd number of cycles and those with an even number of cycles.

By the way, the analogous result that, say, one-third of permutations of [n] have a number of cycles divisible by 3, one-third have 3k+1 cycles for some k, and one-third have 3k+2 cycles for some k isn't true. (It should be approximately true, though, but that doesn't seem particularly interesting.)
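To see how the mod-3 version fails, one can tabulate the cycle numbers with the standard recurrence [n, k] = [n-1, k-1] + (n-1)[n-1, k] and group them by the residue of k. This is a sketch; the function names are made up for the illustration.

```python
def cycle_numbers(n):
    """Row n of the unsigned Stirling numbers of the first kind, indexed by k = 0..n."""
    row = [1]  # n = 0: the empty permutation has 0 cycles
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(m + 1):
            if k >= 1:
                new[k] += row[k - 1]          # element m forms its own cycle
            if k <= m - 1:
                new[k] += (m - 1) * row[k]    # element m is inserted into an existing cycle
        row = new
    return row


def counts_mod(n, modulus):
    """Number of permutations of [n] whose cycle count is r mod `modulus`, for each r."""
    totals = [0] * modulus
    for k, count in enumerate(cycle_numbers(n)):
        totals[k % modulus] += count
    return totals


for n in range(2, 11):
    print(n, counts_mod(n, 2), counts_mod(n, 3))
```

The mod-2 columns agree for every n ≥ 2, while the three mod-3 counts are never all equal.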

Edited, 3:24 pm: See the comments for a bijective proof offered by Jeremy Henty; it really is quite simple but eluded my grasp this morning.

Posted by Michael Lugo at 9:32 AM 4 comments:

Labels: combinatorics, generating functions, permutations

02 February 2008

How many fixed points do involutions have?

It's a fairly well-known fact that if we pick a permutation of the set [n] at random, then the expected number of cycles of length k in that permutation is 1/k, for 1 ≤ k ≤ n. (Somewhat more memorably, the average number of elements in a permutation of [n] which are members of a k-cycle is 1.)
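A brute-force check of the 1/k fact for a small n, as a sketch: permutations are tuples on {0, ..., n-1}, and `mean_cycle_counts` is a name invented here.

```python
from fractions import Fraction
from itertools import permutations


def mean_cycle_counts(n):
    """Average number of cycles of each length 1..n over all permutations of n points."""
    totals = [0] * (n + 1)   # totals[k] = number of k-cycles summed over all permutations
    count = 0
    for perm in permutations(range(n)):
        count += 1
        seen = [False] * n
        for start in range(n):
            if seen[start]:
                continue
            length = 0
            j = start
            while not seen[j]:   # measure the cycle containing `start`
                seen[j] = True
                j = perm[j]
                length += 1
            totals[length] += 1
    return {k: Fraction(totals[k], count) for k in range(1, n + 1)}


print(mean_cycle_counts(5))
```

Multiplying the k-th entry by k gives the average number of elements lying in k-cycles, which is 1 for every k.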

So what would you expect to happen if you just considered permutations that have only cycles of length 1 and 2 -- that is, involutions -- and sampled uniformly at random from them? If you're like me, your first thought is that the average involution will have twice as many 1-cycles as 2-cycles, and the same number of elements in 1-cycles as in 2-cycles -- that is, the average involution on [n] will have n/2 1-cycles (i.e. fixed points) and n/4 2-cycles, for a total of n/2 fixed points and n/2 elements in 2-cycles.

But then you look at "n/2 fixed points" and think that that seems awfully large...

It turns out that the average number of fixed points of an involution chosen uniformly at random from all involutions on [n] is about n^{1/2}. This follows from standard generating function arguments. The exponential generating function of the number of involutions marked for their number of fixed points is exp(uz+z²/2); that is, n! times the coefficient of zⁿu^k in that function is the number of involutions on [n] with k fixed points. Standard methods from, say, Flajolet and Sedgewick (which I will probably buy when it comes out in print later this year, because I seem to cite it constantly) give that the expected number of fixed points is

{[z^n] z \exp \left( z + {z^2 \over 2} \right) \over [z^n] \exp \left( z + {z^2 \over 2} \right)}

and this can actually be rewritten as n a_{n-1}/a_n, where a_n is the number of involutions on [n], that is, $a_n = n! [z^n] \exp \left( z + {z^2 \over 2} \right)$. (There's a nice interpretation for this -- a_{n-1}/a_n is the probability that any given element of an involution is actually a fixed point -- although it's hard to say exactly why this should be true.)

Then, if you're still like me, you think "alas, I have forgotten how to figure out the asymptotics of coefficients of entire functions". But the asymptotic number of involutions is the last example of Herbert Wilf's generatingfunctionology. After some work, the asymptotic formula he gives for a_n gives that the expected number of fixed points in an involution is n^{1/2} - 1/2 + o(1).
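The formula n a_{n-1}/a_n can be compared numerically with n^{1/2} - 1/2. The sketch below assumes the standard recurrence a_n = a_{n-1} + (n-1) a_{n-2} (element n is either fixed or paired with one of the other n-1 elements), which is not derived in this post; the function names are made up.

```python
from fractions import Fraction
from math import sqrt


def involution_numbers(n):
    """[a_0, ..., a_n], using a_m = a_{m-1} + (m-1) * a_{m-2} with a_0 = a_1 = 1."""
    a = [1, 1]
    for m in range(2, n + 1):
        a.append(a[m - 1] + (m - 1) * a[m - 2])
    return a[: n + 1]


def mean_fixed_points(n):
    """Exact expected number of fixed points, n * a_{n-1} / a_n, for n >= 1."""
    a = involution_numbers(n)
    return Fraction(n * a[n - 1], a[n])


for n in (10, 100, 1000, 10000):
    print(n, float(mean_fixed_points(n)), sqrt(n) - 0.5)
```

For n = 4 there are 10 involutions with 16 fixed points in total, and the function returns 8/5 accordingly.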

Once you know that fixed points are rare, then it's not hard to guess that their distribution should be approximately Poisson, and thus variance should be of the same order of magnitude as the mean -- and the variance result turns out to be true. (I don't know about the Poisson result.) The variance is, I believe, n^{1/2} - 1 + o(1), although this is only from numerical evidence. (The generating-function way to calculate the variance relies on the definition of the variance as the mean of the square minus the square of the mean; this means I need better asymptotics in order to verify this. The better asymptotics are certainly achievable, but they're not at my fingertips.)
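The numerical evidence can be reproduced with exact arithmetic. The sketch below rests on two assumptions that are not spelled out in this post: the recurrence a_n = a_{n-1} + (n-1) a_{n-2} for the number of involutions, and the identity E[X(X-1)] = n(n-1) a_{n-2}/a_n, obtained by differentiating exp(uz+z²/2) twice in u. The function name is made up.

```python
from fractions import Fraction
from math import sqrt


def fixed_point_mean_and_variance(n):
    """Exact mean and variance of the number of fixed points of a uniformly
    random involution on [n], for n >= 2."""
    a = [1, 1]
    for m in range(2, n + 1):
        a.append(a[m - 1] + (m - 1) * a[m - 2])
    mean = Fraction(n * a[n - 1], a[n])
    # Second factorial moment E[X(X-1)]: choose an ordered pair of fixed points.
    second_factorial = Fraction(n * (n - 1) * a[n - 2], a[n])
    variance = second_factorial + mean - mean ** 2
    return mean, variance


for n in (10, 100, 1000):
    mean, variance = fixed_point_mean_and_variance(n)
    print(n, float(variance), sqrt(n) - 1)
```

This only tabulates exact values next to n^{1/2} - 1; it is not a proof of the asymptotic.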

The result is a bit surprising, though -- why does cutting out cycles of length 3 and greater so drastically change the relative numbers of 1-cycles and 2-cycles? But involutions make up a vanishingly small proportion of all permutations, and weird things can happen in these asymptotically negligible sets without the bulk of the population caring at all.

Posted by Michael Lugo at 12:08 PM 19 comments:

Labels: generating functions, permutations, probability

20 November 2007

Pattern avoidance

Here's a paper I found while Googling for something else a while ago: On the Stanley-Wilf conjecture for the number of permutations avoiding a given pattern, by Richard Arratia, from the Electronic Journal of Combinatorics, Volume 6(1) (1999), note N1.

Pattern avoidance in permutations is a beautiful little subject, about which I don't know that much -- but not that much is known. For example, see Bridget Tenner's remarkably short database of permutation pattern avoidance; okay, so the state of knowledge isn't quite as bad as this makes it look, but it's not that good either. (A good introduction to the area seems to be Miklos Bona's Combinatorics of Permutations, which has a couple of chapters on the area. However, if you're at Penn I don't recommend Bona's book, because I have the library copy and I don't want to give it up.)

Let p₁ p₂ ... p_n be a permutation of the integers from 1 to n. We say a permutation is, for example, 231-avoiding if there is no i < j < k such that p_k < p_i < p_j. That is, a 231-pattern in a permutation is a set of three letters in the word representing it which, when read from left to right, fall in the same order as the numbers 2, 3, 1; a permutation is 231-avoiding if it has no 231-patterns. So, for example, 26415837 is not a 231-avoiding permutation of [8], because 2, 6, 1 form a 231-pattern. A similar definition can be made of a σ-avoiding permutation for any permutation (called a "pattern") σ. Rather surprisingly, the number of permutations of [n] avoiding any given pattern of length 3 is the same, and this can be proven bijectively. But some patterns of length 4, for example, are "easier" to avoid than others; what makes a pattern easy or difficult to avoid isn't totally clear. The Stanley-Wilf conjecture (now proven, by Gabor Tardos and Adam Marcus) states that if we let F(n, σ) be the number of permutations of [n] which avoid the pattern σ, then
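The definition translates directly into a brute-force test. This is a sketch with invented names (`contains_pattern`, `count_avoiders`), practical only for small n since it looks at every k-element subsequence.

```python
from itertools import combinations, permutations


def contains_pattern(perm, pattern):
    """True if some subsequence of perm falls in the same relative order as pattern."""
    k = len(pattern)
    # Positions of the pattern listed from smallest value to largest.
    shape = sorted(range(k), key=lambda i: pattern[i])
    for sub in combinations(perm, k):   # subsequences, in left-to-right order
        if sorted(range(k), key=lambda i: sub[i]) == shape:
            return True
    return False


def count_avoiders(n, pattern):
    """F(n, pattern): the number of permutations of [n] avoiding the pattern."""
    return sum(1 for p in permutations(range(1, n + 1))
               if not contains_pattern(p, pattern))


print(contains_pattern((2, 6, 4, 1, 5, 8, 3, 7), (2, 3, 1)))  # True: 2, 6, 1
for pattern in permutations((1, 2, 3)):
    print(pattern, [count_avoiders(n, pattern) for n in range(1, 7)])
```

Each of the six patterns of length 3 gives the same counts, 1, 2, 5, 14, 42, 132.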

\lim_{n \to \infty} F(n, \sigma)^{1/n}

exists and is finite. For example, 231-avoiding permutations are in bijection with Catalan trees of size n, so F(n, 231) is the nth Catalan number. Thus

F(n, 231) \sim {4^n \over \sqrt{\pi n^3}}

and so we get the constant 4 for the limit above. The Stanley-Wilf conjecture thus associates a constant with each permutation, but not too many of these constants are known. (I won't attempt to tabulate the known constants; I suspect someone out there has already done it. At the very least, there are a fair number of results scattered throughout the applicable chapters of Bona's textbook.)
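The convergence to 4 is slow, because of the polynomial factor n^{-3/2}. A small sketch, taking F(n, 231) to be the nth Catalan number as above and working through logarithms (the helper names are made up):

```python
from math import comb, exp, log


def catalan(n):
    """The nth Catalan number, C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def nth_root(n):
    """F(n, 231)**(1/n), computed via logarithms to avoid float overflow."""
    return exp(log(catalan(n)) / n)


for n in (10, 100, 1000, 10000):
    print(n, nth_root(n))
```

The values increase towards 4 but are still visibly below it at n = 1000.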

Anyway, a natural question to ask is "how many σ-patterns does a permutation of [n] have, on average?", where σ is a pattern of length k. This is fairly obviously

{1 \over k!} {n \choose k}

since there are ${n \choose k}$ possible sites for such patterns, and each site actually contains such a pattern with probability 1/k!; from playing around with a few small cases it looks like the variance of the number of σ-patterns is asymptotically c_σ n^{2k-1}, where the constants c_σ don't appear to be anything nice in general. (The only way I know how to compute these variances basically comes down to considering all the ways in which two instances of the same pattern in the same permutation can intersect and enumerating a large number of cases; it's not surprising the results are ugly. I've only done it in a couple of simple cases, and I don't want to quote the results here because I haven't had the patience to check my computations.) In the case of inversions, which are just 21-patterns, these are the classical results that the average number of inversions of an n-permutation is asymptotically n²/4, with variance asymptotically n³/36. I have a faint hope that there might be some relation between the constant "1/36" there and the fact that the number of 21-avoiding permutations of [n] is just 1ⁿ (that is, 1 -- this is the simplest case of the Stanley-Wilf conjecture, that there is only one inversion-free permutation) but I don't actually know enough of these constants to see what's going on.
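Small cases are easy to tabulate exactly by brute force. The sketch below uses invented names and is feasible only for small n. For inversions it can be compared with the standard exact formulas n(n-1)/4 for the mean and n(n-1)(2n+5)/72 for the variance, which are not stated in this post.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial


def occurrences(perm, pattern):
    """Number of subsequences of perm in the same relative order as pattern."""
    k = len(pattern)
    shape = sorted(range(k), key=lambda i: pattern[i])
    return sum(1 for sub in combinations(perm, k)
               if sorted(range(k), key=lambda i: sub[i]) == shape)


def mean_and_variance(n, pattern):
    """Exact mean and variance of the pattern count over all permutations of [n]."""
    counts = [occurrences(p, pattern) for p in permutations(range(1, n + 1))]
    mean = Fraction(sum(counts), len(counts))
    variance = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, variance


n = 6
for pattern in [(2, 1), (1, 2, 3), (2, 3, 1)]:
    k = len(pattern)
    print(pattern, mean_and_variance(n, pattern), Fraction(comb(n, k), factorial(k)))
```

The mean always matches (1/k!) C(n, k), while the variance depends on the pattern.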

Anyway, what I want to bring attention to is the conjecture of Alon at the end of this note:

Conjecture (Alon): The threshold length t(k), for a random permutation to contain all k-permutations with substantial probability, has t(k) ~ k^2/4.

Why should this be true? The note by Arratia gives some idea -- it relates this problem to the longest common subsequence problem -- but I want an argument that stays purely within the pattern avoidance realm. I'm working on it.

Posted by Michael Lugo at 7:30 PM 1 comment:

Labels: Alon, Arratia, combinatorics, Marcus, pattern avoidance, permutations, Stanley, Tardos, Wilf
