Showing posts with label algorithms.
05 February 2011
Computing distance in the Facebook graph
Is there some nice algorithm for computing the distance between two nodes in a graph, that gracefully fails when they're far apart? I'm asking this prompted by this metafilter question on how to compute the distance between two individuals in the Facebook graph; it seems to me that if someone is at distance 5 versus 6 from me, for example, I don't really care which it is.
Dijkstra's algorithm, for example, runs in time proportional to the number of edges of the graph. (Well, O(E+V log V), but I'm guessing that log V is smaller than E/V.) This is the search algorithm I learned in an algorithms class once. And most algorithms that I've seen -- of course this isn't an area I know a huge amount about -- tend to run in time at least as large as the number of edges of the graph. (Of course this makes sense because one could have to actually look at all the edges of the graph -- such are the pitfalls of worst-case analysis.)
It seems like bidirectional search would work -- if my friends of friends overlap with your friends of friends, then we're at a distance of at most 4, for example. (I didn't realize this had a name until just now; it just seems like an obvious idea to me.) But is there some reason I'm overlooking that this wouldn't be practical?
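Here's a minimal Python sketch of that bidirectional idea, just to convince myself it makes sense: grow balls of friends around both people one layer at a time, always from the smaller side, and stop as soon as the balls touch or the distance exceeds some cap. The friends() function is a hypothetical stand-in for whatever gives you a person's neighbors.

def distance_capped(a, b, friends, cap=6):
    """Bidirectional BFS: the distance from a to b, or None if it exceeds cap
    (or if they aren't connected within cap). friends(x) should return an
    iterable of x's neighbors."""
    if a == b:
        return 0
    frontier_a, frontier_b = {a}, {b}
    seen_a, seen_b = {a}, {b}
    dist = 0
    while dist < cap:
        # always grow the smaller ball by one layer
        if len(frontier_a) > len(frontier_b):
            frontier_a, frontier_b = frontier_b, frontier_a
            seen_a, seen_b = seen_b, seen_a
        next_layer = set()
        for person in frontier_a:
            for friend in friends(person):
                if friend not in seen_a:
                    seen_a.add(friend)
                    next_layer.add(friend)
        dist += 1
        if next_layer & seen_b:
            return dist
        if not next_layer:
            return None
        frontier_a = next_layer
    return None

The point is that the work is roughly proportional to the sizes of the two balls rather than to the whole graph, and giving up past the cap is exactly the "graceful failure" I wanted.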
02 September 2009
The hidden mathematics of bathrooms
From the xkcd blog: urinal protocol vulnerability.
The basic premise here is the following: there's a long row of urinals (n of them), and a line of men who want to use them. The first man picks a urinal at the end. Each man after that picks one of the urinals which is the furthest from any of the occupied urinals. Nobody ever leaves. How many men have to show up before one of them will be forced into using a urinal adjacent to one that's already occupied? Call this number f(n).
You might think that f(n)/n approaches some limit, but it doesn't; it oscillates between 1/3 and 1/2 based on the fractional part of log_2 n. If n = 2^k + 1 then this "greedy" algorithm for filling the urinals works and every other urinal gets filled: f(2^k + 1) = 2^(k-1) + 1. If n = 3 x 2^(k-1) + 1 then the worst possible thing happens and only every third urinal gets filled, and f(3 x 2^(k-1) + 1) = 2^(k-1) + 1. (Yes, that's the same number, and the function's constant in between: f(5) = f(6) = f(7) = 3, f(9) = ... = f(13) = 5, and so on.) Oscillations like this -- periodic in the logarithm of the problem size -- happen a lot in problems involving binary trees counted by the number of nodes. Still, it was a bit surprising to see this, because I'd never thought about the problem in the case of "unphysically" large n.
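If you want to check these values, brute force is enough. Here's a quick Python sketch (mine, not Munroe's) that simulates the greedy process and counts how many men fit before someone would be forced next to an occupied urinal; ties among equally distant urinals are broken toward the lowest-numbered one, which is one of the allowed choices.

def f(n):
    """Number of men who can use a row of n urinals before someone
    would be forced to stand next to an occupied one."""
    occupied = set()
    count = 0
    while True:
        best_pos, best_dist = None, -1
        for pos in range(1, n + 1):
            if pos in occupied:
                continue
            # distance to the nearest occupied urinal; when the row is empty
            # every spot is equally good and the scan picks the end, as in the setup
            d = min((abs(pos - q) for q in occupied), default=n + 1)
            if d > best_dist:
                best_pos, best_dist = pos, d
        if best_dist < 2:   # the best remaining spot is adjacent to someone
            return count
        occupied.add(best_pos)
        count += 1

# for n in range(3, 30): print(n, f(n))   -- shows the plateaus 2, 3, 3, 3, 5, 5, ...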
Exercise for the reader: invent a mathematically equivalent version of this problem that doesn't involve urinals.
16 July 2009
"Roommates" is a euphemism?
I'm at the Cornell Probability Summer School. (This announcement is too late for anybody who wants to find me here, as it's almost over! But I have been tracked down by at least one fan.)
In a lecture here this morning, Ander Holroyd spoke about the stable marriage problem and variations of it involving point processes (see this paper of Holroyd, Pemantle, Peres, and Schramm, which I've mentioned before, for details). The goal of the problem is to pair up n men and n women in such a way that no two people who aren't married to each other prefer each other to their current partners; this is called a "stable matching" and one always exists.
The original paper on the stable marriage problem is that of Gale and Shapley, in 1962. This paper also talks about the "stable roommates" problem, which is the analogous problem where everybody is of the same gender. Rather surprisingly, I never realized that "roommates" might be a euphemism here, which is something that Holroyd pointed out this morning to quite a bit of laughter.
10 July 2008
Three beautiful quicksorts
Jon Bentley gives a lecture called Three Beautiful Quicksorts, as three possible answers to the question "what's the most beautiful code you've ever written?" (An hour long, but hey, I've got time to kill.)
Watch the middle third, in which some standard code for quicksort is gradually transformed into code for performing an analysis of the number of comparisons needed in quicksort, and vanishes in a puff of mathematical smoke.
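I won't try to reproduce Bentley's code here, but the punchline of that middle third is a recurrence: the expected number of comparisons C(n) made by randomized quicksort on n distinct keys satisfies C(n) = (n - 1) + (2/n)(C(0) + C(1) + ... + C(n-1)), and evaluating it is only a few lines. A sketch in Python (mine, not his):

def expected_comparisons(n):
    """Expected number of comparisons made by randomized quicksort
    on n distinct keys, via the standard recurrence."""
    C = [0.0] * (n + 1)
    partial_sum = 0.0              # C[0] + ... + C[k-1]
    for k in range(1, n + 1):
        partial_sum += C[k - 1]
        C[k] = (k - 1) + 2.0 * partial_sum / k
    return C[n]

# expected_comparisons(3) == 8/3; the exact value is 2(n+1)H_n - 4n, which grows
# like 2n ln n -- about 39% more than the information-theoretic lower bound of
# roughly log_2(n!) comparisons.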
Although I must admit, I'm kind of annoyed that he slips into the idea that an average-case analysis is the most important thing somewhere in there. The first moment of a distribution is not everything you need to know about it! I admit that at times I subscribe to the school of thought that says "the first two moments are everything", but that's only because most distributions are normal.
(Note to those who don't get sarcasm: I don't actually believe that most distributions are normal.)
22 March 2008
When are two algorithms the same?
When Are Two Algorithms The Same?, by Andreas Blass, Nachum Dershowitz, and Yuri Gurevich. Via Lambda the Ultimate and Anarchaia.
The abstract: "People usually regard algorithms as more abstract than the programs that implement them. The natural way to formalize this idea is that algorithms are equivalence classes of programs with respect to a suitable equivalence relation. We argue that no such equivalence relation exists."
My first thought here -- and this is unusual for me, since I'm basically a discrete mathematician -- is that maybe the space of programs is "effectively continuous", in the same sense that, say, reality is "effectively continuous" but actually has a discrete nature if you look closely enough, thanks to quantum physics. So the right thing to do would be to topologize the set of programs, so you can say that algorithms X and X' are "close to" each other, or perhaps to put a metric on it. But I naturally imagine that, say, if two programs differ by some small number of small changes then they would have to be considered the "same program" if we were looking for an equivalence relation. Imagine a graph whose vertices are programs, and two programs are connected by an edge if they differ by some small number of small changes; then we're naturally led to think of "algorithms" (if they are equivalence classes of programs) as connected components of that graph. But how do we decide what number of changes to allow, or how large they can be? Basically, the relation on the set of programs "expresses the same algorithm" doesn't seem to be transitive, as the authors point out, so "equivalence relation" is the wrong idea.
By the way, I speculate that calling the space of programs a metric space (say, with the metric induced from the graph I alluded to before) is wrong as well, but in a different way -- how does one assign lengths to the edges? That seems a bit subjective and squishy. Topology's the way to go here, although it's not like anybody's holding their breath waiting for an answer to this question. And it sort of feels like a question about moduli spaces (link goes to a post by Jordan Ellenberg), although I know next to nothing about those.
This reminds me (and the authors) of the question asked in Tim Gowers' post "when are two proofs essentially the same?" (In fact, is this the same question?)
03 March 2008
Breadth-first versus depth-first browsing
Firefox's tabbed browsing feature encourages breadth-first search. If I click on a link the tab containing the page linked to appears as the rightmost tab, and I generally work through tabs from left to right. Breadth-first search can be implemented in this way -- we maintain a list of pages to be looked at, and the first page to enter the queue is also the first page to leave it.
Depth-first browsing wouldn't be too much different on a cosmetic level -- it could be set up by having the new tab appear immediately after the current tab, giving a stack of pages to view instead of a queue. I suspect the subjective experience of Internet browsing would feel much different from such a point of view -- browsing often seems to lead to shallow knowledge. (If one had time to search the entire Internet breadth-first search and depth-first search would eventually visit the same set of pages -- but who has that kind of time?) The "optimal" algorithm for finding particular information isn't strictly breadth-first or depth-first, though; if you think about how you search when you look for a specific piece of information, you don't routinely follow the leftmost tab or the rightmost tab, but instead click on whatever tab subjectively seems like it would give the best information.
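The difference really is just one data structure: take the next page from the front of the list of tabs and you're browsing breadth-first, take it from the back and you're browsing depth-first. A toy sketch in Python, where links(page) is a hypothetical function returning the pages a page links to:

from collections import deque

def browse(start, links, depth_first=False, limit=50):
    """Return pages in the order they'd be read: a queue of tabs gives
    breadth-first order, a stack gives depth-first order."""
    to_read = deque([start])
    seen = {start}
    order = []
    while to_read and len(order) < limit:
        page = to_read.pop() if depth_first else to_read.popleft()
        order.append(page)
        for target in links(page):
            if target not in seen:
                seen.add(target)
                to_read.append(target)
    return order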
28 January 2008
Primes is in P
It appears not to be widely known, at least among some people who would like to know it, that primality testing can be done in polynomial time. The "naive" methods for determining whether a number n is prime -- trial division and the like -- take time which is polynomial in n itself (roughly n^(1/2), but still a polynomial in n), and that's considered "large" because it's exponential in the number of digits or bits of the input.
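To make "polynomial in the number but exponential in its length" concrete: the naive test below does about sqrt(n) trial divisions, and sqrt(n) is roughly 2^(d/2) for a d-bit input, so each extra bit multiplies the work by about 1.4. (This sketch is just trial division; it has nothing to do with the AKS algorithm discussed below.)

def is_prime_naive(n):
    """Trial division: about sqrt(n) steps, i.e. roughly 2^(bits/2) --
    polynomial in n itself but exponential in the length of n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True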
The fact that primality testing can be done in polynomial time -- and without recourse to a randomized algorithm -- I've heard about here and there; not surprisingly, the exponent is large. Namely, to test n for primality takes time O~(log^6 n), where O~(t(n)) = O(t(n) * P(log t(n))) and P is a polynomial. (If you don't like the O~ notation, you can say O(log^(6+ε) n) for any positive real ε.) It's conjectured that "6" can be replaced by "3". The proof is based on a generalization of Fermat's Little Theorem; of course it's nothing like the "naive" method of trial division, but the paper is surprisingly accessible given the strength of the result.
The paper is Manindra Agrawal, Neeraj Kayal, Nitin Saxena. PRIMES is in P. Annals of Mathematics, 160 (2004), 781-793. (http://www.math.princeton.edu/~annals/issues/2004/Sept2004/Agrawal.pdf)
Edited, 7:21 pm: It's been pointed out to me that this result is widely known among people who work in the appropriate fields, and that's true. But a lot of people who are just looking for some sort of color when they're giving an example of a hard algorithmic problem, and whose specialty is not in that area, either don't know this or don't remember it. I'm claiming that the result deserves to be more well-known among mathematicians in general.
25 January 2008
The stable marriage problem
The first problem considered in Algorithm Design, by Jon Kleinberg and Éva Tardos, is the stable marriage problem. (I think this is a fairly standard first problem in an algorithms text, but I may be wrong.) The problem is as follows: let there be n men and n women. Let us say that each of the men can rank the women in order of how much he would like to be married to them, and the women have a similar ordering for the men. We would like to pair these people off in marriage in such a way that we can't find a man and a woman who both prefer each other to the person that they're currently married to. (It's a bit unclear who "we" are, at least in this version of the problem; most of the real-world examples I can think of where everybody gets matched at the same time are more like what the Wikipedia article calls the "college admissions" or "hospitals/residents" problem.) Anyway, the procedure works as follows:
All individuals start out in the state "free". The state "engaged" is also possible.
Pick a free man arbitrarily. He proposes to the woman who is highest on his preference list that he has not already proposed to. If she is free, or if she prefers the man who just proposed to the man she's currently with, she accepts the proposal and they become "engaged". (If she just broke off an engagement, then the man in question becomes "free" again.) Repeat until everybody's married.
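In code, that procedure (the usual Gale-Shapley men-propose version) is short. A Python sketch, where the preference lists are given most-preferred first, e.g. men_prefs["m1"] = ["w2", "w1", "w3"]:

def gale_shapley(men_prefs, women_prefs):
    """Men propose in order of preference; each woman holds on to the best
    offer she has seen so far. Returns a dict mapping women to men."""
    # rank[w][m] = position of m on w's list (smaller = more preferred)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_proposal = {m: 0 for m in men_prefs}   # index of next woman on m's list
    fiance = {}                                 # woman -> man she's engaged to
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if w not in fiance:
            fiance[w] = m                       # she was free: she accepts
        elif rank[w][m] < rank[w][fiance[w]]:
            free_men.append(fiance[w])          # she trades up; her ex is free again
            fiance[w] = m
        else:
            free_men.append(m)                  # she says no; he stays free
    return fiance

Swapping the roles of the two sides gives the woman-optimal matching instead, which is what makes the question of who proposes more than a formality.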
Now, it turns out that which men we choose arbitrarily at each juncture when we're allowed to do so doesn't matter. Regardless of the choices, each man ends up with the best woman that he possibly could among all stable matchings; each woman ends up with the worst.
Although this is clearly a toy version of the real-life marriage problem, this has an interesting implication; historically, men did the proposing. Now, women can propose as well (although anecdotal evidence suggests men do most of the proposing). This is probably good for women, bad for men.
There's also an interesting combinatorial question hidden here, which I haven't thought about -- there are (n!)^(2n) possible sets of preference lists when we have n men and n women. Some of these sets of preferences will have more stable matchings than others; what is the largest possible number of stable matchings? How is it distributed? Is the number of stable matchings corresponding to some set of preference lists even something one could easily count?
One final comment: I've seen the stable marriage problem a few times before, and usually a mention is worked in of something called the "same-sex stable marriage problem" -- often called the "stable roommates problem" -- in which everybody has a preference list for everybody else. This could just as well be called the bisexual stable marriage problem, but this usage doesn't seem to be as common.
Edit (Saturday, 12:15 pm): stable matchings aren't unique. But as I recently learned from Alexander Holroyd, Robin Pemantle, Yuval Peres, Oded Schramm. "Poisson Matching" (arXiv: 0712.1867), they are unique when the preferences are symmetric in a certain sense. One example of symmetric preferences -- the one considered in that paper -- is the case when the men and women are associated with points in Euclidean space, and each prefers to be matched with the people of the opposite gender who are closest in distance. (When they first claimed that stable matchings were unique, I was tempted to just stop reading -- but I didn't, and I'm glad.)
17 January 2008
How fast is the Euclidean algorithm?
From Wikipedia: a plot of the running time of the Euclidean algorithm to find gcd(x,y). Stare at the plot for a while; it has some fascinating fine structure.
What can be said about the average number of iterations needed to run the Euclidean algorithm on (x, y), where x and y are, say, selected u. a. r.[1] from {1, 2, ..., N}? It clearly can't grow any faster than logarithmically, since the Euclidean algorithm on (x, y) takes at most log_φ max(x,y) + O(1) steps, with the worst case attained when the inputs are consecutive Fibonacci numbers. It turns out that the mean number of iterations required on inputs which are at most N is in fact C log N for some constant C; this is apparently a classical (1969) result, and a special case of a more general result of Viviane Baladi and Brigitte Vallee in their paper Euclidean algorithms are Gaussian (arXiv:0307062v4). So the Fibonacci numbers, which are the worst possible input for this algorithm, are only a constant factor slower than the average. Baladi and Vallee also show that the number of iterations is in fact asymptotically Gaussian, with variance also of order log N.
1 "u. a. r." stands for "uniformly at random". This seems to be a semi-standard abbreviation.
31 October 2007
Avoid premature optimization
A lot of people who write about algorithms say that you should avoid prematurely optimizing an algorithm until you've analyzed it carefully enough that you know which parts of the algorithm are the parts that slow it down.
There are two ways to do this. The first is, of course, to go through some sort of formal analysis of the algorithm, until you notice something like "hey, I'm evaluating lots of polynomials, shouldn't I think about how I could evaluate them quickly?" And then you'd remember that polynomials can be written in a form like
((ax+b)x+c)x+d = ax^3 + bx^2 + cx + d
where the left-hand side requires three multiplications (generally, n multiplications for a polynomial of degree n) and three (generally, n) additions to be evaluated; the right-hand side requires six (generally, n(n+1)/2 = n + (n-1) + ... + 1) multiplications and three (generally, n) additions. (This is, of course, assuming that you've forgotten what x^2 is when you get to evaluating x^3. You wouldn't... but a computer probably would!) Generally, this left-hand form is Horner's rule, which evaluates a polynomial of degree n with n multiplications and n additions.
But sometimes it's easier to just step through the algorithm step by step and realize what you find yourself doing over and over again. For example, in evaluating a polynomial a_n x^n + a_{n-1} x^(n-1) + ... + a_1 x + a_0, you'd realize that when you compute x^n you start by computing x^(n-1) and then you multiply it by x; if you did that by hand over and over again you'd eventually want to find a way to speed things up. Wouldn't it be easier if you had already stored the value of x^(n-1)? So you'd first compute the list [1, x, x^2, ..., x^n] by just noticing that each term is x times the previous one, and then you'd multiply those by their respective coefficients a_i and add those products up. This takes 2n multiplications and n additions; it's not quite as good as the form I've given above, but the point is that just storing your intermediate results instead of recalculating them over and over again saves a lot of time. And if you don't realize this from formally analyzing the algorithm, you can realize it by just stepping through the algorithm until you get to the point where you start cursing and thinking "didn't I already do this? Why the *^%)!*!# am I doing it again?"
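In code, the three versions look something like this (a throwaway Python sketch, with coefficients listed from the constant term up):

def eval_forgetful(coeffs, x):
    """Each term rebuilds its own power of x from scratch:
    roughly n^2/2 multiplications for degree n."""
    total = 0
    for i, a in enumerate(coeffs):
        power = 1
        for _ in range(i):
            power *= x
        total += a * power
    return total

def eval_cached_powers(coeffs, x):
    """Build [1, x, x^2, ..., x^n] first: about 2n multiplications."""
    powers = [1]
    for _ in range(len(coeffs) - 1):
        powers.append(powers[-1] * x)
    return sum(a * p for a, p in zip(coeffs, powers))

def eval_horner(coeffs, x):
    """Horner's rule, the form above: n multiplications and n additions."""
    total = 0
    for a in reversed(coeffs):
        total = total * x + a
    return total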
(By the way, this was inspired by my trying to compute the numbers of factorizations of various integers; it turns out that you want a lot of numbers of the form "the number of factorizations of n into factors at most k" and if you're not careful you can end up computing the same numbers over and over again. So I got careful.)
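For the curious, the quantity I mean is something like the function below; memoizing it (here with functools.lru_cache) is exactly the "store your intermediate results" point. This is a sketch of the idea rather than the code I actually ran.

from functools import lru_cache

@lru_cache(maxsize=None)
def factorizations(n, k):
    """Number of ways to write n as a product of factors that are
    at least 2 and at most k, where the order of factors doesn't matter."""
    if n == 1:
        return 1
    count = 0
    for d in range(2, min(n, k) + 1):
        if n % d == 0:
            # treat d as the largest factor; the rest must use factors <= d
            count += factorizations(n // d, d)
    return count

# factorizations(12, 12) == 4:  12, 2*6, 3*4, 2*2*3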
28 October 2007
Sorting with error
Here are three problems that have occurred to me, somewhat randomly, over the last few weeks.
1. I mentioned here that it would probably be relatively easy to rank, say, all the open problems in analysis by their perceived difficulty to expert analysts, and all the open problems in algebra by their perceived difficulty to expert algebraists. But how could we merge these two rankings together, given that comparing an algebra problem and an analysis problem is probably much more difficult and error-prone than comparing two algebra problems or two analysis problems? For example, comparing an arbitrary algebra problem and an arbitrary analysis problem requires finding someone who is fluent enough in the relevant areas to be able to really understand the problems! (If you object to the fact that "algebra" and "analysis" are too broad for this to be true, just subdivide the areas; the idea is still the same. We can rank problems within suitably small subdisciplines of mathematics; how do we merge those rankings into a ranking of all mathematical problems?)
2. Most people are attracted either to men or to women, not to both. (I'm assuming this for the sake of this problem.) It seems reasonably easy to rank-order the set of all men in order of attractiveness, and the set of all women in the same way; but again, how do you merge these two? How do you compare the attractiveness of a man to the attractiveness of a woman? Oddly enough, I think this one's easy to solve because of "assortative mating", which basically means "people will end up with people who are about as attractive as themselves". (Incidentally, although this seems obvious to anyone who's ever been around people, how can it be derived from first principles?) But the information we get from assortative mating is likely to be imperfect, again; there are plenty of relationships where one member of the couple is more attractive than the other.
3. Major League Baseball has two leagues. We have a mechanism for determining the "best" team in each league. The common wisdom now is that the American League is the better of the two leagues; look at this year's World Series, which the Red Sox are currently leading three games to none. (But don't look at last year's.) But that doesn't necessarily mean that, say, the eighth-best team in the AL is better than the eighth-best team in the NL. There's interleague play, but most of the time that's only a single three-game series between any pair of teams, which really doesn't give much information. (And most pairs of teams in different leagues won't play each other at all in any given season.) Let's assume, for the sake of argument, that we have to use just which teams have won and lost against each other, and maybe the scores of the games; we can't use the fact that teams are made up of players, and compare the AL and NL statistics of a player who has played in both leagues.
The common thread is that in all three cases we have a pair of lists which we believe are each correctly rank-ordered, but comparison between the lists is difficult, or we're doing our rankings based on already collected data and we can't go out and get more, and sometimes our comparisons might be wrong (for example, in the baseball scenario, the team which wins the season series between two teams is not necessarily better). All the analysis of sorting algorithms I've seen (which is, I'll admit, not that much) seems to assume that all comparisons are equally easy and there are no errors. That's a nice place to start, but it hardly seems a place to stop.
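Here's the kind of toy experiment I have in mind, as a sketch rather than anything principled: merge two correctly sorted lists the usual way, but let each between-list comparison come out wrong with probability p, and then count inversions in the result to see how much damage the unreliable comparisons do.

import random

def noisy_merge(xs, ys, p):
    """Merge two sorted lists; each cross-list comparison errs with probability p.
    Within-list order is trusted completely."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        truth = xs[i] <= ys[j]
        verdict = (not truth) if random.random() < p else truth
        if verdict:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    return out + xs[i:] + ys[j:]

def inversions(seq):
    """Number of out-of-order pairs: a crude measure of how bad the merge was."""
    return sum(1 for a in range(len(seq)) for b in range(a + 1, len(seq))
               if seq[a] > seq[b])

# e.g. inversions(noisy_merge(list(range(0, 100, 2)), list(range(1, 100, 2)), 0.2));
# repeating each noisy comparison and taking a majority vote is the obvious way to
# buy back accuracy at the cost of more comparisons.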
19 October 2007
Fibonacci tattoo!
From Pink Haired Girl, who I apparently share several mutual friends with: a tattoo to generate the Fibonacci sequence, in Scheme.
There are faster algorithms to generate the Fibonacci sequence, but they obscure the recursive definition; her tattoo follows the simple recursive definition F_n = F_{n-1} + F_{n-2}, which takes O(n) arithmetic operations to compute F_n. Wikipedia's article on Fibonacci numbers gives an identity
F_{2n+k} = F_k F_{n+1}^2 + 2 F_{k-1} F_{n+1} F_n + F_{k-2} F_n^2
Now, note that we get F_{-2} = -1, F_{-1} = 1, F_0 = 0 by running the recursion backwards. Letting k=1 gives
F_{2n+1} = F_{n+1}^2 + F_n^2
and letting k=2 gives
F_{2n+2} = F_{n+1}^2 + 2 F_{n+1} F_n.
Letting m = n+1, we get
F_{2m} = F_m^2 + 2 F_m F_{m-1} = F_m(F_m + 2 F_{m-1}) = F_m(F_{m+1} + F_{m-1}).
Thus we can express F_n in terms of Fibonacci numbers of indices around n/2, and thus find Fibonacci numbers in logarithmic time... but that obscures the central fact about these numbers, which is the recursion that produces them. (There are probably other ways to arrange those identities that give more efficient algorithms than the one I'm alluding to, but I didn't try that hard.) There are also identities for F_n in terms of Fibonacci numbers of index around n/3, for example -- see Wikipedia -- which could be even faster.
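Those identities are exactly what the usual "fast doubling" code uses: from (F_m, F_{m+1}) you get F_{2m} = F_m(F_{m+1} + F_{m-1}) = F_m(2F_{m+1} - F_m) and F_{2m+1} = F_{m+1}^2 + F_m^2, so you can recurse on n//2. A Python sketch of both the tattoo-style linear recursion and the logarithmic-time version, for contrast:

def fib_linear(n):
    """F_n by the defining recursion F_n = F_{n-1} + F_{n-2}: O(n) additions."""
    a, b = 0, 1            # F_0, F_1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_pair(n):
    """Return (F_n, F_{n+1}) in O(log n) arithmetic operations."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)        # a = F_m, b = F_{m+1} with m = n // 2
    c = a * (2 * b - a)            # F_{2m} = F_m (2 F_{m+1} - F_m)
    d = a * a + b * b              # F_{2m+1} = F_m^2 + F_{m+1}^2
    return (c, d) if n % 2 == 0 else (d, c + d)

def fib_fast(n):
    return fib_pair(n)[0]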