30 June 2008
A tail bound for the normal distribution
Often one wants to know the probability that a random variable with the standard normal distribution takes value above x for some positive constant x.
(Okay, I'll be honest -- by "one" I mean "me", and the main reason I'm writing this post is to fix this idea in my head so I don't have to go looking for my copy of Durrett's text Probability: Theory and Examples every time I want this result. Durrett gives a much shorter proof -- two lines -- on page 6 of that book, but it involves an unmotivated-seeming change of variables, which is why I have trouble remembering it.)
The probability density function of the standard normal is ${1 \over \sqrt{2\pi}} \exp(-x^2/2)$, and so the probability in question is
$$\int_x^\infty {1 \over \sqrt{2\pi}} \exp(-t^2/2) \, dt.$$
It's a standard fact, but one that I can never remember, that this is bounded above by ${1 \over \sqrt{2\pi} x} \exp(-x^2/2)$ (and furthermore bounded below by $1 - 1/x^2$ times the upper bound, so the upper bound's not a bad estimate).
How to prove this? Well, here's an idea -- approximate the tail of the standard normal distribution's density function by an exponential. Which exponential? The exponential of the linearization of the exponent at t = x. The exponent has negative second derivative, so the new exponent is larger (less negative) than the old one and this is an overestimate. That is,
$$\int_x^\infty {1 \over \sqrt{2\pi}} \exp(-t^2/2) \, dt \le \int_x^\infty {1 \over \sqrt{2\pi}} \exp(-x^2/2 - x(t-x)) \, dt$$
where the new exponent is the linearization of $-t^2/2$ at $t=x$.
Then pull out factors which don't depend on t to get
$${1 \over \sqrt{2\pi}} \exp(-x^2/2) \int_x^\infty \exp(-x(t-x)) \, dt$$
and doing that last integral, which equals $1/x$, gives the desired bound.
Basically, the idea is that since the density to the right of x is dropping off as the exponential of a quadratic, most of the tail's mass is concentrated very close to x, so we might as well approximate the density by the exponential of a linear function, which is easier to work with.
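As a sanity check on the bound, here is a minimal Python sketch that compares the exact tail with the upper bound and with $1 - 1/x^2$ times the upper bound. The function names are mine, and the exact tail is computed from the standard library's complementary error function rather than by any method in Durrett.

```python
import math

def normal_tail(x):
    """P(Z > x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def upper_bound(x):
    """exp(-x^2/2) / (sqrt(2*pi) * x); only meaningful for x > 0."""
    return math.exp(-x * x / 2) / (math.sqrt(2 * math.pi) * x)

def lower_bound(x):
    """(1 - 1/x^2) times the upper bound."""
    return (1 - 1 / x**2) * upper_bound(x)

# Print lower bound, exact tail, upper bound for a few positive x.
for x in (1.0, 2.0, 5.0, 10.0):
    print(x, lower_bound(x), normal_tail(x), upper_bound(x))
```

The three columns squeeze together as x grows, which is the sense in which the upper bound is not a bad estimate.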
By similar means one can show that the expectation of a real number selected from the standard normal distribution, given that it's greater than x, is something like x + 1/x. The tail to the right of x looks like an exponential random variable with mean 1/x. For example, the expectation of a real number selected from the standard normal distribution, conditioned on being larger than 10, is 10.09809.... But this is probably useless, because the probability of a real number selected from the standard normal distribution being larger than 10 is, by the previous bound, smaller than 1 in $10 (2\pi)^{1/2} e^{50}$, or about one in $1.3 \times 10^{23}$.
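The conditional expectation can be checked the same way. This sketch relies on the identity $E[Z \mid Z > x] = \varphi(x) / P(Z > x)$, where $\varphi$ is the density; it holds because the integral of $t \varphi(t)$ from x to infinity is $\varphi(x)$. The function names are again my own.

```python
import math

def normal_pdf(x):
    """Density of the standard normal at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_tail(x):
    """P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def conditional_mean(x):
    """E[Z | Z > x], computed as pdf(x) / P(Z > x)."""
    return normal_pdf(x) / normal_tail(x)

x = 10.0
print(conditional_mean(x))  # exact value, to floating-point accuracy
print(x + 1 / x)            # the rough approximation x + 1/x
```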
28 June 2008
Baseball bats
Oh, apparently baseball people use "length-to-weight ratio" to describe a bat, as I learned from the people who talk too much before the Saturday afternoon game on Fox today. This is calculated by taking the weight (in ounces) minus the length (in inches), and in the major leagues can't be less than -3.5.
Of course, it's actually a difference, not a ratio.
It looks like some people call it the "differential", though, which is fine with me -- to me "differential" has other connotations, but expecting mathematical terminology not to collide with terminology used in other things is a Bad Idea. (Although why not just call it the "difference"?)
A thing I'm tired of hearing
"X does especially well/badly in interleague play."
Small sample size, people.
27 June 2008
A variant of the traveling salesman problem
Josh Robbins attempts to see a baseball game in all thirty major league baseball parks in 26 days.
Yes, you read that right.
And Major League Baseball doesn't make that easy. As you can guess, he has to see two games in one day four times -- but in markets with two teams (New York, Chicago, San Francisco/Oakland, Los Angeles) they try to schedule the two teams to be at home at opposite times. That makes sense, because that way if you think "I want to see a baseball game today" you've got a good chance.
In fact, his four doubleheaders are Dodgers-Padres (which apparently was a bit of a tight squeeze, since the Dodgers went into extra innings), Yankees-Mets (that one should be easy; every few years the Mets and the Yankees play a game at one park in the afternoon and at the other park the same night; they're doing it today); Phillies-Nationals (which will be tight even if the games go the ordinary length; they start six hours apart, average game length is three hours or so, and the cities are two and a half hours apart with no traffic -- oh, and he's doing it on a Thursday); Cubs-Brewers.
My point is that 25 days might be possible -- but probably not. Most baseball games are scheduled for around 1 PM or around 7 PM, and games last about three hours, so seeing two in one day requires the sites to be no more than three hours apart. The pairs that are doable in one day are probably:
Mets-Yankees, Mets-Phillies, Yankees-Phillies
Phillies-Orioles, Phillies-Nationals, Orioles-Nationals
Dodgers-Padres, Padres-Angels, Angels-Dodgers
White Sox-Cubs, White Sox-Brewers, Cubs-Brewers
Giants-A's
but of course one can do only one from each row, so it's only possible to double up on five days. Basically, this is the problem of looking for the largest matching in the graph that I defined above, where the vertices are teams and the edges join teams within about three hours' driving distance of each other.
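To make the matching claim concrete, here is a small Python sketch that finds a largest matching by exhaustive search over the pairs listed above. The brute-force approach and the function name are my own choices; it's only sensible for a graph this tiny, and a real solver would use something like the blossom algorithm.

```python
def max_matching(edges):
    """Largest set of edges with no team in common, by exhaustive search."""
    if not edges:
        return []
    (a, b), rest = edges[0], edges[1:]
    # Option 1: leave the first edge out.
    best = max_matching(rest)
    # Option 2: use it, and drop every edge sharing a team with it.
    compatible = [e for e in rest if a not in e and b not in e]
    with_first = [(a, b)] + max_matching(compatible)
    return with_first if len(with_first) > len(best) else best

# The same-day pairs from the list above.
pairs = [
    ("Mets", "Yankees"), ("Mets", "Phillies"), ("Yankees", "Phillies"),
    ("Phillies", "Orioles"), ("Phillies", "Nationals"), ("Orioles", "Nationals"),
    ("Dodgers", "Padres"), ("Padres", "Angels"), ("Angels", "Dodgers"),
    ("White Sox", "Cubs"), ("White Sox", "Brewers"), ("Cubs", "Brewers"),
    ("Giants", "A's"),
]

doubleheaders = max_matching(pairs)
print(doubleheaders)
print("days needed:", 30 - len(doubleheaders))
```

Each matched pair saves one day off the thirty, so a matching of size five corresponds to the 25-day figure. Any other pair within driving range can be tried by appending it to `pairs`.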
(Oddly enough, each two-team market (and yes, I know, Baltimore and Washington may or may not be the same market) seems to have another team a couple hours away. In two cases that team is the Phillies. As you may know, this blog likes the Phillies.)
So 25 is theoretically possible, if the Scheduling Gods worked in one's favor -- but I'd be scared to even look at the schedules to try and figure it out. And what happens if there's a rainout?
As a problem in actually scheduling things, the other tricky part is that Denver really isn't near any other team. And Robbins' schedule had him at a 7:05 game in San Diego, followed by a 1:05 game in Denver the next day -- but Denver's a time zone to the east of San Diego, so that's seventeen hours between starts. Fourteen hours driving time. For 1,078 miles.
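The time-zone arithmetic is easy to get wrong, so here is a short Python sketch of it. The calendar dates are hypothetical (only the clock times come from Robbins' schedule as described above), the fixed UTC offsets assume summer daylight time, and I'm reading "fourteen hours" as the seventeen-hour gap minus a three-hour game.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # San Diego in summer (assumed fixed offset)
MDT = timezone(timedelta(hours=-6))  # Denver in summer (assumed fixed offset)

# Hypothetical dates; only the 7:05 and 1:05 start times are from the post.
first_pitch_sd = datetime(2008, 6, 20, 19, 5, tzinfo=PDT)
first_pitch_den = datetime(2008, 6, 21, 13, 5, tzinfo=MDT)

gap_hours = (first_pitch_den - first_pitch_sd).total_seconds() / 3600
game_hours = 3                         # typical game length
drive_window = gap_hours - game_hours  # time left for the road
required_mph = 1078 / drive_window     # average speed over 1,078 miles

print(gap_hours, drive_window, round(required_mph, 1))
```

Under those assumptions the required average speed comes out well above any posted limit, before counting stops for gas.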
For some other variants of the traveling salesman problem which involve the road network, see Barry Stiefel's 50 states in a week's vacation (driving, with flights to Alaska and Hawaii) and 21 states in one day. The last one cheats a bit -- it's a 26-hour day, since he started in the Eastern time zone during daylight saving time (GMT-4), did the trip on a day when we went back to standard time (GMT-5), and then crossed into the Central time zone (GMT-6). The difference here is that you only have to enter each state instead of reaching a particular point.
Oh, and I feel obliged to point out that I find the meme of going on a long road trip this summer because "this is the last summer it'll ever be possible" kind of stupid. (Not that anybody here brought it up.)
Edited (Saturday morning): Google Maps says Cleveland to Detroit can be driven in 2:46. I didn't realize they were that close together. They'd be even closer if someone built a bridge across Lake Erie.
(Saturday afternoon): Cleveland to Pittsburgh in 2:18. I'll admit the reason I forgot this one is that mentally I think of Pittsburgh as being in the same state as me and Cleveland as not being in it, so they must be far apart. This is despite the fact that I live about five miles from New Jersey.
Anyway, you could shave off yet another day by combining the Indians with either the Pirates or the Tigers.
Labels:
baseball,
optimization,
transportation,
travel
Mathifying the oil bubble
House passes bill to reverse oil price increases.
Why am I writing about this? Because of the following sentence, which opens the article (emphasis added):
"After the close of the markets Thursday, as the fear of a continued parabolic rise in the price of oil was still fresh on the minds of investors, the U.S. House of Representatives approved a bill that could help to reverse the direction of oil prices."
I don't know, the runup looks linear to me, at least in the short term -- although everything looks linear in the short term. That doesn't mean it hasn't been fast. This is mathification at work.
By the way, the authors of The 2006-2008 Oil Bubble and Beyond (D. Sornette, R. Woodard, W.-X. Zhou; arXiv:0806.1170) claim that the growth is actually super-exponential, which they say is diagnostic of a bubble. (Via the physics arXiv blog.) So "parabolic" may be a massive understatement.
25 June 2008
White People hate math, but like statistics
Did you know White People hate math, but like statistics?
This is from Stuff White People Like. There are three things you should know about SWPL, if you don't already. First, it's SATIRICAL. Second, it is not actually about "white people" (i.e. people whose ancestors originally hail from Europe) but "White People". These are best defined as people who like things on this list, like irony, Netflix, Wes Anderson movies, indie music, having two last names, Oscar parties, having black friends, indie music, The Wire, and the idea of soccer. (This is actually a randomly chosen sample from the list, which is conveniently numbered; random.org gave me 41 twice, and indie music is #41, hence the duplication. I was going to just pick a few things at "random", but I realized that I was kind of biased towards the things that I like.)
By "statistics" is meant not the mathematical field but various interesting-sounding numbers. For example, if each White Person has a favorite thing from the list of Stuff White People Like, and you pick ten white people at random, there's a 36% chance that two of them will have the same favorite White Person Thing. (This is the White Person version of the birthday paradox.)
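For anyone who wants to play with that number, here is a minimal Python sketch of the birthday-style calculation. It assumes each favorite is chosen uniformly and independently. The list length of 105 is my assumption, not something stated above; it happens to give a figure that rounds to 36%.

```python
def collision_probability(people, choices):
    """Chance that at least two of `people` share a favorite,
    assuming favorites are uniform and independent over `choices` items."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (choices - i) / choices
    return 1 - p_all_distinct

LIST_SIZE = 105  # assumed length of the list; not given in the post

print(collision_probability(10, LIST_SIZE))
print(collision_probability(23, 365))  # the classic birthday version
```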
Also of interest there: the entry on graduate school, which I think pretty clearly refers to grad school in the humanities.
24 June 2008