Sabermetric Research
Phil Birnbaum
Monday, November 18, 2019
Why you can't calculate aging trajectories with a standard regression
I found myself in a little Twitter discussion last week about using regression to analyze player aging. I argued that regression won't give you accurate results, and that the less elegant "delta method" is the better way to go.
Although I did a small example to try to make my point, Tango suggested I do a bigger simulation and a blog post. That's this.
(Some details if you want:
For the kind of regression we're talking about, each season of a career is an input row. Suppose Damaso Griffin created 2 WAR at age 23, 2.5 WAR at age 24, and 3 WAR at age 25. And Alfredo Garcia created 1, 1.5, and 1.5 WAR at age 24, 25, and 26. The file would look like:
2 23 Damaso Griffin
2.5 24 Damaso Griffin
3 25 Damaso Griffin
1 24 Alfredo Garcia
1.5 25 Alfredo Garcia
1.5 26 Alfredo Garcia
And so on, for all the players and ages you're analyzing. (The names are there so you can have dummy variables for individual player skills.)
You take that file and run a regression, and you hope to get a curve that's "representative" or an "average" or a "consolidation" of how those players truly aged.)
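To make that concrete, here's a minimal sketch of the kind of regression we're talking about, in Python (pandas and statsmodels), using the six toy rows above. It's not anybody's actual study code -- just the structure: one row per player-season, a quadratic in age, and a dummy variable for each player.

import pandas as pd
import statsmodels.formula.api as smf

# One row per player-season: WAR, age, and the player (the name becomes a dummy variable)
rows = [(2.0, 23, "Damaso Griffin"), (2.5, 24, "Damaso Griffin"),
        (3.0, 25, "Damaso Griffin"), (1.0, 24, "Alfredo Garcia"),
        (1.5, 25, "Alfredo Garcia"), (1.5, 26, "Alfredo Garcia")]
df = pd.DataFrame(rows, columns=["war", "age", "player"])

# Quadratic in age, plus a dummy for each player's skill level
fit = smf.ols("war ~ age + I(age ** 2) + C(player)", data=df).fit()

# Peak age implied by the fitted parabola: -b / (2a)
a, b = fit.params["I(age ** 2)"], fit.params["age"]
print("implied peak age:", -b / (2 * a))   # about 26 for these six toy rows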
------
I simulated 200 player careers. I decided to use a quadratic (parabola), symmetric around peak age. I would have used just a linear regression, but I was worried that it might seem like the conclusions were the result of the model being too simple.
Mathematically, there are three parameters that define a parabola. For this application, they represent (a) peak age, (b) peak production (WAR), and (c) how steep or gentle the curve is.*
(*The equation is:
y = (x - peak age)^2 / -steepness + peak production.
"Steepness" is related to how fast the player ages: higher steepness is higher decay. Assuming a player has a job only when his WAR is positive, his career length can be computed as twice the square root of (peak WAR * steepness). So, if steepness is 2 and peak WAR is 4, that's a 5.7 year career. If steepness is 6 and peak WAR is 7, that's a 13-year career.
You can also represent a parabola as y = ax^2+bx+c, but it's harder to get your head around what the coefficients mean. They're both the same thing ... you can use basic algebra to convert one into the other.)
For each player, I randomly gave him parameters from these distributions: (a) peak age normally distributed with mean 27 and SD 2; (b) peak WAR with mean 4 and SD 2; and (c) steepness (mean 2, SD 5; but if the result was less than 1.5, I threw it out and picked a new one).
I arbitrarily decided to throw out any careers of length three years or fewer, which reduced the sample from 200 players to 187. Also, I assumed nobody plays before age 18, no matter how good he is. I don't think either of those decisions made a difference.
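(If you want to follow along, here's a rough Python sketch of that career-generating step, under the same assumed distributions. I've simplified by evaluating each career on a grid of whole ages and treating a season as "played" only when its WAR is positive.)

import numpy as np

rng = np.random.default_rng(0)
careers = []
for _ in range(200):
    peak_age = rng.normal(27, 2)
    peak_war = rng.normal(4, 2)
    steep = rng.normal(2, 5)
    while steep < 1.5:                 # redraw steepness until it's at least 1.5
        steep = rng.normal(2, 5)
    # WAR at each age is peak_war - (age - peak_age)^2 / steep; keep ages 18+ with positive WAR
    seasons = {age: peak_war - (age - peak_age) ** 2 / steep for age in range(18, 46)}
    if sum(w > 0 for w in seasons.values()) > 3:   # drop careers of three seasons or fewer
        careers.append(seasons)

print(len(careers), "careers kept out of 200")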
Here's the plot of all 187 aging curves on one graph:
The idea, now, is to consolidate the 187 curves into one representative curve. Intuitively, what are we expecting here? Probably, something like, the curve that belongs to the average player in the list.
The average random career turned out to have a peak age of 26.9, a peak WAR of 4.19, and a steepness of 5.36. Here's a curve that matches those parameters:
That seems like what we expect, when we ask a regression to find the best-fit curve. We want a "typical" aging trajectory. Eyeballing the graph, it does look pretty reasonable, although to my eye, it's just a bit small. Maybe half a year bigger left and right, and a bit higher? But close. Up to you ... feel free to draw on your monitor what you think it should look like.
But when I ran the regression ... well, what came out wasn't close to my guess, and probably not close to your guess either:
It's much, much gentler than it should be. Even if your gut told you something different than the black curve, there's no way your gut was thinking this. The regression came up with a 19-year career. A career that long happened only once in the entire 187-player sample. We expected "representative," but the regression gave us the 99.5th percentile.
What happened?
It's the same old "selective sampling"/"survivorship bias" problem.
The simulation decided that when a player's curve scores below zero, those seasons aren't included. It makes sense to code the simulation that way, to match real life. If Jerry Remy had played five years longer than he did, what would his WAR be at age 36? We have no idea.
But, with this simulation, we have a God's-eye view of how negative every player would go. So, let's include that in the plot, down to -20:
See what's happening? The black curve is based on *all* the green data, both above and below zero, and it lands in the middle. The red curve is based only on the green data above zero, so it ignores all the green negatives at the extremes.
If you like, think of the green lines as magnets, pulling the lines towards them. The green magnets bottom-left and bottom-right pull the black curve down and make it steeper. But only the green magnets above zero affect the red line, so it's much less steep.
In fact, if you scroll back up to the other graph, the one that's above zero only, you'll see that at almost every vertical age, the red line bisects the green forest -- there's about as much green magnetism above the red line as there is below it.
In other words: survivorship bias is causing the difference.
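If you'd rather not take my graphs on faith, here's a self-contained sketch that regenerates the same kind of data and fits one quadratic two ways: once to the positive (observed) seasons only, and once to everything, including the negative seasons nobody would ever get to see. I've left out the per-player dummies to keep it short; the survivorship effect should show up either way.

import numpy as np

rng = np.random.default_rng(0)
rows = []                            # (age, WAR) for every player-season, God's-eye view
for _ in range(200):
    peak_age, peak_war = rng.normal(27, 2), rng.normal(4, 2)
    steep = rng.normal(2, 5)
    while steep < 1.5:
        steep = rng.normal(2, 5)
    rows += [(a, peak_war - (a - peak_age) ** 2 / steep) for a in range(18, 46)]

def fit(data):
    """Fit WAR = a*age^2 + b*age + c; return the implied peak age and career length."""
    ages = np.array([a for a, _ in data], dtype=float)
    wars = np.array([w for _, w in data])
    a, b, c = np.polyfit(ages, wars, 2)
    return -b / (2 * a), np.sqrt(b * b - 4 * a * c) / abs(a)   # peak, distance between zeroes

observed = [(a, w) for a, w in rows if w > 0]    # only the seasons that actually get played
print("positive seasons only:", fit(observed))
print("all seasons included :", fit(rows))

The fit to the positive-only seasons should come out much gentler, with a much longer implied career -- which is exactly what the red curve is doing above.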
------
What's really going on is that the regression is falling for the same classic fallacy we've been warning against for the past 30 years! It's comparing players active (above zero) at age 27 to players active (above zero) at age 35. And it doesn't find much difference. But that's because the two sets of players aren't the same.
One more thing to make the point clearer.
Let's suppose you find every player active last year at age 27, and average their performance (per 500PA, or whatever). And then you find every player active last year at age 35, and average their performance.
And you find there's not much difference. And you conclude, hey, players age gracefully! There's hardly any dropoff from age 27 to age 35!
Well, that's the fallacy saberists have been warning against for 30 years, right? The canonical (correct) explanation goes something like this:
"The problem with that logic is that it doesn't actually measure aging, because those two sets of players aren't the same. The players who are able to still be active at 35 are the superstars. The players who were able to be active at 27 are ... almost all of them. All this shows is that superstars at 35 are almost as good as the league average at 27. It doesn't actually tell us how players age."
Well, that logic is *exactly* what the regression is doing. It's calculating the average performance at every age, and drawing a parabola to join them.
Here's one last graph. I've included the "average at each age" line (blue) calculated from my random data. It's almost a perfect match to the (red) regression line.
------
Bottom line: all the aging regression does is commit the same classic fallacy we repeatedly warn about. It just winds up hiding it -- by complicating, formalizing, and blackboxing what's really going on.
Labels: aging, regression
posted by Phil Birnbaum @ 11/18/2019 06:12:00 PM 11 comments
Tuesday, February 14, 2012
Two new "Moneyball"-type possibilities
I'm usually doubtful that significant "Moneyball"-type inefficiencies still exist in sports. But, recently, two possibilities came up that got me wondering.
First, in a discussion about baseball player aging, commenter Guy suggested that there are lots of good young players kept in the minors when they're good enough to be playing full-time in the majors. He mentions Wade Boggs, whom the Red Sox held back in the early 80s in favor of Carney Lansford.
It's certainly a possibility, especially when you consider the Jeremy Lin story. Of course, baseball and hockey are different from basketball and football, because they have minor leagues in which players get to show their stuff. But, still.
Second, and even bigger, is something Gabriel Desjardins discovered.
For the past several seasons, the NHL has been keeping track of the player who draws a penalty -- that is, the victim who was fouled. Desjardins grabbed the information and tallied the numbers.
Most of the players near the top of the list are who you would expect -- Crosby, Ovechkin, and so on. But the runaway leader is Dustin Brown, of the Los Angeles Kings.
Over the past seven seasons, Brown drew 380 opposition penalties. Ovechkin was second, with 255; Ryan Smyth was twentieth, at 181.
That means the difference between first and second place was almost twice the difference between second and twentieth place. Dustin Brown is exceptionally good at getting his team a power play.
Desjardins writes,
"Incidentally, 380 non-coincidental penalties is worth roughly 33ドルM in 2012 dollars relative to the league average, and quite a bit more relative to replacement level. ... Dustin Brown has made roughly 15ドルM so far in his career, making him one of the biggest deals in the entire league."
Wow. If you had tried to convince me that you could find an official NHL stat that would uncover 33ドル million worth of hidden value, I wouldn't have believed you. But there it is.
posted by Phil Birnbaum @ 2/14/2012 11:18:00 PM 5 comments
Friday, July 30, 2010
Do younger brothers steal more bases than older brothers? Part IV -- Age vs. SB
A few weeks ago, I wrote a series of posts on an academic paper about siblings and stolen bases. That study claimed that when brothers play in the major leagues, the younger brothers are much more likely -- by an odds ratio of 10 -- to attempt steals more often than their older brothers.
Since then, the authors, Frank J. Sulloway and Richard L. Zweigenhaft, were kind enough to write me to clarify the parts of their methodology I didn't fully understand. They also disagreed with me on a few points, and, on one of those, they're absolutely right.
Previously, I had written,
"It's obvious: if the a player gets called up before his brother *even though he's younger*, he's probably a much better player. In addition, speed is a talent more suited to younger players. So when it comes to attempted steals, you'd expect younger brothers called up early to decimate their older brothers in the steals department."
Seems logical, right? It turns out, however, that it's not true.
For every retired (didn't play in 2009) batter in my database for whom I have career SB and CS numbers, I calculated his age as of December 31 of the year he was called up. I expected that players called up very young, like 20 or 21, would have a much higher career steal attempt rate than players who were called up older, like 25 or 26.
Not so. Here are career steal rates for various debut ages, weighted by career length, expressed in (SB+CS) per 200 (H+BB):
17 -- 5.1
18 -- 6.9
19 - 11.3
20 - 14.4
21 - 13.6
22 - 14.0
23 - 14.7
24 - 15.5
25 - 13.4
26 - 14.0
27 - 16.5
28 - 10.2
29 - 11.0
30 - 13.5
31 - 10.3
32 - 17.4
33 - 13.0
34 -- 3.9
35 - 10.5
36 -- 6.1
37 -- 7.4
38 -- 5.3
39 -- 8.4
40 -- 0.0
41 - 14.0
It's pretty flat from 20 to 27 ... there is indeed a dropoff at 28, but few players make their debuts at age 28 or later.
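(If you want to reproduce the table, here's roughly the calculation in Python, assuming the Lahman database CSVs -- Batting.csv with SB, CS, H and BB columns, and People.csv with birthYear -- as a stand-in for my own database. It's only a sketch: it takes debut age as debut year minus birth year, and it doesn't bother excluding still-active players.)

import pandas as pd

bat = pd.read_csv("Batting.csv")
ppl = pd.read_csv("People.csv").set_index("playerID")

career = bat.groupby("playerID")[["SB", "CS", "H", "BB"]].sum()
career["debut_age"] = bat.groupby("playerID")["yearID"].min() - ppl["birthYear"]
career = career[(career["H"] + career["BB"]) > 0]

# (SB+CS) per 200 (H+BB) at each debut age, weighted by career length (proxied by H+BB)
g = career.groupby("debut_age")
rate = 200 * (g["SB"].sum() + g["CS"].sum()) / (g["H"].sum() + g["BB"].sum())
print(rate.round(1))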
Why does this happen? Isn't it true that young players are faster than old players? Perhaps what's happening is that players who arrive in the major leagues earlier also play longer, which means their extra early high-steal years are balanced out by their extra later low-steal years. I'm not sure that's right, but it's a strong possibility. In any case, my assumption was off the mark, applying as it does only to age 28 and up.
I could have figured out that was the case had I looked at Bill James' rookie study from the 1987 Baseball Abstract. Near the end of page 58, Bill gave a similar chart for hits and stolen bases (but on the total number, not the rate). And it looks like SBs decay not much more than hits or games played.
For instance, consider a 22 year old player compared to one who's 25. The 22-year-old, according to Bill, will wind up with 88 percent more base hits than the 25-year-old (623 divided by 331, on Bill's scale). For stolen bases, the corresponding increase is 84 percent (613 to 334). The two numbers are pretty much the same -- which means, since 1987, we've known that career SB rates don't have a lot to do with callup age.
----
Anyway, the "odds ratio of 10" finding in the sibling study was based on individual player-to-player comparisons. So, I decided to test those. Suppose you have two players, but one breaks in to the majors at a younger age than the other. What is the chance that the younger callup attempts steals at a higher rate for his career?
To figure that out, I took the 5,742 batters in the study, and compared each one of them to each of the others. I ignored pairs where both players were called up at the same age, and I ignored pairs with the same attempt rate (usually zero).
The results: younger players "won" the steal competition at a 52.9% rate, with a "W-L" record of 7,387,525 wins and 6,569,412 losses.
Young: 7387525-6569412 .529
However: that includes a lot of "cup of coffee" players. If I limit the comparisons to where both players got at least 500 AB for their careers, then, unexpectedly, the older guys actually win:
Young: 1950250-1965161 .498
The difference between those two lines comprises cases when one or both players had a very short career. When that happened, the young guys kicked butt, relatively speaking:
Young: 5437275-4604251 .541
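(Here's a sketch of the pairwise comparison itself, as a Python function. The function name and the three toy players are made up; the real run compared all 5,742 batters.)

def younger_vs_older(players, min_ab=0):
    """players: list of (debut_age, career_attempt_rate, career_AB) tuples.
    Returns the younger-debut player's (wins, losses) over all usable pairs."""
    wins = losses = 0
    for i in range(len(players)):
        for j in range(i + 1, len(players)):
            p, q = players[i], players[j]
            if p[2] < min_ab or q[2] < min_ab:
                continue                     # both players need the minimum career AB
            if p[0] == q[0] or p[1] == q[1]:
                continue                     # skip ties in debut age or attempt rate
            young, old = (p, q) if p[0] < q[0] else (q, p)
            if young[1] > old[1]:
                wins += 1
            else:
                losses += 1
    return wins, losses

print(younger_vs_older([(21, 14.0, 4000), (24, 11.0, 2500), (27, 12.5, 900)], min_ab=500))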
These numbers are important because they represent exactly what the authors of the sibling study did -- compare players directly. My argument was that I believed the younger player would be the "winner" a lot more often than 52.9% of the time. That's not correct. So that part of my argument is wrong, and I appreciate Frank Sulloway and Richie Zweigenhaft pointing that out to me.
Does that mean I now agree with the study's finding that the odds of a player having a higher attempt rate than his brother are 10 times as large when he's a younger sibling? No, I don't. But it does mean that I need to refine my argument, which I will do in a future post.
Labels: aging, baseball, baserunning, psychology, siblings
posted by Phil Birnbaum @ 7/30/2010 11:02:00 PM 0 comments
Wednesday, March 31, 2010
Stumbling on Wins: Do coaches not understand how players age?
On page 118 of "Stumbling on Wins," authors David Berri and Martin Schmidt argue that NBA coaches don't understand how players age. That's because, according to Berri and Schmidt, coaches give players more and more minutes until age 28. But, they report, player productivity actually peaks at age 24. Therefore,
"... the allocation of minutes suggests the age profile in basketball is not well understood by NBA coaches."
Geez, that doesn't follow at all.
First, I don't understand how the authors figure that minutes played peak at 28. If you look at actual minutes played by age, the peak appears to be earlier. These are minutes by age for the current 2009-10 season, on the day I'm writing this:
19: 1512
20: 10932
21: 38198
22: 37283
23: 52626
24: 52653
25: 47297
26: 34481
27: 43339
28: 29843
29: 48955
30: 37756
31: 27852
32: 14336
33: 20376
34: 11677
35: 12976
36: 5333
37: 4516
38: 0
39: 122
The curve appears to reach its high point at 23 and 24, then diminishes irregularly down to age 39. There are a couple of blips, notably at 29, but you certainly wouldn't put the minutes peak at anything other than 23-24.
So why do the authors say 28 is the peak? I'm not sure. In a footnote, they say the details can be found on their website, but there's nothing posted yet for that chapter (seven).
I suspect the issue is selective sampling. If you look at only players who had long careers, you could very well come up with a peak of 28. As has been discussed repeatedly here and at Tango's site in the context of baseball aging, when you look only at players with long careers, you're sampling only those who aged more gracefully than others. And so your peak will be biased high.
Also, a player with a long career is probably a full-time player for most of it. Suppose someone comes up at 23 and plays until 33. His first couple of seasons and last couple of seasons, he might be a part-time player; the middle seasons, he's full-time, with only minor variations in minutes. So his minutes curve looks like: low horizontal line, high horizontal line, low horizontal line. If you try to draw a smooth curve to that, it'll peak right in the middle, which, for our example, is age 28.
The idea is: there's only so much playing time you can give to a good player. You might give him 40 minutes a game at age 28, when he's still very, very, good ... but you can't give him 50 minutes a game when he's 24 and brilliant. So the curve is roughly flat in a good player's prime, and the off-years at the beginning and the end will artificially make it look like there's a peak in the middle.
Anyway, this is all speculation until Berri and Schmidt post the study.
The average minute in the above table occurs at age 26.6 -- below the 28 that Berri and Schmidt talk about, but above the 24 that they say it should be. It makes sense that it should be well above 24. A good player might still be in the league ten years after the peak, at age 34 -- but there's no way he'd be in the league ten years before the peak, at age 14. If a player can play when he's old, but not when he's young, that, obviously, will skew the mean above the peak of 23-24.
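(For the record, that "average minute" figure is just the minutes-weighted mean of the ages in the table above; a quick Python check:)

minutes = {19: 1512, 20: 10932, 21: 38198, 22: 37283, 23: 52626, 24: 52653,
           25: 47297, 26: 34481, 27: 43339, 28: 29843, 29: 48955, 30: 37756,
           31: 27852, 32: 14336, 33: 20376, 34: 11677, 35: 12976, 36: 5333,
           37: 4516, 38: 0, 39: 122}
print(sum(age * m for age, m in minutes.items()) / sum(minutes.values()))  # a bit under 27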
There are probably other reasons, too, but I think that's the main one.
Berri and Schmidt think that NBA minutes peak later than 24 because coaches don't understand how players age. It seems obvious that there's a more plausible explanation -- that it's because players like Shaquille O'Neal are able to play NBA basketball at age 37, but not at age 9.
Labels: aging, basketball, Stumbling on Wins
posted by Phil Birnbaum @ 3/31/2010 10:13:00 PM 10 comments
Tuesday, January 19, 2010
Evaluating scientific debates: some ramblings
Last week's renewed debate on JC Bradbury's aging study (JC posted a new article to Baseball Prospectus, and comments followed there and on "The Book" blog) got me thinking about some things that are tangential to the study itself ... and since I have nothing else to write about at the moment, I thought I'd dump some of those random thoughts here.
1. Peer review works much better after publication than before.
When there's a debate between academics and non-academics, some observers argue that the academics are more likely to be correct, because their work was peer reviewed, while the critics' work was not.
I think it's the other way around. I think post-publication reaction, even informally on the internet, is a much better way to evaluate the paper than academic peer review.
Why? Because academic peer reviewers hear only one side of the question -- the author's. At best, they might have access to the comments of a couple of other referees. That's not enough.
After publication, on the internet, there's a back and forth between people on one side of the question and people on the other. That's the best way to get at the truth -- to have a debate about it.
Peer review is like the police deciding there's enough evidence to lay charges. Post-publication debate is like two lawyers arguing the case before a jury. It's when all the evidence is heard, not just the evidence on one side.
More importantly, no peer reviewer has as good a mastery of previous work on a subject as the collective mastery of the public. I may be an OK peer reviewer, but you know who's a better peer reviewer? The combination of me, and Tango, and MGL, and Pizza Cutter, and tens of other informed sabermetricians, some of whom I might only meet through the informal peer review process of blog commenting.
If you took twelve random sabermetricians whom I respect, and they unanimously came to the verdict that paper X is flawed, I would be at least 99% sure they were right and the peer reviewer was wrong.
2. The scientific consensus matters if you're not a scientist.
It's a principle of the scientific method that only evidence and argument count -- the identity of the arguer is irrelevant.
Indeed, there's a fallacy called "argument from authority," where someone argues that a particular view must be correct because the person espousing it is an expert on the subject. That's wrong because even experts can be wrong, and even the expertest expert has to bow to logic and evidence.
But that's a formal principle that applies to situations where you're trying to judge an argument on its merits. Not all of us are in a position to be able to do that all the time, and it's a reasonable shortcut in everyday life to base your decision on the expertise of the arguer.
If my doctor tells me I have disease X, and the guy who cleans my office tells me he saw my file and he thinks I really have disease Y ... well, it's perfectly legitimate for me to dismiss what the office cleaner says, and trust my doctor.
It only becomes "argument from authority" where I assert that I am going to judge the arguments on their merits. Then, and only then, am I required to look seriously at the office cleaner's argument, without being prejudiced by the fact that he has zero medical training.
Indeed, we make decisions based on authority all the time. We have to. There are many claims that are widely accepted, but still have a following of people who believe the opposite. There are people who believe the government is covering up UFO visits. There are people who believe the world is flat. There are people who believe 9/11 was an inside job.
If you're like me, you don't believe 9/11 was an inside job. And, again, if you're like me, you can't actually refute the arguments of those who do believe it. Still, your disbelief is rational, and based solely on what other people have said and written, and your evaluations of their credibility.
Disbelieving solely because of experts is NOT the result of a fallacy. The fallacy only happens when you try to use the experts as evidence. Experts are a substitute for evidence.
You get your choice: experts or evidence. If you choose evidence, you can't cite the experts. If you choose experts, you can't claim to be impartially evaluating the evidence, at least that part of the evidence on which you're deferring to the experts.
The experts are your agents -- if you look to them, it's because you are trusting them to evaluate the evidence in your stead. You're saying, "you know, your UFO arguments are extraordinary and weird. They might be absolutely correct, because you might have extraordinary evidence that refutes everyone else. But I don't have the time or inclination to bother weighing the evidence. So I'm going to just defer to the scientists who *have* looked at the evidence and decided you're wrong. Work on convincing them, and maybe I'll follow."
The reason I bring this up is that, over at BPro, MGL made this comment:
"I think that this is JC against the world on this one. There is no one in his corner that I am aware of, at least that actually does any serious baseball work. And there are plenty of brilliant minds who thoroughly understand this issue who have spoken their piece. Either JC is a cockeyed genius and we (Colin, Brian, Tango, me, et. al.) are all idiots, or..."
Is that comment relevant, or is it a fallacious argument from authority? It depends. If you're planning on reading all the studies and comments, and reaching a conclusion based on that, then you should totally ignore it -- whether an argument is correct doesn't depend on how many people think it is.
But if you're just reading casually and trying to get an intuitive grip on who's right, then it's perfectly legitimate.
And that's how MGL meant it. What he's saying is something like: "I've explained why I think JC is wrong and I'm right. But if you don't want to wade through all that, and if you're basing your unscientific decision on which side seems more credible -- which happens 99% of the time that we read opposing opinions on a question of scientific fact -- be aware that the weight of expert opinion is on my side."
Put that way, it's not an appeal to authority. It's a true statement about the scientific consensus.
3. Simple methods are often more trustworthy than complex ones.
There are lots of studies out there that have found that the peak age for hitters in MLB is about 27. There is one study, JC Bradbury's, that shows a peak of 29.
But it seems to me that there is a perception, in some quarters, that because JC's study is more mathematically sophisticated than the others, it's therefore more trustworthy. I think the opposite: that the complicated methods JC used make his results *less* believable, not more.
I've written before about simpler methods, in the context of regression and linear weights. Basically, there are two different methods that have been used to calculate the coefficients for the linear weights formula. One involves doing a regression. Another involves looking at play-by-play data and doing simple arithmetic. The simple method actually works better.
More importantly, for the argument I'm making here, the simple method is easily comprehensible, even without stats classes. It can be explained in a few sentences to any baseball fan of reasonable intelligence. And if you're going to say you know a specific fact, like that a single is worth about .46 runs, it's always nicer to know *why* than to have to trust someone else, who used a mathematical technique you don't completely understand.
Another advantage of the simple technique is that, because so many more people understand it, its pros and cons are discovered early. A complex method can have problems that don't get found out until much later, if ever.
For instance, how much do hitters lose in batting skill between age 28 and age 35? Well, one way to find out is to average the performance of 28-year-olds, and compare it to the averaged performance of 29-year-olds, 30-year-olds, and so on, up to 35-year-olds. Pretty simple method, right, and easy to understand? If you do it, you'll find there's not much difference among the ages. You might conclude that players don't lose much between 28 and 35.
But there's an obvious flaw: the two groups don't comprise the same players. Only above-average hitters stay in the league at 35, so you're comparing good players at 35 to all players at 28. That's why they look similar: the average of a young Joe Morgan and a young Roy Howell looks similar to the average of an old Joe Morgan and a retired, zero-at-bat Roy Howell, even though both Morgan and Howell declined substantially in the intervening seven years.
Now that flaw ... it's easy to spot, and the reason it's easy to spot is that the method is simple enough to understand. It's also easy to explain, and the reason it's easy to explain is again that the method is simple enough to understand.
If I use the more complicated method of linear regression (and a not very complicated regression), and describe it mathematically, it looks something like this:
"I ran an ordinary least squares regression, using the model P(it) = ax(it) + b[x(it)^2] + e, where P(it) is the performance of player i at age t, x(it) is the age of player i at age t, and all player-seasons of less than 300 PA were omitted. The e is an error term, assumed iid normal with mean 0."
The flaw is actually the same as in the original, simpler case: the fact that the sample of players is different at each age. But it's harder to see the flaw that way, isn't it? It's also harder to describe where the flaw resides -- there's no easy one-sentence explanation about Morgan and Howell like there was before.
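(For what it's worth, that quoted description boils down to a few lines of code. Here's a Python/statsmodels sketch, with made-up stand-in data; notice that nothing in it waves a flag about the sample being different at each age.)

import pandas as pd
import statsmodels.formula.api as smf

# Made-up stand-in data: one row per player-season (performance, age, PA)
df = pd.DataFrame({"performance": [0.2, 0.5, 0.4, 0.1, 0.3, 0.6, 0.5, 0.2],
                   "age":         [24, 27, 30, 33, 25, 28, 31, 34],
                   "pa":          [450, 600, 580, 310, 500, 620, 560, 320]})

df = df[df["pa"] >= 300]                     # omit player-seasons under 300 PA
fit = smf.ols("performance ~ age + I(age ** 2)", data=df).fit()
print(-fit.params["age"] / (2 * fit.params["I(age ** 2)"]))   # implied peak age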
So why would you trust the complicated method more than the simple one?
Now, I'm not saying that complexity is necessarily bad. A complex method might be more precise, and give you better results, assuming that there aren't any flaws. But, you still have to check for flaws. If the complex method gives you substantially different results (peak age 29) from the simple methods (peak age 27), that's a warning sign. And so you have to explain the difference. Something must be wrong, either with the complex method, or with all the simple methods. It's not enough to just explain why the complex method is right. You also have to explain why the simple methods, which came up with 27, came out so wrong.
In the absence of a convincing explanation, all you have are different methods, and no indication which is more reliable. In that case, why would you choose to trust the complicated method that you don't understand, but reject the simple methods that you *do* understand? The only reason for doing so is that you have more faith that whoever introduced the complicated method actually got everything right: the method, and the calculations, and the logic.
I don't think that's justified. My experience leads me to think that it's very, very risky to give that kind of blind trust without understanding the method pretty darn well.
Labels: aging, peer review
posted by Phil Birnbaum @ 1/19/2010 01:38:00 AM 14 comments
Thursday, December 10, 2009
The Bradbury aging study, re-explained (Part III)
Last week, J.C. Bradbury posted a response to my previous posts on his aging study.
Before I reply, I should say that I found a small error in my attempt to reproduce Bradbury’s regression. The conclusions are unaffected. Details are in small print below, if you're interested. If not, skip on by.
As it turns out, when I was computing the hitter’s age to include in the regression, I accidentally switched the month and day. (Apparently, that wasn’t a problem when the reversed date was invalid – Visual Basic was smart enough to figure out that when I said 20/5 instead of 5/20, I meant the 20th day of May and not the 5th day of Schmidtember. But when the reversed date was valid – 2/3 instead of 3/2 – it used the incorrect date.)
That means that some ages were wrong, and some seasons from 24-year-olds were left out of my study. I reran a corrected regression, and the results were very, very similar – all three peak ages I’ve recalculated so far were within .08 years of the original. So the conclusions still hold. If you’re interested in the (slightly) revised numbers, let me know and I’ll post them when I’m done rerunning everything.
Okay, now to Bradbury’s criticisms. I’ll concentrate on the most important ones, since a lot of this stuff has been discussed already.
----
First, there’s one point on which I agree with Bradbury’s critique. He writes,
" ... the model, as he defines it, is impossible to estimate. He cannot have done what he claims to have done. Including the mean career performance and player dummies creates linear dependence as a player’s career performance does not change over time, which means separate coefficients cannot be calculated for both the dummies and career performance. ... Something is going on here, but I’m not sure what it is."
He’s right: having both the player dummies and the career mean causes collinearity, which I eliminated by getting rid of one of the player dummies. I agree with him that the results aren’t meaningful this way. I should have eliminated the mean and gone with the dummies alone.
In any case, it doesn’t matter much: the results are similar with and without the dummies. The reason I used the dummies is that it made the results make more sense, and more consistent with what Bradbury found. It turns out that without the dummies, some of the aging curves were very, very flat. By including the dummies, the curves were closer to what Bradbury found.
In retrospect, the reason the curves make more sense with the larger model is that the dummies have the effect of eliminating any observation of only one season (since the dummy will come out to have that player match whatever curve best fits the other, more-than-one-season, players).
Regardless, the peak age is similar either way. But Bradbury’s point is well-taken.
----
Secondly, Bradbury disagrees with me that players are weighted by the number of seasons they played:
"His belief is based on a misunderstanding of how least-squares generates the estimates to calculate the peak. There is no average calculated from each player, and especially not from counting multiple observations for players who play more."
It’s possible I’m misunderstanding something, but I don’t think I am. The model specifies one row in the regression for each player-season that qualifies (player with a certain number of PA and seasons). If player A has a 12-year career that peaks at 30, and player B has a 6-year career that peaks at 27, then player A’s trajectory is represented by 12 rows in the regression matrix, and player B’s trajectory by 6 rows.
Bradbury would argue that the scenario above would result in a peak around 28.5 (the average of the two players). I would argue that the peak would be around 29 (player A weighted twice as heavily as player B). I suppose I could do a little experiment to check that, but that’s how it seems to me.
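(Here's roughly what that little experiment would look like. The career shapes, levels, and steepness below are invented; the question is just whether the pooled fit lands at the simple average of the two peaks, 28.5, or gets pulled toward the longer career's peak.)

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def career(peak_age, n_years, peak_level=5.0, steep=0.1):
    """A symmetric quadratic career arc: n_years seasons centred on peak_age."""
    ages = peak_age - (n_years - 1) / 2 + np.arange(n_years)
    return pd.DataFrame({"age": ages, "perf": peak_level - steep * (ages - peak_age) ** 2})

a = career(peak_age=30, n_years=12)      # player A: long career, late peak
a["player"] = "A"
b = career(peak_age=27, n_years=6)       # player B: short career, early peak
b["player"] = "B"
df = pd.concat([a, b], ignore_index=True)

# One shared quadratic, plus a dummy for each player, as in the study
fit = smf.ols("perf ~ age + I(age ** 2) + C(player)", data=df).fit()
print(-fit.params["age"] / (2 * fit.params["I(age ** 2)"]))   # pooled peak age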
----
Thirdly, Bradbury says I misunderstood that he used rate statistics for home runs, not actual numbers of home runs:
"I’m estimating home-run rates, not raw home runs. All other stats are estimated as rates except linear weights. This is stated in the paper."
Right, that’s true, but that wasn’t my point. I was probably unclear in my original.
What I was trying to say was: the model assumes that all players improve and decline at the same fixed HR rate, regardless of where they started.
So, suppose Bradbury’s equation says that players drop by .01 home run per PA (or AB) the year after age X. (That’s 6 HR per 600 PA.) That equation does NOT depend on how good a home run hitter that player was before. That is: it predicts that Barry Bonds will drop by 6 HR per 600PA, but, also, Juan Pierre will drop by 6 HR per 600PA.
As I pointed out, that doesn’t really make sense, because Juan Pierre never hit 6 HR per 600PA in the first place, much less late in his career! The model thus predicts that he will drop to a *negative* home run rate.
I continue to argue that while the curve might make sense for the *composite* player in Bradbury’s sample, it doesn’t make sense for non-average players like Bonds or Pierre. That might be lost on readers who look at Bradbury’s chart and see the decline from aging expressed as a *percentage* of the peak, rather than a subtraction from the peak.
-----
Finally, and most importantly, one of Bradbury’s examples illustrates my main criticism of the method. Bradbury cites Marcus Giles. Giles’s best seasons were at age 25 to 27, but he declined steeply and was out of the league by 30. Bradbury:
"What caused Giles to decline? Maybe he had some good luck early on, maybe his performance-enhancing drugs were taken away, or possibly several bizarre injuries took their toll on his body. It’s not really relevant, but I think of Giles’s career as quite odd, and I imagine that many players who play between 3,000 — 5,000 plate appearances (or less) have similar declines in their performances that cause them to leave the league. I’ve never heard anyone argue that what happened to Giles was aging."
Bradbury’s argument is a bit of a circular one. It goes something like:
-- The regression method shows a peak age of 29.
-- Marcus Giles didn’t peak at 29 – indeed, he was out of the league at 29.
-- Therefore, his decline couldn’t have been due to aging!
I don’t understand why Bradbury would assume that Giles’ decline wasn’t due to aging. If the decline came at, say, 35 instead of 28, there would be no reason to suspect injuries or PEDs as the cause of the decline. So why couldn’t Giles just be an early ager? Why can’t different players age at different rates? Why is a peak age of 25, instead of 29, so implausible that you don’t include it in the study?
It’s like ... suppose you want to find the average age when a person gets so old they have to go to a nursing home. And suppose you look only at people who were still alive at age 100. Well, obviously, they’re going to have gone to a nursing home late in life, right? Hardly anyone is sick enough to need a nursing home at 60, but then healthy enough to survive in the nursing home for 40 years. So you might find that the average 100-year-old went into a nursing home at 93.
But that way of looking at it doesn't make sense: you and I both know that the average person who goes into a nursing home is a lot younger than 93.
But what Bradbury is saying is, "well, those people who went into a nursing home at age 65 and died at 70 ... they must have been very ill to need a nursing home at 65. So they’re not relevant to my study, because they didn’t go in because of aging – they went in because of illness. And I’m not studying illness, I’m studying aging."
That one difference between us is pretty much my main argument against the findings of the study. I say that if you omit players like Giles, who peaked early, then *of course* you’re going to come up with a higher peak age!
Bradbury, on the other hand, thinks that if you include players like Giles, you’re biasing the sample too low, because it’s obvious that players who come and go young aren’t actually showing "aging" as he defines it. But, first, I don’t think it’s obvious, and, second, if you do that, you’re no longer able to use your results to predict the future of a 26-year-old player. Because, after all, he could turn out to be a Marcus Giles, and your study ignores that possibility!
All you can tell a GM is, "well, if the guy turns out not to be a Marcus Giles, and he doesn’t lose his skill at age 31 or 33 or 34, and he turns out to play in the major leagues until age 35, you’ll find, in retrospect, that he was at his peak at age 29." That’s something, but ... so what?
I’m certainly willing to agree that if you look at players who were still "alive" in MLB at age 35, and played for at least 10 years, then, in retrospect, those players peaked at around 29. And I think Bradbury’s method does indeed show that. But if you look at *all* players, not just the ones who aged most gracefully, you’ll find the peak is a lot lower. There are a lot of people in nursing homes at age 70, even if Bradbury doesn't consider it’s because of "aging."
posted by Phil Birnbaum @ 12/10/2009 07:29:00 PM 0 comments
Thursday, November 26, 2009
The Bradbury aging study, re-explained (Part II)
This is a follow-up to my previous post on J.C. Bradbury's aging study ... check out that previous post first if you haven't already.
My argument was that players with shorter careers should peak earlier than players with longer careers. Bradbury disagreed. He reran his study with a lower minimum, 1000 PA instead of 5000. He found that there was "no drop".
I decided to try to run his study myself, the part where he looks at batter performance in Linear Weights. I think my results are close enough to his that they can be trusted. Skip the details unless you're really interested. I'll put them in a quote box so you can ignore them if you choose.
-----
Technical details:
Here's what I did. I took all players whose careers began in 1921 or later, and looked at their stats until the end of 2008 (even if they were still active). They had to have had a plate appearance in each of at least ten separate seasons. In seasons in which their age was 24 to 35 (as of July 1), they had to have had at least 5000 plate appearances.
Any player who did not meet the above criteria was not included in the regression. Also, the regression included only seasons from age 24-35 in which the player had at least 300 PA.
Each of those seasons was a row in the regression. The model I used was:
Z-score this season = a * age this season + b * age^2 this season + c * career average Z-score + d * player dummy + constant + error term
I didn't include dummy variables for individual seasons (Bradbury's "D" term, if you look at his paper) or park factors. I think those would change the results only slightly.
Another difference I noticed later is that when I calculated the Z-scores, I used the standard deviation only of players who were 24-35 and had 300 PA. Bradbury, I believe, used the SD of all players, regardless of PA. Again, I don't think that affects the results much (although it makes his coefficients about twice as big as mine).
Finally, I'm not 100% sure that I did exactly what Bradbury did in other respects. The study is vague about the details of the selection criteria. For instance, I'm not sure if any ten seasons qualified a player, or only seasons of at least 300 PA. I'm not sure if the player needed 300 PA in every season between 24 and 35, or if that didn't matter as long as the total was over 5000. So I guessed. Also, for Linear Weights, I used a version that adjusts the value of the out for the specific season, whereas Bradbury used -0.25 for all seasons (and compensated somewhat by having a dummy variable for league/season).
-----
Anyway, here is my best-fit equation, followed by Bradbury's:
Mine: Z = 0.760 * age - 0.0133 * age^2 - 0.901 * mean - 10.6802 + dummies
J.C.: Z = 1.322 * age - 0.0224 * age^2 - 1.205 * mean + other stuff + dummies
These equations look different, but that's mostly because Bradbury used a different definition of the Z-score. If you look at the significance levels, they're similar: for mine, about 12 SDs; for Bradbury, about 11 SDs. Bradbury's might be smaller because his regression was more sophisticated, with certain corrections that likely brought the significance down.
More importantly, here are our estimates of peak age, which can be calculated as - ( coeff for age ) / ( 2 * coeff for age^2 ):
Mine: 28.62 peak age
J.C.: 29.41 peak age
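(That's just the arithmetic from the rounded coefficients above; the small gaps from 28.62 and 29.41 are rounding in the printed coefficients.)

def peak_age(b_age, b_age2):
    """Peak of y = b_age2*age^2 + b_age*age + ... : where the derivative is zero."""
    return -b_age / (2 * b_age2)

print(peak_age(0.760, -0.0133))   # about 28.6
print(peak_age(1.322, -0.0224))   # about 29.5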
Why the difference? My guess is that there was something different about our criteria for selecting players for the sample. Again, I don't think the difference affects the arguments to follow.
Now, this is where J.C. says he ran the regression again, for 1000PA and no 10-year-requirement, and got no difference in peak age. I did the same thing, and I *did* get a difference:
Mine, for 5000 PA: 28.62
Mine, for 1000 PA: 28.06
It looks like a small difference, only .56 years -- and the total of 28.06 is still above the previous studies' conclusion that the peak is in the 27s. However, as it turns out, the way the study is structured, that small difference is really a big difference. Let me show you.
First, I ran the same regression, but this time only for players with 3000-5000 PA:
3000-5000 PA: 27.61
So, these guys with shorter careers did have an earlier peak, about a year earlier than the guys with the longer careers. What if we now look at the guys with really short careers, 1000-3000 PA?
1000-3000 PA: 147.00
That's not a misprint: the peak came out to age 147! But the coefficients of the age curve were not close to statistical significance -- neither the age, nor the age-squared. Effectively, these guys performed almost the same regardless of age. They didn't peak at 29, but neither did they peak at 27. They just didn't peak.
And so, it's reasonable to conclude that one of the reasons the peak age dropped so little, when we added more players like Bradbury did, is that the regression wasn't able to find the peak for the players with the shorter careers. And so the sample still consists of mostly players with longer careers.
------
Can we solve this problem? Yes, I think so. The procedure cut off the sample of players at 24 and 35 years of age. If we eliminate the cutoff, the results start to work.
I reran the regression with no age restrictions: players had to have 5000 or 1000 PA anywhere in their careers, not just between 24 and 35. Also, I considered all seasons in which they had 300 PA, regardless of how old they were that year. The numbers are similar:
28.97 for 5000 PA+
28.66 for 1000 PA+
The difference is smaller now, 0.31 years. But the important result is the breakdown of the 1000+ group:
28.97 for 5000 PA+
27.72 for 3000-5000 PA
26.61 for 1000-3000 PA (now significant)
----------------------------------------
28.66 for the overall sample
It seems like the shorter the career, the earlier the peak.
But, still, the overall average seems to only have dropped 0.31 of a year, and it's still around 29 years. Isn't that still evidence against the 27 theory?
No, it's not.
Take a look at the above table again: we have three peaks, 28.97, 27.72, and 26.61. Those three numbers average to 27.77. Why, then, is the "overall" number so much higher, at 28.66?
It's because there were a lot more datapoints in the 5000 PA+ category than the others. And that makes sense. The more PA, the more seasons played. And each season gets a datapoint. So the top category is full of batters with 10 or more seasons, while the bottom category is full of batters with only a few seasons. In fact, some of them may have only 1-2 qualifying seasons of 300 PA or more.
If a player has a 15-year career, with a peak at age 29, he gets fifteen "29" entries in the database. If another player has a 3-year career with a peak of 27, he gets only three "27" entries. So instead of the result working out to 28, which is truly the average peak of the two players, it works out to 28.7.
Another way to look at it: Player A has a 12-year career. Player B has a 2-year career. What's the average career? It's 7 years, right? And you get that by averaging 12 and 2.
But the way Bradbury's study is designed, it would figure the average career is 10.57 years. Instead of averaging 12 and 2, it would average 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 2, and 2. That's not the result we're looking to find.
This is less of a problem in Bradbury's original study, because, by limiting players to 12 years of their career, and requiring them to play 10 seasons, most of the batters in the study would be between 10 and 12 years, so the weightings would be closer. Still, this feature of the study means that it's probably overestimating the peak at least a little bit, even for that sample of players.
So, anyway, if 28.66 is not the right average because of the wrong weights, how can we fix it? Simple: instead of weighting by the number of regression rows in each group, we weight by the number of players in each group:
28.97 for: 640 players with 5000+PA
27.72 for: 595 players with 3000-5000 PA
26.61 for 1148 players with 1000-3000 PA
----------------------------------------
27.52 overall average
So what looked like a small drop when we added the shorter-career players -- 0.31 years -- turns into a big drop -- 1.45 years -- when we weight the data properly.
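(That 27.52, by the way, is just the player-weighted average of the three group estimates:)

peaks   = [28.97, 27.72, 26.61]
players = [640, 595, 1148]
print(sum(p * n for p, n in zip(peaks, players)) / sum(players))   # about 27.5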
Now, this only works when there actually IS a drop between the 5000+ and the 1000+ groups. We found a drop of 0.31. But on his blog, Bradbury said that with his data, he found no drop at all.
How come? I'm not sure. But one reason might be random variation (if he used different selection criteria). Another might be his age restriction causing nonsensical results in the important 1000-3000 group. And there are his other variables for "missed information resulting from playing conditions". Or, of course, I may have done something wrong.
------
So we're down to 27.52. That's pretty close to the traditional estimates of 27ish. But I think we're not necessarily done: there are at least two factors I can think of that suggest that the real value is lower than even 27.52.
First, we showed that the regression overestimates the peak age by overweighting long careers relative to short careers. We were able to get the average to drop from 28.66 to 27.52 just by breaking the sample down and reweighting.
By the same logic, all three groups above must also be overestimates! In the middle group, players with 5000 PA are going to be weighted 67% higher than players with only 3000 PA. If we were to rerun the regression after breaking the group down further, into (say) 3000-4000 and 4000-5000, we'd get a lower estimate than 27.52. In fact, we could break those new groups down into smaller groups, and break those groups down into smaller groups, and so on. The problem is that the sample size would get too small to get reasonable results. But I'm betting the average would drop significantly.
Second, the study leaves out players with less than 1000 PA. That's probably a good thing, because with only 1 or 2 seasons, it's hard to fit a trajectory properly. Still, it seems likely that if there were a way of figuring it out, we'd find those players would peak fairly early, bringing the average down further.
------
So, in summary:
-- If we use the Bradbury model on groups of players with fewer PA, we find that those players are estimated to have lower peak age. This supports the hypothesis that choosing only 5000+ PA players biases the result too high.
-- The model used in Bradbury's study consistently overestimates peak age for another reason. That's the weighting problem -- it figures the peak for the average *season*, not for the average *player*.
-- Correcting for that shows that if we look at players with 1000 PA, instead of just players with 5000 PA, the peak age drops to the mid 27s.
-- Other corrections that we can't make, because of sample size issues, would drop the peak age even further.
-- There is good evidence that the shorter the career, the younger the peak age.
-- It doesn't seem possible, with this method, to get a precise estimate of average peak age. "Somewhere in the low 27s" is probably the best it can do, if even that.
posted by Phil Birnbaum @ 11/26/2009 12:53:00 AM 12 comments
Monday, November 23, 2009
The Bradbury aging study, re-explained
A few days ago, J.C. Bradbury responded to my recent post on his age study.
Bradbury had authored a study claiming that hitters peak at age 29.4, contradicting other studies that showed a peak around 27. His study was based on the records of all batters playing regularly between age 24 and 35. I argued that, by choosing only players with long careers progressing to a relatively advanced age, his results were biased towards players who peak late -- because, after all, someone with the same career trajectory, just starting a few years earlier, would be out of baseball by 35 and therefore not make the study.
In response, Bradbury denies that selective sampling is a problem. He writes,
"Phil Birnbaum has a new theory as to why I’m wrong (I suspect it won’t be his last)."
Actually, it's not a new theory. I mentioned it at exactly the same time and in the same post as another theory, last April. Bradbury actually linked to that post a few days ago.
Also, the reason "it won't be my last" is that, like many other sabermetricians, I am curious to find out why there's a difference between Bradbury's findings, which find a peak age of 29+, and many previous studies, which find a peak age of 27. They can't both be correct, and the way to resolve the contradiction is to suggest reasons and investigate whether they might be true.
But, Bradbury also said that I showed "a serious lack of understanding of the technique I employed." He's partially right -- I did misunderstand what he did. After rereading the paper and playing around with the numbers a bit, I think I have a better handle on it now. This post, I'm going to try explaining it (and why I still believe it's biased). Please let me know if I've got anything wrong.
-----
Previously, I had incorrectly assumed that Bradbury's study worked like other aging studies I've seen (such as Justin Wolfers', or Jim Albert's (.pdf)). In those other studies, the authors took a player's performance over time, smoothed it out into a quadratic, and figured out the peak for each player.
Then, after doing that for a whole bunch of players, those other studies would gather all the differently shaped curves, and analyze them to figure out what was going on. They implicitly assumed that every player has his own unique trajectory.
Bradbury's study doesn't do that. Instead, Bradbury uses least-squares to estimate the best single trajectory for *every batter in the study*. That's 450 players, all with exactly the same curve, based on the average.
According to this model, the only difference between the players is that some players are more productive than others. Otherwise, every batter has exactly the same shaped curve. The only difference the model allows, between the curves of different players, is vertical movement, up for a better player, down for a worse one.
For instance: take Carlos Baerga, whose career peaked in his early 20s, with a short tail on the left and a long tail on the right. Then take Barry Bonds, whose career is the opposite: he peaked late, with a long tail on the left and a short tail on the right.
What Bradbury's model does is take both curves, put them in a blender, and come out with two curves that look exactly the same, peaking in the late 20s. The only difference is that Bonds' is higher, because his level of performance is better.
The model fits 450 identical curves to the actual trajectories of the 450 players. They can't be particularly good fits, because they're all the same. If you look at those 450 fitted curves, they're like a vertical stack of 450 identical boomerangs: some great hitter at the top, some really crappy hitter at the bottom, and the 448 other players in between.
I can pull a boomerang off the top, and show you, this is what Barry Bonds looks like. The best fit is that he started low, climbed until he reached 29 or so, then started a symmetrical decline (the model assumes symmetry). You'll ask, "what does Carlos Baerga look like?" I'll say, "it's exactly the same as Barry Bonds, but lower." I'll take my Barry Bonds boomerang, and lower my arm a couple of inches. Or, I can just pull the Baerga boomerang out of the middle of the stack.
(One more way of putting it. See this chart? This is how Justin Wolfers represents the careers of a bunch of great pitchers. He smoothed the actual trajectories, but modeled that every pitcher gets his own peak age, and his own steepness of curve. But for this study, they would all be the same shape, just one stacked above the other.)
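(If it helps, here's the difference between the two approaches in code, with invented numbers for the two players. "Own peak" fits each player his own parabola; "shared peak" forces one parabola on everybody and lets only the intercept differ, which is the model described above.)

import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in data: two players with very different career shapes
df = pd.DataFrame({
    "player": ["Baerga"] * 5 + ["Bonds"] * 5,
    "age":    [22, 24, 26, 28, 30, 28, 31, 34, 37, 39],
    "perf":   [3.0, 4.0, 3.5, 2.0, 0.5, 5.0, 6.0, 8.0, 9.0, 6.0],
})

def peak(fit):
    return -fit.params["age"] / (2 * fit.params["I(age ** 2)"])

for name, g in df.groupby("player"):        # each player gets his own curve
    print(name, "own peak:", round(peak(smf.ols("perf ~ age + I(age ** 2)", data=g).fit()), 1))

shared = smf.ols("perf ~ age + I(age ** 2) + C(player)", data=df).fit()
print("shared peak:", round(peak(shared), 1))   # one curve, shifted up or down per player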
-----
Now, it seems to me that the model is way oversimplified. It's obviously false that all players have the same trajectory and the same peak age. People are different. They mature at different rates, both in raw physical properties, and in how fast they learn and adapt. Indeed, this is something the study acknowledges:
"Doubles plus triples per at-bat peaks 4.5 years later for Hall-of-Famers, which indicates that elite hitters continue to improve and maintain some speed and dexterity while other players are in decline."
So, implicitly, even Bradbury admits that the model's assumptions are wrong: some players age differently than others.
However, even if the model is wrong in its assumptions and in how it predicts individual players, it's possible to argue that the composite player it spits out is still reasonable.
For instance, suppose you have three people. One is measured to be four feet tall, one five feet, and one six feet. There are two ways you can get the average. You can just average the three numbers, and get five feet.
Or, you can create a model, an unrealistic model, that says that all three are really the same height, and any discrepancies are due to uncorrelated errors by the person with the measuring tape. If you run a regression to minimize the sum of squares of those errors, you get an estimate that all three people are actually ... five feet.
The model is false. The three people aren't really of equal height, and nobody is so useless with a tape measure that their observations would be off by that much. But the regression nonetheless gives the correct number: five feet. And so you'll be OK if you use that number as the average, so long as you don't actually assume that the model matches reality, that the six-foot guy is really the same height as the four-foot guy. Because there's no evidence that they are -- it was just a model that you chose.
I think that's what's happening here. It's obvious that the model doesn't match reality, but it has the side effect of creating a composite average baseball player, whose properties can be observed. As long as you stick to those average properties, and don't try to assume anything about individual players, you should be OK. And that's what Bradbury does, for the most part, with one exception.
----
A consequence of the curves having the same shape is that declines are denominated in absolute numbers, rather than percentages of a player's level. If the model says you lose 5 home runs between age X and age Y, then it assumes *everyone* loses 5 home runs, everyone from Barry Bonds to Juan Pierre -- even if Juan Pierre didn't have 5 home runs a year to lose!
If Bonds is a 30 home run guy at age X, he's predicted to drop to 25 -- that's a 17% decline. If Juan Pierre is a 5 home run guy at age X, he's predicted to drop to 0 -- a 100% decline.
In real life, that's probably not the way it works -- players probably drop closer to the same percentage than by the same amount. Table VII of the paper says that a typical hitter would lose about half his homers (on a per-PA basis) between 30 and 40. If Bradbury used a season rate of 16 homers as "typical," that's an 8 HR decline. But what about players who hit only 4 homers a year, on average? The model predicts them dropping to minus 4 home runs!
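To make the contrast concrete, here's a trivial sketch (the 30-HR and 4-HR levels are made up; only the 16-HR "typical" figure comes from the paper):

# Hypothetical HR levels at age 30; only the 16-HR "typical" hitter is from the paper.
hitters = {"30-HR hitter": 30, "16-HR hitter": 16, "4-HR hitter": 4}

for name, hr in hitters.items():
    same_absolute = hr - 8      # everyone loses 8 HR, as the model implies
    same_percent = hr * 0.5     # everyone loses half, the Table VII reading
    print(name, "-> absolute model:", same_absolute, "| percentage model:", same_percent)

# The absolute-decline model sends the 4-HR hitter to -4 home runs;
# the percentage model sends him to 2.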
Now, that's a bit of an unfair criticism. The text of the study doesn't explicitly argue that a Bonds will drop by the same number of home runs as a Baerga, even though the study deliberately chose a model that says exactly that. Remember, the model is unrealistic, so as long as you stick to the average, you're OK. Bonds and Pierre are definitely not the average.
But, then, why does Bradbury's Table VII deal in percentages? The model deals in absolutes. Bradbury obtained the percentages by applying the absolutes to a "typical" player, presumably one close to average. So why not put "-8 HR" in that cell, rather than "-48.95%"?
By showing percentages, there's an unstated implication that, since the model shows an average player with 16 HR dropping to 8, you can extrapolate and say that a player with 40 HR will drop to 20. But that would have to be backed up by evidence or argument. And the paper provides neither.
-----
To summarize:
-- the model assumes all players have the same peak age, and the same declines from their peak (which is another way of saying that it assumes that all players have the same shape of trajectory.)
-- it does assume some players (Barry Bonds) have a higher absolute peak than others (Jose Oquendo), but still have the same shape of career.
-- it assumes that all players rise and decline annually by the same absolute amount. In the agespan it takes for a 10-triple player to decline to 5 triples, a 6-triple player will decline to 1 triple, and Willie Aikens will decline to -5 triples.
What can you get out of a model like that, with its unrealistic assumptions? I think that you can reasonably look at the peak and shape as applied to some kind of hypothetical composite of the players used in the study. But I don't think you can go farther than that, and make any assumptions about other types of players.
So: when Bradbury's study comes up with the result that his sample of players peaked at 29.5 years (for Linear Weights), I think that's probably about right -- for his sample of players. When he says that the average home run hitter loses 8 home runs between 30 and 40, I think that's probably about right too -- for his sample of players.
My main argument is not that the model is unrealistic, and it's not that there's something wrong with the regression used to analyze the model. It's that the sample of players that went into the model is biased, and that's what's causing the peak to be too high.
Bradbury's model works for his sample -- but not for all baseball players, just the ones he chose. Those were the ones who, in retrospect, had long careers.
To have a long career, you have to keep up your performance for many years. To keep up your performance for many years, you need to have a slower decline than average. If you have a slower decline than average, a higher proportion of your value comes later in your career. If a higher proportion of your value comes later in your career, that means that you'll have an older-than-average peak.
So choosing players with long careers results in a peak age higher than if you looked at all players.
Bradbury disagrees. He thinks that Hall of Fame players may have a significantly different peak than non-Hall-of-Fame players, but doesn't think that players with long careers might have a different peak than players with short careers.
That really doesn't make sense to me. But Bradbury has evidence. In his response to my post, he reran his study, but for all players with a minimum of 1000 PA, instead of his previous minimum 5000 PA. That is, he added players with short careers.
He found no difference in the peak age.
That's a pretty persuasive argument. I argued A, Bradbury argued B, and the evidence appears to be consistent with B. No matter how good my argument sounds, if the evidence doesn't support it, I better either stop arguing A, or explain why the evidence isn't consistent with B.
Still, the logic didn't seem right to me. So I spent a couple of days trying to replicate Bradbury's study. I wasn't able to duplicate his results perfectly, but many of them are close. And I'm not sure, but I think I have an idea about what's going on, and why the evidence might nonetheless be consistent with A. That is, why Bradbury's 1000+ study comes up with a peak of 29 years, while other studies have come up with 27.
I'll get to that in the next post.
posted by Phil Birnbaum @ 11/23/2009 11:34:00 PM 2 comments
Monday, November 16, 2009
Selective sampling and peak age
Back a couple of years ago, I reviewed a paper by J.C. Bradbury on aging in baseball. J.C. found that players peak offensively around age 29, rather than the age 27 found in other studies.
I had critiqued the study on three points:
-- assuming symmetry;
-- selective sampling of long careers;
-- selective sampling of seasons.
In a blog post today, J.C. responds to my "assuming symmetry" critique. I had argued that if the aging curve in baseball has a long right tail, the peak of the symmetrical best-fit curve would come at a higher age than the peak of the original curve. That would cause the estimate to be too high. But, today, J.C. says that he tried non-symmetrical curves, and he got roughly the same result.
So, I wondered, if the cause of the discrepancy isn't the poor fit of the quadratic, could selective sampling be a big enough factor? I ran a little experiment, and I think the answer is yes.
J.C. considered only players with long careers, spanning ages 24 to 35. It seems obvious that that would skew the observed peak higher than the actual peak. To see why, take an unrealistic extreme case. Suppose that half of players peak at exactly 16, and half peak at exactly 30. The average peak is 24. But what happens if you look only at players in the league continuously from age 24 to 35? Almost all those players are from the half who peak at 30, and almost none of those guys are the ones who peaked at 16. And so you observe a peak of 30, whereas the real average peak is 24.
As I said, that's an unrealistic case. But even in the real world, you expect early peakers to be less likely to survive until 35, and your sample is still skewed towards late peakers. So the estimate is still biased. Is the bias significant?
To test that, I did a little simulation experiment. I created a world where the average peak age is 27. I made two assumptions:
-- every player has his own personal peak age, which is normally distributed with mean 27 and variance 7.5 (for an SD of about 2.74).
-- I assumed that for every year after his peak, a player has an additional 1/15 chance (about 6.7 percentage points) of dropping out of the league. So if a player peaks at 27, his chance of still being in the league at age 35 is (1 minus 8/15), since he's 8 years past his peak. That's 46.7%. If he peaks at 30, age 35 is only five years past his peak, so his chance would be 66.7% (which is 1 minus 5/15).
Then, I simulated 5,000 players. Results:
27.0 -- The average peak age for all players.
28.1 -- The average observed peak age of those players who survived until age 35.
The difference between the two is due entirely to selective sampling. So, with this model and these assumptions, J.C.'s algorithm overestimates the peak by 1.1 years.
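Here's roughly what that simulation looks like in Python (a reconstruction from the two assumptions above, not the exact code I ran):

import numpy as np

rng = np.random.default_rng(0)
n = 5000
decay = 1.0 / 15.0                            # extra dropout chance per year past peak

peaks = rng.normal(27.0, np.sqrt(7.5), n)     # each player's true peak age
years_past_peak = np.maximum(35.0 - peaks, 0.0)
p_survive_to_35 = np.clip(1.0 - decay * years_past_peak, 0.0, 1.0)
survived = rng.random(n) < p_survive_to_35

print("average peak, all players:   ", round(peaks.mean(), 1))            # close to 27
print("average peak, survivors only:", round(peaks[survived].mean(), 1))  # roughly 28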
We can get results even more extreme if we change some of the assumptions. Instead of longevity decaying by 1/15, suppose it decays by 1/13? Then the average observed age is 28.5. If it decays by 1/12, we get 28.9. And if it decays by 1/10, the peak age jumps to 30.9.
Of course, we can get less extreme results too: if we use a decay increment of only 1/20, we get an average of 27.6. And maybe the decay slows down as you get older, and we might have too steep a curve near the end. Still, no matter how small the increment, the estimate will still be too high. The only question is, how much too high?
I don't know. But given the results of this (admittedly oversimplified) simulation, it does seem like the bias could be as high as two years, which is the difference between J.C.'s study and others.
If we want to get an unbiased estimate for the peak for all players, not just the longest-lasting ones, I think we'll have to use a different method than tracking career curves.
UPDATE: Tango says it better than I did, here.
posted by Phil Birnbaum @ 11/16/2009 02:47:00 PM 0 comments
Friday, April 03, 2009
J.C. Bradbury on aging in baseball
J.C. Bradbury is on vacation from blogging, but is still posting occasionally. This week, he wrote that his article on baseball aging patterns has been published. Here's the link to the published version (gated), and here's a link to a freely-available version from last August.
Here's what JC did. He took every player with at least 5000 PA (4000 batters faced for pitchers) who debuted in 1921 or later. Then, for those players, he considered every season in which they had at least 300 PA (or 200 batters faced). That left a total of 4,627 player-seasons for hitters, and 4,145 for pitchers. He then ran a regression, predicting each season's performance (by various measures) from:
-- the player's career average
-- the player's age that year, and age-squared (that is, quadratic on age)
-- a dummy variable for the league-season
-- a "player-specific error term".
Numbers are park adjusted.
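For readers who want to see the mechanics, here's a rough sketch of that kind of regression in Python, on toy data. The column names and numbers are all invented, and Bradbury's actual estimation, in particular how the player-specific error term is handled, may well differ.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data in the shape described above: one row per qualifying player-season.
rng = np.random.default_rng(1)
rows = []
for player in range(200):
    career_avg = rng.normal(0.0, 1.0)           # stand-in for the player's career average
    true_peak = rng.normal(29.0, 2.0)           # his personal peak age
    debut_year = int(rng.integers(1960, 1990))
    for i, age in enumerate(range(24, 36)):
        perf = career_avg - 0.02 * (age - true_peak) ** 2 + rng.normal(0.0, 0.3)
        rows.append({"perf": perf, "age": age, "age2": age * age,
                     "career_avg": career_avg, "player": player,
                     "league_season": "NL" + str(debut_year + i)})
df = pd.DataFrame(rows)

# Quadratic in age, plus career average and league-season dummies.
# (The paper's "player-specific error term" suggests a random-effects
# specification -- e.g. smf.mixedlm(..., groups=df["player"]) -- but plain
# OLS is enough to show the idea.)
fit = smf.ols("perf ~ career_avg + age + age2 + C(league_season)", data=df).fit()

peak_age = -fit.params["age"] / (2 * fit.params["age2"])
print(round(peak_age, 1))    # implied peak age from the fitted parabola, roughly 29 here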
After running the regression, Bradbury calculates the implied "peak age" for each metric:
29.41 linear weights
29.13 OPS
30.04 OBP
28.58 SLG
28.35 AVG
32.30 BB
28.26 DPT (doubles plus triples rate)
29.89 HR
29.16 ERA
29.05 RA
23.56 Strikeouts (for pitchers)
32.47 Walks (allowed)
27.39 Home Runs (allowed)
For most of the hitting categories, the peak age is above the conventional wisdom of 27 – most are around 29. After quoting various studies that have found younger peaks, Bradbury writes,
"The results indicate that both hitters and pitchers peak around 29. This is older than some estimates of peak performance ..."
Bradbury also notes that the results are consistent with the idea that the more raw athleticism a skill requires, the earlier it peaks; strikeouts, for instance, which require raw arm speed, peak the earliest, and walks, which are largely mental, peak the latest:
"Consistent with studies of ageing in specific athletic skills, baseball players peak earlier (later) in abilities that require more (less) physical stress."
I agree with Bradbury on this last point, but I don't think his actual age estimates can be relied upon. Specifically, I think peak ages are really closer to 27 than to 29.
One reason for this is that the model specifically requires the curve to be a quadratic – that is, symmetrical before and after the peak. But are careers really symmetrical? Suppose they are not – suppose the average player rises sharply when he's young, then falls gradually until old age. The curve, then, would be skewed, with a longer tail to the right.
Now, suppose you try to fit a symmetrical curve to a skewed curve, as closely as you can. If you pull out a sheet of paper and try it, you'll see that the peak of the symmetrical curve winds up to the right of the peak of the actual curve. The approximation peaks later than the actual curve does, which is exactly what JC found.
I have no proof that the actual aging curve is asymmetrical in this exact way, but players' careers are not as regular as the orbits of asteroids. There's no particular reason to expect players to fall at exactly the same rate as they rise, especially when you factor in playing time and injuries. The quadratic is a reasonable approximation, but that's all it is.
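You can also see the effect numerically. Here's a small sketch (the skewed trajectory is made up, not taken from any real player): build a curve that rises steeply and declines slowly, fit a quadratic to it, and compare the peaks.

import numpy as np

ages = np.arange(21, 38)
# A made-up skewed trajectory: sharp rise to a peak at 27, then a slow decline.
perf = np.where(ages <= 27, 1.0 * (ages - 21), 6.0 - 0.3 * (ages - 27))

a, b, c = np.polyfit(ages, perf, 2)       # best-fit quadratic, symmetric by construction
fitted_peak = -b / (2 * a)

print("actual peak age:", ages[int(np.argmax(perf))])    # 27
print("fitted peak age:", round(fitted_peak, 1))          # about 30 -- later than 27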
Another reason is selective sampling. By choosing only players with long careers, Bradbury left out any player who flames out early. And so, his sample is overpopulated with players who aged particularly gracefully. That would tend to overestimate the age at which players peak.
(He limited his data to players between 24 and 35, which he says is done to minimize selection bias, but I'm not sure how that would help.)
There is perhaps some evidence that there's a real effect. JC ran the same regression again, but this time including only players with Hall of Fame careers. For hitters, the peak age dropped by almost an entire year, from 29.41 to 28.51. That might make sense; HOFers are the best players ever, and were more likely to have had long careers even if they aged less gracefully. That is, they'd still be good enough to stay in the league after a substantial drop, and would be much more likely to hit the 5000 PA cutoff even if they peaked early and dropped sharply.
(In fairness, you could argue that HOFers were less likely to be injured, and therefore more likely to peak later. But I think the "good enough to stay in the league" effect is larger than that, although I have no proof. Also, the HOF pitchers' peak age dropped only 0.08 years from the non-HOFers, so the effect I cite seems to hold only for hitters.)
Finally, there's selective sampling of individual seasons. A player who falls sharply and suddenly won't get enough playing time to qualify for Bradbury's study that year. In real life, his curve would be nearly vertical between his next-to-last season and his last season. But since Bradbury's study doesn't include that last season, it never sees the vertical drop; the data it fits look gentler on the right side, and the quadratic winds up with its peak to the right of where it would otherwise be.
Try this yourself: draw an aging curve that peaks, drops a bit, then falls off vertically. Draw the best fit symmetrical curve on it.
Now, draw the same curve again, but, instead of the vertical line, have it just end before the vertical line starts. Draw the best-fit symmetrical curve on this second one. You'll see it peaks later than when the vertical line was there.
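Here's the same exercise in code rather than on paper (again, a made-up career): fit the quadratic with and without the collapsed final season.

import numpy as np

ages = np.arange(22, 34)
# Made-up career: gradual rise and fall, then a near-vertical collapse in the final season.
perf = np.array([1, 2, 3, 4, 4.5, 5, 4.5, 4, 3.5, 3, 2.5, -2.0])

def quad_peak(x, y):
    a, b, _ = np.polyfit(x, y, 2)
    return -b / (2 * a)

print("peak with the final collapse included:", round(quad_peak(ages, perf), 1))
print("peak with the final season dropped:   ", round(quad_peak(ages[:-1], perf[:-1]), 1))
# Dropping the last season pushes the fitted peak later.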
(Again, in fairness: Bradbury ran a version of his study in which there was no season minimum for plate appearances or batters faced – just the career minimums -- and the results were similar. I've explained why I think, in theory, the minimums should skew the results, but I have to admit that, in real life, they didn't. There are perhaps some other reasons it didn't happen – perhaps a lot of the effect comes from the "vertical" players released in spring training, so they didn't make the study at all – but still, the results do seem to contradict this third theory of mine.)
So you've got three ways in which the study may have made assumptions or simplifications that forced the peak age to be higher than it should be:
-- assuming symmetry;
-- selective sampling of long careers;
-- selective sampling of seasons.
In that light, my conclusion would be that Bradbury's methodology might yield a reasonable approximation, but not much more than that. I think the study can correctly identify the basic trend, and is probably correct within a couple of years, but I wouldn't bet on it being any closer than that.
posted by Phil Birnbaum @ 4/03/2009 10:42:00 AM 14 comments