Sabermetric Research
Phil Birnbaum
Tuesday, July 22, 2014
Did McDonald's get shafted by the Consumer Reports survey?
McDonald's was the biggest loser in Consumer Reports' latest fast food survey, ranking them dead last out of 21 burger chains. CR readers rated McDonald's only 5.8 out of 10 for their burgers, and 71 out of 100 for overall satisfaction. (Ungated results here.)
CR wrote,
"McDonald's own customers ranked its burgers significantly worse than those of [its] competitors."
Yes, that's true. But I think the ratings are a biased measure of what people actually think. I suspect that McDonald's is actually much better loved than the survey says. In fact, the results could even be backwards. It's theoretically possible, and fully consistent with the results, that people actually like McDonald's *best*.
I don't mean because of statistical error -- I mean because of selective sampling.
-----
According to CR's report, 32,405 subscribers reported on 96,208 dining experiences. That's 2.97 restaurants per respondent, which leads me to suspect that they asked readers to report on the three chains they visit most frequently. (I haven't actually seen the questionnaire -- they used to send me one in the mail to fill out, but not any more.)
Limiting respondents to their three most frequented restaurants would, obviously, tend to skew the results upward. If you don't like a certain chain, you probably wouldn't have gone lately, so your rating of "meh, 3 out of 10" wouldn't be included. It's going to be mostly people who like the food who answer the questions.
But McDonald's might be an exception. Because even if you don't like their food that much, you probably still wind up going occasionally:
-- You might be travelling, and McDonald's is all that's open (I once had to eat Mickey D's three nights in a row, because everything else nearby closed at 10 pm).
-- You might be short of time, and there's a McDonald's right in Wal-Mart, so you grab a burger on your way out and eat it in the car.
-- You might be with your kids, and kids tend to love McDonald's.
-- There might be only McDonald's around when you get hungry.
Those "I'm going for reasons other than the food" respondents would depress McDonald's ratings, relative to other chains.
Suppose there are two types of people in America. Half of them rate McDonald's a 9, and Fuddruckers a 5. The other half rate Fuddruckers an 8, but McDonald's a 6.
So, on average, consumers rate McDonald's a 7.5, and Fuddruckers a 6.5.
But the people who prefer McDonald's seldom set foot anywhere else -- where there's a Fuddruckers, the Golden Arches are never far away. On the other hand, fans of Fuddruckers can't find one when they travel. So, they wind up eating at McDonald's a few times a year.
So what happens when you do the survey? McDonald's gets a rating of 7.5 -- the average of 9s from the loyal customers, and 6s from the reluctant ones. Fuddruckers, on the other hand, gets an average of 8 -- since only their fans vote.
That's how, even if people actually like McDonald's more than Fuddruckers, selective sampling might make McDonald's look worse.
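To see the mechanism in numbers, here's a minimal sketch of that toy example -- the ratings, the even population split, and the visiting patterns are all just the hypothetical figures above, not survey data:

```python
# Toy illustration of the selective-sampling argument above.
# Two equal-sized groups: McDonald's fans (rate McD 9, Fuddruckers 5)
# and Fuddruckers fans (rate Fuddruckers 8, McD 6). Fuddruckers fans
# still eat at McDonald's occasionally, but McDonald's fans never
# visit Fuddruckers, so only fans' Fuddruckers ratings get sampled.

POP = 10_000  # hypothetical respondents per group

# (true rating of McD, true rating of Fuddruckers, ever visits Fuddruckers?)
mcd_fans  = [(9, 5, False)] * POP   # never set foot in Fuddruckers
fudd_fans = [(6, 8, True)]  * POP   # but do wind up at McD sometimes

everyone = mcd_fans + fudd_fans

# "True" opinion: average over the whole population.
true_mcd  = sum(r[0] for r in everyone) / len(everyone)
true_fudd = sum(r[1] for r in everyone) / len(everyone)

# Survey opinion: you only rate a chain you actually visited.
survey_mcd  = sum(r[0] for r in everyone) / len(everyone)      # everyone visits McD
fudd_visitors = [r for r in everyone if r[2]]
survey_fudd = sum(r[1] for r in fudd_visitors) / len(fudd_visitors)

print(f"True averages:   McD {true_mcd:.1f}, Fuddruckers {true_fudd:.1f}")
print(f"Survey averages: McD {survey_mcd:.1f}, Fuddruckers {survey_fudd:.1f}")
# True averages:   McD 7.5, Fuddruckers 6.5
# Survey averages: McD 7.5, Fuddruckers 8.0 -- McD now ranks *below* Fuddruckers
```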
------
It seems likely this is actually happening. If you look at the burger chain rankings, it sure does seem like the biggest chains are clustered near the bottom. Of the five chains with the most locations (by my Googling and estimates), all of them rank within the bottom eight of the rankings: Wendy's (burger score 6.8), Sonic (6.7), Burger King (6.6), Jack In The Box (6.6), and McDonald's (5.8).
As far as I can tell, Hardee's is next biggest, with about 2,000 US restaurants. It ranks in the middle of the pack, at 7.5.
Of the ten chains ranked higher than Hardee's, every one has fewer than 1,000 locations. The top two, Habit Burger Grill (8.1) and In-N-Out (8.0), have only 400 restaurants between them. Burgerville, which ranked 7.7, has only 39 stores. (Five Guys (7.9) now has more than 1,000, but the survey covered April 2012 to June 2013, when it had fewer.)
The pattern was the same in other categories, where the largest chains were also at or near the bottom. KFC ranked worst for chicken; Subway rated second-worst for sandwiches; and Taco Bell scored worst for Mexican.
And, the clincher, for me at least: the chain with the worst "dining experience" in the survey was Sbarro, at 65/100.
What is Sbarro, if not the "I'm stuck at the mall" place to get pizza? Actually, I think there's even a Sbarro at the Ottawa airport -- one of only two fast food places in the departure area. If you get hungry waiting for your flight, it's either them or Tim Hortons.
The Sbarro ratings are probably dominated by customers who didn't have much of a choice.
(Not that I'm saying Sbarro is actually awesome food -- I don't ever expect to hear someone say, unironically, "hey, I feel like Sbarro tonight." I'm just saying they're probably not as bad as their rating suggests.)
------
Another factor: CR asked readers to rate the burgers, specifically. In-N-Out sells only burgers. But McDonald's has many other popular products. You can be a happy McDonald's customer who doesn't like the burgers, but you can't be a happy In-N-Out customer who doesn't like the burgers. Again, that's selective sampling that would skew the results in favor of the burger-only joints.
And don't forget: a lot of people *love* McDonald's french fries. So, their customers might prefer a C+ burger with A+ fries to a competitor that's a B- in both categories.
That thinking actually *supports* CR's conclusion that people like McDonald's burgers less ... but, at the same time, it makes the arbitrary ranking-by-burger-only seem a little unfair. It's as if CR rated baseball players by batting average, and ignored power and walks.
For evidence, you can compare CR's two sets of rankings.
In burgers, the bottom eight are clustered from 6.6 to 6.8 -- except McDonald's, a huge outlier at 5.8, as far from second-worst as second-worst is from average.
In overall experience, though, McDonald's makes up the difference completely, perhaps by hitting McNuggets over the fences. It's still last, but now tied with Burger King at 71. And the rest aren't that far away. The next six range from 74 to 76 -- and, for what it's worth, CR says a difference of five points is "not meaningful".
-----
A little while ago, I read an interesting story about people's preferences for pies. I don't remember where I read it so I may not have the details perfect. (If you recognize it, let me know.)
For years, Apple Pie was the biggest selling pie in supermarkets. But that was when only full-size pies were sold, big enough to feed a family. Eventually, one company decided to market individual-size pies. To their surprise, Apple was no longer the most popular -- instead, Blueberry was. In fact, Apple dropped all the way to *fifth*.
What was going on? It turns out that Apple wasn't anyone's most liked pie, but neither was it anyone's least liked pie. In other words, it ranked high as a compromise choice, when you had to make five people happy at once.
I suspect that's what happens with McDonald's. A bus full of tourists isn't going to stop at a specialty place which may be a little weird, or have limited variety. They're going to stop at McDonald's, where everyone knows the food and can find something they like.
McDonald's is kind of the default fast food, everybody's second or third choice.
------
But having said all that ... it *does* look to me like the ratings are roughly in line with what I consider "quality" in a burger. So I suspect there is some real signal in the results, despite the selective-sampling issue.
Except for McDonald's.
Because, first, I don't think there's any way their burgers are *that* much "worse" than, say, Burger King's.
And, second, every argument I've made here applies significantly more to McDonald's than to any of the other chains. They have almost twice as many locations as Burger King, almost three times as many as Wendy's, and almost four times as many as Sonic. Unless you truly can't stand them, you'll probably find yourself at McDonald's at some point, even if you'd much rather be dining somewhere else.
All the big chains probably wind up shortchanged in CR's survey. But McDonald's, I suspect, gets spectacularly screwed.
Labels: Consumer Reports, fast food, McDonald's, selective sampling
posted by Phil Birnbaum @ 7/22/2014 02:43:00 PM 6 comments
Sunday, May 18, 2014
Another "hot hand" false alarm
Here's a Deadspin article called "Gambling Hot Streaks are Actually Real." It's about a study by academic researchers in London who examined the win/loss patterns of online sports bettors. The more wagers in a row a client won, the more likely he was to also win his next bet. That is: gamblers appear to exhibit the proverbial "hot hand."
It was a huge effect: bettors won only 48% of the time overall, but over 75% of the time after winning five in a row. Here's Deadspin's adaptation of the chart:
Keeping in mind the principle of "if it seems too good to be true, it probably is," you can probably think for a minute and come up with an idea of what might really be going on.
-----
The most important thing: the bets that won 75% didn't actually win more money than expected -- they were just at proportionately low odds. That is: the "streaking" bettors were more likely to back the favorites on their next bet. (The reverse was also true: bettors on a losing streak were more likely to subsequently bet on a longshot.)
As the authors note, bettors are not actually beating the bookies in their subsequent wagers -- it's just that they're choosing bets that are easier to win.
What the authors find interesting, as psychologists, is the pattern. They conclude that after winning a few wagers in a row, the bettors become more conservative, and after losing a few in a row, they become more aggressive. They suggest that the bettors must believe in the "Gambler's Fallacy," that after a bunch of losses, they're due for a win, and after a bunch of wins, they're due for a loss. That is: they take fewer chances when they think the Fallacy is working against them.
But, why assume that the bettors are changing their behavior? Shouldn't the obvious assumption be that it's selective sampling, that bettors on a winning streak had *always* been backing favorites?
Some bettors like long shots, and lose many in a row. Some bettors like favorites, and win many in a row. It's not that people bet on favorites because they're on winning streaks -- it's that they're on winning streaks because they bet on favorites!
Imagine that there are only two types of bettors, aggressive and conservative. Aggressives bet longshots and win 20% of the time; conservatives bet on favorites and win 80% of the time.
Aggressives will win five in a row one time in 3,125. Conservatives will win five in a row around one time in 3. So, among all bettors on a five-win hot streak, there are 1,024 conservatives for every aggressive (assuming equal numbers of each type to begin with). In fact, the ratio grows by a factor of 4 with every win: 4:1 after one win, 16:1 after two, and so on, up to 1,024:1 after five.
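If you want to check the arithmetic, here's a quick simulation of that toy setup -- the 20/80 win rates and the fifty-fifty split between the two types are just the hypothetical numbers above, not anything from the study:

```python
import random

# Two hypothetical bettor types: "aggressives" win 20% of their bets,
# "conservatives" win 80%, and nobody ever changes strategy. Yet bettors
# observed on a five-win streak go on to win far more often than average,
# simply because almost all of them are conservatives.

random.seed(1)
N_BETTORS, N_BETS, STREAK = 20_000, 50, 5

total_wins = total_bets = 0
streak_wins = streak_bets = 0

for i in range(N_BETTORS):
    p_win = 0.2 if i % 2 == 0 else 0.8          # half aggressive, half conservative
    results = [random.random() < p_win for _ in range(N_BETS)]
    total_wins += sum(results)
    total_bets += N_BETS
    for t in range(STREAK, N_BETS):
        if all(results[t - STREAK:t]):          # won the previous five bets
            streak_bets += 1
            streak_wins += results[t]

print(f"Overall win rate:              {total_wins / total_bets:.0%}")
print(f"Win rate after a 5-win streak: {streak_wins / streak_bets:.0%}")
# Comes out around 50% overall but roughly 80% after a streak, even though
# no bettor's behaviour ever changed.
```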
It seems pretty obvious that's what must be happening.
----
But wait, it's even more obvious when you look more closely at the study. It turns out the authors combined three different sports into a single database: horse racing, greyhound racing, and soccer.
A soccer match has three possible results -- win, lose, draw -- so the implied probabilities (before vigorish) have to average 1/3, which works out to odds of 2:1. On the other hand, if there are 11 horses in a race, the probabilities average 1/11 -- odds of 10:1.
Well, there you go! The results probably aren't even a difference between "aggressives" and "conservatives". It's probably that some bettors wager only on soccer, some wager only on racing, and it's the soccer bettors who are much more likely to win five in a row!
----
There's strong evidence of that kind of "bimodality" in the data. The authors reported that, overall, bettors won 48% of their wagers -- but at average odds of 7:1. That doesn't make sense, right? 48% should be more like even money.
I suspect the authors just used a simple average of the odds numbers. They took a trifecta, with 500:1 odds, and a soccer match, with 1:1 odds, and came up with a simple average of about 250:1.
It doesn't work that way. You have to use the average of the probabilities of winning -- which, in this case, are 1/2 and 1/501. The average of those is 503/2004, which translates to average odds of 1501:503, or about 3:1. (Another way to put it: add 1 to all the odds, take the harmonic mean, and subtract 1 from the result. If you leave out the "add 1" and "subtract 1", you'll probably be close enough in most cases.)
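Here's that arithmetic as a small Python sketch; the 500:1 trifecta and the 1:1 soccer match are just the hypothetical pair from above:

```python
# Averaging betting odds the wrong way vs. the right way, using the
# hypothetical 500:1 trifecta and 1:1 soccer match from the example.

def fair_average_odds(odds_list):
    """Average the implied win probabilities, then convert back to odds."""
    probs = [1.0 / (o + 1.0) for o in odds_list]   # odds of o:1 -> P(win) = 1/(o+1)
    avg_p = sum(probs) / len(probs)
    return (1.0 - avg_p) / avg_p                   # back to o:1 form

odds = [500, 1]                                    # trifecta, soccer match

naive = sum(odds) / len(odds)
print(f"Naive average of the odds:    {naive:.1f}:1")                     # 250.5:1
print(f"Average of the probabilities: {fair_average_odds(odds):.2f}:1")   # ~2.98:1
```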
The bigger the spread in the odds, the worse the simple average works. So, the fact that 48% is so far from 7:1 is an indication that they're mixing heavy favorites with extreme longshots. Actually, we don't even need that indication -- the authors report that the SD of the odds was 38.
----
Finally, if none of that made sense, here's an analogy for what's going on.
I study people who have a five year streak of eating lox and bagels. I discover that, in Year Six, they're much more likely to celebrate Hanukkah than people who don't have such a streak. Should I conclude that eating lox and bagels makes people convert to Judaism?
Labels: hot hand, psychology, selective sampling
posted by Phil Birnbaum @ 5/18/2014 03:10:00 PM 6 comments
Sunday, September 22, 2013
Selective sampling could explain point-shaving "evidence"
Remember, a few years ago, when a couple of studies came out that claimed to have found evidence for point shaving in NCAA basketball? There was one by Jonathan Gibbs (which I reviewed here), and another by Justin Wolfers (.pdf). I also reviewed a study, from Dan Bernhardt and Steven Heston, that disagreed.
Here's a picture stolen from Wolfers' study that illustrates his evidence.
It's the distribution of winning margins, relative to the point spread. The top is teams that were favored by 12 points or less, and the bottom is teams that were favored by 12.5 points or more. The top one is roughly as expected, but the bottom one is shifted to the left of zero. That means heavy favorites do worse than expected, based on the betting line. And, heavy favorites have the most incentive to shave points, because they can do so while still winning the game.
After quantifying the leftward shift, Wolfers argues,
"These data suggest that point shaving may be quite widespread, with an indicative, albeit rough, estimate suggesting that around 6 percent of strong favorites have been willing to manipulate their performance."
But ... I think it's all just an artifact of selective sampling.
Bookmakers aren't always trying to be perfectly accurate in their handicapping. They may have to shade the line to get the betting equal on both sides, in order to minimize their risk.
It seems plausible to me that the shading is more likely to be in the direction consistent with the results -- making the favorites less attractive. Heavy favorites are better teams, and better teams have more fans and followers who would, presumably, be wanting to bet that side.
I don't know whether that's actually true, but it doesn't need to be. Even if the shading is just as likely to happen towards the underdog side as the favorite side, we'd still get a selective-sampling effect.
Suppose the bookies always shade the line by half a point, in a random direction. And, suppose we do what Wolfers did, and look at games where a team is favored by 12 points or more.
What happens? Well, that sample includes every team with a "true talent" of 12 points or more -- with one exception. It doesn't include 12-point teams where the bookies shaded down (for whom they set the line at 11.5).
However, the sample DOES include the set of teams the bookies shaded *up* -- 11.5-point teams the bookies rated at 12.
Therefore, in the entire sample of favorites, you're looking at more "shaded up" lines than "shaded down" lines. That means the favorites, overall, are a little bit worse than the line suggests. And that's why they cover less than half the time.
You don't need to have point shaving for this to happen. You just need for bookies to be sufficiently inaccurate. That's true even if the inaccuracy is on purpose, and -- most importantly -- even if the inaccuracy is as likely to go one way as the other.
------
To get a feel for the size of the anomaly, I ran a simulation. I created random games with true-talent spreads of 8 points to 20 points. I ran 200,000 of the 8-point games, scaling down linearly to 20,000 of the 20-point games.
For each game, I shaded the line with a random error, mean zero and standard deviation of 2 points. I rounded the resulting line to the nearest half point. Then, I threw out all games where the adjusted line was less than 12 points.
(Oops! I realized afterwards that Wolfers used 12.5 points as the cutoff, where I used 12 ... but I didn't bother redoing my study.)
I simulated the remaining games as 100 possessions each team, two-point field goals only.
The results were consistent with what Wolfers found. Excluding pushes, my favorites covered the spread 325,578 times against 355,909 non-covers, a .478 cover rate. Wolfers' real-life sample came in at .483.
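Something like the following captures the idea -- it's a simplified sketch rather than my exact code, with the game counts scaled down and the per-possession scoring probabilities being stand-in assumptions (chosen so the expected margin equals the true spread). With settings like these, the cover rate should land a bit below .500:

```python
import random

# Sketch of the selective-sampling simulation: random line errors plus a
# cutoff of "line >= 12" are enough to make favorites cover less than half
# the time, with no point shaving anywhere in the model.

random.seed(2)
NOISE_SD = 2.0      # SD of the bookmaker's line error, as in the post
CUTOFF = 12.0       # keep only games where the posted line is at least 12

covers = non_covers = pushes = 0
for true_spread in range(8, 21):                 # true-talent spreads, 8..20 points
    # 20,000 games at a spread of 8, scaling down linearly to 2,000 at 20
    # (one-tenth of the post's counts, just to keep the sketch fast)
    n_games = 20_000 - (true_spread - 8) * 1_500
    for _ in range(n_games):
        line = round((true_spread + random.gauss(0, NOISE_SD)) * 2) / 2
        if line < CUTOFF:
            continue
        # 100 possessions per team, two-point baskets only; the favorite's
        # per-possession edge of spread/400 makes the expected margin equal
        # to the true spread
        p_fav, p_dog = 0.5 + true_spread / 400, 0.5 - true_spread / 400
        fav_pts = sum(2 for _ in range(100) if random.random() < p_fav)
        dog_pts = sum(2 for _ in range(100) if random.random() < p_dog)
        margin = fav_pts - dog_pts
        if margin > line:
            covers += 1
        elif margin < line:
            non_covers += 1
        else:
            pushes += 1

print(f"Favorites' cover rate (pushes excluded): {covers / (covers + non_covers):.3f}")
```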
--------
So, there you go. It's not proof that selective sampling is the explanation, but it sounds a lot more plausible than widespread point shaving. Especially in light of the other evidence:
-- If you look again at the graph of results, the entire curve appears shifted to the left. That's not what you'd expect if there were point shaving -- in that case, you'd expect an extraordinarily large number of "near misses" just short of the spread.
-- As the Bernhardt/Heston study showed, the effect was the same for games that weren't heavily bet -- that is, cases where you'd expect point shaving to be much less likely.
-- And, here's something interesting. In their rebuttal, Bernhardt and Heston estimated point spreads for games that had no betting line, and found a similar left shift. Wolfers criticized that, and I agreed, since you can't really know what the betting line would be.
However: that part of the Bernhardt/Heston study perfectly illustrates this selective sampling point! That's because, whatever method they used to estimate the betting line, it's probably not perfect, and probably has random errors! So, even though that experiment isn't a legitimate comparison to the original Wolfers study, it IS a legitimate illustration of the selective sampling effect.
---------
So, after I did all this work, I found that what I did isn't actually original. Someone else had come up with this explanation first, some five years ago.
In 2009, Neal Johnson published a paper in the Journal of Sports Economics called "NCAA Point Shaving as an Artifact of the Regression Effect and the Lack of Tie Games."
Johnson identified the selective sampling issue, which he refers to as the "regression effect." (They're different ways to look at the same situation.) Using actual NCAA data, he comes up with the result that, in order to get the same effect that Wolfers found, the bookmakers' errors would have to have had a standard deviation of 1.35 points.
I'd quibble with that study on a couple of small points. First, Johnson assumed that the absolute value of the spread was normally distributed around the observed mean of 7.92 points. That's not the case -- you'd expect it to look like the right-hand side of a normal distribution, since you're taking absolute values. The assumption of normality, I think, means that the 1.35 points is an overestimate of the amount of inaccuracy needed to produce the effect.
Second, Johnson assumes the discrepancies are actual errors on the part of the bookmakers, rather than deliberate line shadings. He may be right, but, I'm not so sure. It looks like there's an easy winning strategy for NCAA basketball -- just bet on mismatched underdogs, and you'll win 51 to 53 percent of the time. That seems like something the bookies would have noticed, and corrected, if they wanted to, just by regressing the betting lines to the mean.
Those are minor points, though. I wish I had seen Johnson's paper before I did all this, because it would have saved me a lot of trouble ... and, because, I think he nailed it.
Labels: basketball, cheating, Neal Johnson, point shaving, selective sampling, statistics, thinking you discovered something new but then realizing someone else figured it out years ago
posted by Phil Birnbaum @ 9/22/2013 02:22:00 AM 13 comments
Monday, April 22, 2013
Do athletes have shorter lifespans?
According to this article in Pacific Standard magazine, athletes have shorter lifespans than those in other occupations.
The article cites a recent academic study (.pdf) that looked at 1,000 consecutive obituaries in the New York Times. That study
" ...found the youngest average age of death was among athletes (77.4 years), performers (77.1 years), and non-performers who worked in creative fields, such as authors, composers, and artists (78.5 years). The oldest average age of death was found among people famous for their work in politics (82.1 years), business (83.3 years), and the military (84.7 years)."
The authors of the study say,
"... our data raise the intriguing speculation that young people contemplating certain careers (e.g. performing arts and professional sports) may be faced, consciously or otherwise, with a faustian choice: namely, 1. to maximize their career potential and competitiveness even though the required psychological and physical costs may be expected to shorten their longevity, or 2. to fall short of their career potential so as to balance their lives and permit a normal lifespan."
But: isn't there a selective sampling problem here?
To appear in a New York Times obituary, you have to be relatively famous, or, at least, have passed a certain standard of fame or accomplishment in your chosen field.
If your chosen field is athletics, you reach that threshold early in your life -- in your 30s, say. Wayne Gretzky, Ken Griffey Jr., Bjorn Borg. If your field is business, you probably have to reach the level of CEO to make the Times. The median age of a CEO in the S&P 500 is mid-50s ... so the median age for an *accomplished* CEO is probably around 60.
Same for politics: the median age of a US senator is almost 62 years. For a US congressman, the mean is around 57.
So, of course looking at obituaries will make you think there's a difference! Your sample includes athletes who died at 40, but not politicians who died at 40. Politicians who died at 40 either haven't become famous yet -- or, more likely, haven't even become politicians yet!
And, quickly checking out the US mortality table ... a 35-year-old male is expected to live to about 77.5. A 60-year-old male is expected to live to about 80.9.
Seems about right.
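If you want to see where numbers like those come from, here's a little sketch of the conditional life-expectancy calculation. The Gompertz mortality curve and its parameters are rough, made-up stand-ins for a modern male mortality table, not the actual US table:

```python
import math

# Illustrative sketch: expected age at death, conditional on already being
# alive at a given age. The Gompertz hazard parameters below are assumed,
# roughly plausible values -- not the actual US mortality table.

A, B = 4e-5, 0.095          # assumed Gompertz hazard: h(t) = A * exp(B * t)

def survival(from_age, to_age):
    """P(still alive at to_age | alive at from_age) under the Gompertz hazard."""
    cum_hazard = (A / B) * (math.exp(B * to_age) - math.exp(B * from_age))
    return math.exp(-cum_hazard)

def expected_age_at_death(age, max_age=120):
    """E[age at death | alive at `age`] = age + sum of yearly survival probabilities."""
    return age + sum(survival(age, t) for t in range(age + 1, max_age + 1))

for a in (35, 60):
    print(f"Alive at {a}: expected age at death is about {expected_age_at_death(a):.1f}")
# The 60-year-old comes out a few years higher than the 35-year-old, which is
# the whole point: conditioning on reaching fame later pushes the average up.
```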
----
If you want a two-line analogy, try this:
No US president has ever died before the age of 35. That doesn't mean that if you want to make sure you don't die in childhood, you should become a US president.
Labels: mortality, selective sampling, statistics
posted by Phil Birnbaum @ 4/22/2013 04:56:00 PM 1 comments