Friday, February 15, 2013
Sloppy journalism with interactive graphics is still sloppy journalism
- The author correctly points out that Woodrow Wilson essentially changed the format of the address, by precedent, from written document to speech. Right after Wilson's first speech there is a huge drop in the "education level" (hang on for a discussion of this terminology) of these addresses. As I recall, Wilson is the only American president with a Ph.D.
- The index used, Flesch-Kincaid (FK), is questionable. Good on The Guardian to use a single measure for all speeches, but I have to wonder if it is wise to use the same measure for speeches and written addresses. Furthermore, FK is very sensitive to the placement of punctuation, because it weights sentence length heavily (the sketch after this list shows the arithmetic). For instance, as a friend pointed out, one of Wilson's speeches has an FK grade level of over 17, but if you replace one of the semicolons in the speech with a period, the FK grade drops to 12. In a transcript of a speech, punctuation is largely an editorial choice, so this subtlety is lost and FK carries an extremely high uncertainty (this same friend calls FK "utterly useless" for speeches).
- The audience of the SOTU address has changed. Though the address is a constitutional duty of the president, delivering it as a speech is not, and it only has to be delivered to Congress. Most modern addresses, however, have been televised speeches, which must be understood by a wider and less politically savvy audience.
- Cultural trends in American spoken and written English have generally moved toward shorter sentences over time, which by itself pushes FK grade levels down regardless of the sophistication of the content.
- In this case, a more sophisticated natural language processing analysis might reveal some interesting trends. For instance, how do wartime speeches compare to peacetime ones? Are there natural categories of speeches that fall out? What are the outliers? How do the results compare to polls?
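To see how much the sentence-length weighting matters, here is a minimal sketch of the FK arithmetic. The grade-level formula is the standard one; the word, sentence, and syllable counts are invented for illustration and are not taken from any actual address.

```python
# Flesch-Kincaid grade level:
#   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
# Toy counts below are invented; they do not come from any SOTU address.

def fk_grade(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

words, syllables = 600, 900  # hypothetical passage

# Same passage punctuated as 20 sentences vs. 15 (a few periods read as semicolons)
print(round(fk_grade(words, 20, syllables), 1))  # about 13.8
print(round(fk_grade(words, 15, syllables), 1))  # about 17.7
```

Nothing about the words themselves changed; only the punctuation did, and the grade level moved by roughly four grades.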
Monday, August 20, 2012
Clinical trials: enrollment targets vs. valid hypothesis testing
The questions raised in this Scientific American article ought to concern all of us, and I want to take some of these questions further. But let me first explain the problem.
Clinical trials and observational studies of drugs, biologics, and medical devices pose huge logistical challenges, not the least of which is finding physicians and patients to participate. The thesis of the article is that the classical methods of finding participants – mostly compensation – lead to perverse incentives to lie about one’s medical condition.
I think there is a more subtle issue, and it struck me when one of our clinical people expressed a desire not to put enrollment caps on large hospitals for the sake of fast enrollment. In our race to finish the trial and collect data, we are biasing our studies toward larger centers where there may be better care. This effect is exactly the opposite of the one posited in the article, where the treatment effect is biased downward. Here, the treatment effect is biased upward, because doctors at high-enrolling centers tend to be more familiar with best delivery practices (many of the drugs I study are IV or hospital-based), best treatment practices, and more efficient care.
We statisticians can start to characterize the problem by looking at the treatment effect across sites, or by using hierarchical models to separate the center effect from the drug effect. But this isn’t always a great solution, because low-enrolling sites, by definition, contribute far fewer patients, and pooling is problematic because low-enrolling centers tend to have way more variation in level and quality of care than high-enrolling centers.
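For the curious, here is roughly what I mean by a hierarchical model, sketched in Python with statsmodels. The data are simulated and the column names (outcome, treatment, site) are hypothetical; a real trial analysis would be considerably more involved.

```python
# Sketch: mixed-effects model with a random intercept per site, so the
# center effect is separated from the treatment (drug) effect.
# Data and column names are made up for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_sites, per_site = 20, 30
site = np.repeat(np.arange(n_sites), per_site)
site_effect = rng.normal(0, 1, n_sites)[site]   # center-to-center differences in care
treatment = rng.integers(0, 2, site.size)
outcome = 0.5 * treatment + site_effect + rng.normal(0, 2, site.size)

df = pd.DataFrame({"outcome": outcome, "treatment": treatment, "site": site})

# The random intercept absorbs the center effect; the fixed effect for
# treatment estimates the drug effect.
fit = smf.mixedlm("outcome ~ treatment", df, groups=df["site"]).fit()
print(fit.summary())
```

The trouble described above shows up here too: with only a handful of patients at a low-enrolling site, its random intercept is estimated poorly, and shrinkage toward the overall mean can hide exactly the variation in care quality we are worried about.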
We can get creative on the statistical analysis end of studies, but I think the best solution is going to involve stepping back at the clinical trial logistics planning stage and recasting the recruitment problem in terms of a generalizability/speed tradeoff.
Monday, March 5, 2012
Why I hate p-values (statistical leadership, Part II)
One statistical tool that needs this kind of leadership is the ubiquitous p-value. If it’s less than 0.05, your hypothesis must be true, right? Think again.
Ok, so I don’t hate p-values, but I do hate the way that we abuse them. And here’s where we need statistical leadership to go back and critique these p-values before we get too excited.
P-values can make or break venture capital deals, product approval for drugs, or senior management approval for a new deck lid design. In that way, we place a little too much trust in them. Here’s where we abuse them:
- The magical 0.05: if we get a 0.051, we lose, and if we get a 0.049, we win! Never mind that the same experiment run under the same conditions can easily produce both of these results. (The difference between statistically significant and not significant is not itself significant.)
- The misinterpretation: the p-value is not the probability that the null hypothesis is true, but rather the long-run relative frequency with which similar experiments run under the same conditions would produce a test statistic at least as extreme as the one you observed, if the null hypothesis is true. Got that? Well, no matter how small your p-value is, I can take a wimpy version of your treatment and get a smaller p-value, just by increasing the sample size to whatever I need (the quick simulation after this list makes this concrete). P-values depend on effect size, effect variance, and sample size.
- The gaming of the p-value: in clinical trials it’s possible to make your p-value smaller by restricting your subject entry criteria to whatever brings out the treatment effect the most. This is not usually a problem, except to keep in mind that the rarefied world of early phase clinical studies is different from the real world.
- The unethical gaming of the p-value: this comes from retrospectively tweaking your subject population. I guess it’s ok if you present the result not as real findings but as information for further study design, but you can’t expect any scientific validity from tweaking a study, its population, or its analysis after the results are in.
- Covariate madness: covariates tend to decrease the p-value by partitioning out variation in the response. That’s great if you want to identify segments of your patient population. But if you do covariate selection and then report the p-value from the final model, you have a biased p-value.
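To make the sample-size point concrete, here is a quick simulation with made-up effect sizes: a wimpy effect measured on an enormous sample will typically produce a far smaller p-value than a respectable effect measured on a modest sample.

```python
# A tiny effect with a huge n can out-"significance" a bigger effect with a small n.
# Effect sizes and sample sizes are arbitrary, chosen only for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Respectable effect (0.5 SD), small study (50 per arm)
a = rng.normal(0.0, 1.0, 50)
b = rng.normal(0.5, 1.0, 50)
print(stats.ttest_ind(a, b).pvalue)

# Wimpy effect (0.05 SD), enormous study (100,000 per arm)
c = rng.normal(0.0, 1.0, 100_000)
d = rng.normal(0.05, 1.0, 100_000)
print(stats.ttest_ind(c, d).pvalue)
```

The second p-value is essentially always orders of magnitude smaller, even though the underlying effect is a tenth the size. That is why a p-value by itself, with no effect size or confidence interval next to it, tells you very little.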
Statisticians need to stay on top of these issues and advocate for the proper interpretation of p-values. Don’t leave it up to someone with an incomplete understanding of these tools.