Showing posts with label visualization. Show all posts
Friday, February 15, 2013
Sloppy journalism with interactive graphics is still sloppy journalism
The Guardian recently discussed the "declining linguistic standards" in State of the Union addresses. I thought this was an interesting exercise, but something seemed wrong about the article, and it turns out this is one case where the data do not really speak for themselves. There's a lot of interpretation and understanding behind cultural trends in the use of the English language in America, as well as the evolution of the presidents' intentions behind the address. There are a few important points:
- The author correctly points out that Woodrow Wilson essentially changed the format of the address through precedent from written document to speech. Right after Wilson's first speech there is a huge drop in the "education level" (hang on for a discussion of this terminology) of these addresses. As I recall, Wilson is the only American president with a Ph.D.
- The index used, Flesch-Kincaid (FK), is questionable. Good on The Guardian for using a single measure for all speeches, but I have to wonder whether it is wise to use the same measure for speeches and written addresses. Furthermore, FK is very sensitive to the placement of punctuation (it weights sentence length heavily). For instance, as a friend pointed out, one of Wilson's speeches has an FK grade level of over 17, but if you replace one of the semicolons in the speech with a period, the FK grade drops to 12. A listener cannot hear the difference between a semicolon and a period, so this subtlety is lost in speech format, giving FK an extremely high uncertainty (this same friend calls FK "utterly useless" for speeches).
- The audience of the SOTU address has changed. Though the address is a constitutional duty of the president, delivering it as a speech is not, and it only has to be delivered to Congress. However, most modern addresses have been televised speeches, and they have to be understood by a wider and less politically savvy audience.
- Spoken and written English in America has, in general, trended toward shorter sentences over time.
- In this case, a more sophisticated natural language processing analysis might reveal some interesting trends. For instance, how do wartime speeches compare to times of peace? Are there any natural categories of speeches that fall out? What are the outliers? How does this compare to polls?
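The punctuation sensitivity described in the FK point above can be sketched in a few lines. This is a rough illustration, assuming the standard Flesch-Kincaid grade-level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a crude vowel-group heuristic of my own choosing, and the example sentence is invented for illustration (it is not taken from any State of the Union address), so the absolute grades should not be trusted. Only the gap between the two versions matters.

```python
import re


def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(text):
    """Flesch-Kincaid grade level using the standard published coefficients."""
    # Only ., ! and ? end a sentence here; a semicolon does not.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Invented example text: the same 20 words, punctuated two ways.
one_sentence = ("The committee reviewed the budget for the coming year; "
                "it then voted to delay the decision until more data arrived.")
two_sentences = one_sentence.replace("; it", ". It")

grade_gap = fk_grade(one_sentence) - fk_grade(two_sentences)
print(round(fk_grade(one_sentence), 2), round(fk_grade(two_sentences), 2))
print("gap from one punctuation mark:", round(grade_gap, 2))
```

The words and syllables are identical in both versions, so the whole gap comes from the sentence-length term: swapping a single semicolon for a period halves the words per sentence and moves the grade by several levels, even though a listener would hear the same thing.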
In short, we have some interesting data that need heavy qualification and critical analysis, but that are simply presented on a page and capped with a headline that gives an overly simplistic interpretation.
Friday, March 6, 2009
Simplicity is no substitute for correctness, but simplicity has an important role
The test of a good procedure is how well it works, not how well it is understood. -- John Tukey
Perhaps I'm abusing Tukey's quote here, because I'm speaking of situations where the theory of the less understood methodology is fairly well understood, or at least fairly obvious to the statistician from previous theory. I'm also, in some cases, substituting "how correct it is" in place of "how well it works."
John Cook wrote a little the other day on this quote, and I wanted to follow up a bit more. I've run into many situations where a better-understood method was preferred over one that would have, for example, cut the sample size of a clinical trial or made better use of the data that were collected. The sponsor simply wanted to go with the method taught in the first-year statistics course because it was easier to understand. The results were often analysis plans that were less powerful, covered up important issues, or were simply wrong (i.e., an exact answer to the wrong question). It's a delicate balance, especially for someone trained in theoretical statistics corresponding with a scientist or clinician in a very applied setting.
Here's how I resolve the issue. I think that the simpler methods are great for storytelling. I appreciate Andrew Gelman's tweaks to the simpler methods (and his useful discussion of Tukey as well!), and think basic graphing and estimation methods serve a useful purpose for presentation and first-order approximations of data analysis. But, in most practical cases, they should not be the last effort.
On a related note, I'm sure most statisticians know by now that they will have the "sexiest job" of the 2010s. The key will be how well we communicate our results. And here is where judicious use of the simpler methods (and creative data visualization) will make the greatest contributions.