Realizations in Biostatistics: practice of statistics

Showing posts with label practice of statistics. Show all posts

Wednesday, November 25, 2015

Even the tiniest error messages can indicate an invalid statistical analysis

The other day, I was reading in a data set in R, and the function indicated that there was a warning about a parsing error on one line. I went ahead with the analysis anyway, but that small parsing error kept bothering me. I thought it was just one line of goofed up data, or perhaps a quote in the wrong place. I finally opened up the CSV file in a text editor, and found that the reason for the parsing error was that the data set was duplicated within the CSV file. The parsing error resulted from the reading of the header twice. As a result, anything I did afterward was suspect.

Word to the wise: track down the reasons for even the most innocuous-seeming warnings. Every stage of a statistical analysis is important, and small errors anywhere along the way and have huge consequences downstream. Perhaps this is obvious, but you still have to slow down and take care of the details.

(Note that I'm editing this to be a part of my Little Debate series, which discusses the tiny decisions dealing with data that are rarely discussed or scrutinized, but can have a major impact on conclusions.)

Posted by Unknown at 8:00 AM

Labels: Little Debate, practice of statistics, R, statistics

Wednesday, August 7, 2013

Joint statistical meetings 2013

Every year, the first week of August, we statisticians meet to get our statistics, networking, dancing, and beer on. With thousands in attendance, it's exhausting. I wonder about the quality of statistical work the second week of August.

Each conference seems to have a life of its own, so I tend to reflect on each one. Here's my reflection on this year's:

First, being in Montreal, most of us couldn't use smartphones. Thankfully, Revolution Analytics sponsored free WiFi. They also do great work with R. So we were all for the most part able to tweet.

The quality of talks was pretty good this year, and I've learned a lot. We even had one person describe simulations with a flowchart rather than indecipherable equations, and I strongly encourage that practice.

As a member of the biopharmaceutical section, I was struck by how few people take advantage of our awards. Of course, everybody giving a contributed or topic contributed talks is automatically entered into the best contributed paper competition. But we have a poster competition and student paper competition that have to be explicitly entered, and participation is low. This is a great opportunity.

The highlight of the conference, of course, was Nate Silver's talk, and he delivered admirably. The perhaps thousand statisticians in attendance needed the message: learn to communicate with journalists and teach them numbers need context. I also like his response to the question "statistician or data scientist?" Which was, of course, "I don't care what you call yourself, just do good work."

Posted by Unknown at 12:02 PM

Labels: joint statistical meetings, practice of statistics, statistics

Tuesday, March 12, 2013

Distrust of R

I guess I've been living in a bubble for a bit, but apparently there are a lot of people who still mistrust R. I got asked this week why I used R (and, specifically, the package rpart) to generate classification and regression trees instead of SAS Enterprise Miner. Never mind the fact that rpart code has been around a very long time, and probably has been subject to more scrutiny than any other decision tree code. (And never mind the fact that I really don't like classification and regression trees in general because of their limitations.)

At any rate, if someone wants to pay the big bucks for me to use SAS Enterprise Miner just on their project, they can go right ahead. Otherwise, I have got a bit of convincing to do.

Posted by Unknown at 7:12 PM

Labels: practice of statistics, R