33
$\begingroup$

Data can be stored in wide format or in long format. The distinction matters because the methods you can use depend on the format. I know you have to work with melt() and cast() from the reshape package, but there seem to be some things that I don't get.

Can someone give me a short overview of how you do this?

csgillespie
asked Feb 21, 2011 at 10:27
$\endgroup$
4
  • $\begingroup$ Please provide an example of what you want to achieve. What exactly do you not get? $\endgroup$ Commented Feb 21, 2011 at 10:32
  • 3
    $\begingroup$ Here is my blog post with an example of using melt and cast. There, the conversion from wide to long format is done in a single step. There really isn't anything more special to it. $\endgroup$ Commented Feb 21, 2011 at 10:35
  • $\begingroup$ Welcome to stats. You might find it helps to include a small, reproducible dataset in your question to explain what you want. Read sigmafield.org/2011/01/18/… for more. $\endgroup$ Commented Feb 21, 2011 at 12:45
  • $\begingroup$ See this SO question for many ways to do this. $\endgroup$ Commented Mar 9, 2018 at 10:57

5 Answers

29
$\begingroup$

There are several resources on Hadley Wickham's website for the package (now called reshape2), including a link to a paper on the package in the Journal of Statistical Software.

Here is a brief example from the paper:

> require(reshape2)
Loading required package: reshape2
> data(smiths)
> smiths
     subject time age weight height
1 John Smith    1  33     90   1.87
2 Mary Smith    1  NA     NA   1.54

We note that the data are in the wide form. To go to the long form, we make the smiths data frame molten:

> melt(smiths)
Using subject as id variables
     subject variable value
1 John Smith     time  1.00
2 Mary Smith     time  1.00
3 John Smith      age 33.00
4 Mary Smith      age    NA
5 John Smith   weight 90.00
6 Mary Smith   weight    NA
7 John Smith   height  1.87
8 Mary Smith   height  1.54

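Notice how melt() chose one of the variables as the id (by default it treats factor and character columns as id variables), but we can state explicitly which to use via the id.vars argument, abbreviated here to 'id' thanks to R's partial argument matching: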

> melt(smiths, id = "subject")
     subject variable value
1 John Smith     time  1.00
2 Mary Smith     time  1.00
3 John Smith      age 33.00
4 Mary Smith      age    NA
5 John Smith   weight 90.00
6 Mary Smith   weight    NA
7 John Smith   height  1.87
8 Mary Smith   height  1.54
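As a further sketch of the same idea, the call below names two id columns through id.vars and drops missing measurements with na.rm = TRUE. It assumes reshape2 is installed and reuses the smiths data shown above; the object name smiths_long is only illustrative.

```r
library(reshape2)

# smiths ships with reshape2: two subjects, one time point each
data(smiths)

# Keep subject and time as identifiers; every other column is measured.
# na.rm = TRUE drops the rows whose value is NA (Mary's age and weight).
smiths_long <- melt(smiths,
                    id.vars = c("subject", "time"),
                    na.rm = TRUE)

print(smiths_long)
```

Spelling out id.vars avoids relying on melt()'s guess about which columns identify an observation.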

Here is another example from ?cast:

#Air quality example
names(airquality) <- tolower(names(airquality))
aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)

If we store the molten data frame, we can cast it into other forms. In the new version of reshape (called reshape2) there are two casting functions: acast() returns an array-like result (array, matrix, vector) and dcast() returns a data frame. These functions also take an aggregating function (e.g. mean()) to provide summaries of data in molten form. For example, following on from the Air Quality example above, we can generate, in wide form, monthly mean values for the variables in the data set:

> dcast(aqm, month ~ variable, mean)
  month    ozone  solar.r      wind     temp
1     5 23.61538 181.2963 11.622581 65.54839
2     6 29.44444 190.1667 10.266667 79.10000
3     7 59.11538 216.4839  8.941935 83.90323
4     8 59.96154 171.8571  8.793548 83.96774
5     9 31.44828 167.4333 10.180000 76.90000
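To round this out, here is a minimal sketch of two other casts of the same molten data: dcast() back to one row per month and day, and acast() producing a matrix of monthly means. It assumes reshape2 is installed; the names aq, aq_wide and monthly_means are illustrative and not part of the original example.

```r
library(reshape2)

# Work on a copy so the built-in airquality data set is left untouched
aq <- airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

# One value per month/day/variable cell, so no aggregation is needed;
# cells dropped by na.rm = TRUE come back as NA.
aq_wide <- dcast(aqm, month + day ~ variable)

# acast() returns a matrix: months in rows, variables in columns
monthly_means <- acast(aqm, month ~ variable, mean)

head(aq_wide)
monthly_means
```

The left-hand side of the formula decides what identifies a row in the result, and the right-hand side decides which values become columns.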

There are really only two main functions in reshape2: melt() and the acast() and dcast() pairing. Look at the examples in the help pages for these two functions, see Hadley's website (link above) and look at the paper I mentioned. That should get you started.

You might also look into Hadley's plyr package which does similar things to reshape2 but is designed to do a whole lot more besides.

answered Feb 21, 2011 at 11:09
$\endgroup$
2
  • $\begingroup$ dcast(aqm, month ~ variable), what would this do without the aggregating function? $\endgroup$ Commented May 1, 2013 at 12:50
  • $\begingroup$ @CravingSpirit it would return the number of observations for each variable. Read ?dcast which would have told you this (see the details for argument fun.aggregate). $\endgroup$ Commented May 1, 2013 at 17:03
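The behaviour described in the comment above can be checked with a short sketch. It assumes reshape2 is installed; counts_default and counts_explicit are illustrative names.

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

# No aggregating function although several values share each cell:
# dcast() falls back to length() and prints a message saying so
counts_default <- dcast(aqm, month ~ variable)

# Same table with the fallback made explicit
counts_explicit <- dcast(aqm, month ~ variable, fun.aggregate = length)

counts_default
```

Each cell then holds the number of non-missing observations for that month and variable, because the NA rows were already dropped by na.rm = TRUE in melt().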
9
$\begingroup$
  • Quick-R has a simple example of using the reshape package

  • See also ?reshape (LINK) for the Base R way of moving between wide and long format.
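For the base R route, a minimal sketch with a small made-up data frame might look like this. The column names score.1 and score.2 are assumptions chosen so that reshape() can split them at the default separator "." into a variable name and a time value.

```r
# Hypothetical wide data: one row per subject, one column per occasion
wide <- data.frame(
  subject = c("A", "B"),
  score.1 = c(10, 20),
  score.2 = c(11, 21)
)

# Wide to long: the varying columns are stacked into a single "score"
# column, and a "time" column records which occasion each value came from
long <- reshape(wide,
                direction = "long",
                varying = c("score.1", "score.2"),
                idvar = "subject")

# Long back to wide: one score.<time> column per value of time
wide_again <- reshape(long,
                      direction = "wide",
                      idvar = "subject",
                      timevar = "time",
                      v.names = "score",
                      sep = ".")

long
wide_again
```

No extra packages are needed, at the cost of a more verbose argument list than melt() and dcast().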

answered Feb 21, 2011 at 11:14
$\endgroup$
8
$\begingroup$

You don't have to use melt and cast.

Reshaping data can be done in lots of ways. In the particular example on your site, using recast with aggregate was redundant because aggregate does the task fine all on its own.

aggregate(cbind(LPMVTUZ, LPMVTVC, LPMVTXC) ~ year, dtm, sum)
# or even briefer by first removing the columns you don't want to use
aggregate(. ~ year, dtm[,-2], sum)
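Since the dtm data from the blog post is not shown here, the following sketch uses a small made-up stand-in to illustrate both calls. The data frame dtm_demo, its values, and the assumption that the second column is the one to drop are all hypothetical.

```r
# Hypothetical stand-in for dtm: year, a second column to ignore,
# and two measured series
dtm_demo <- data.frame(
  year    = c(2009, 2009, 2010, 2010),
  month   = c(1, 2, 1, 2),
  LPMVTUZ = c(1, 2, 3, 4),
  LPMVTVC = c(10, 20, 30, 40)
)

# Name the measured columns explicitly on the left-hand side
by_year <- aggregate(cbind(LPMVTUZ, LPMVTVC) ~ year, dtm_demo, sum)

# Or drop the unwanted column and let "." stand for everything else
by_year_dot <- aggregate(. ~ year, dtm_demo[, -2], sum)

by_year
```

Both calls give one row per year with the yearly sums of each series.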

I do like how, in your blog post, you explain what melt is doing. Very few people understand that and once you see it then it gets easier to see how cast works and how you might write your own functions if you want.

answered Feb 21, 2011 at 15:25
$\endgroup$
3
$\begingroup$

See the reshape2 wiki. It surely provides more examples than you could expect.

answered May 16, 2013 at 17:29
$\endgroup$
2
$\begingroup$

Just noticing there's no reference to the more efficient and extensive reshaping methods in data.table here, so I am posting without further comment the excellent answer by Zach/Arun on StackOverflow for a similar question:

https://stackoverflow.com/questions/6902087/proper-fastest-way-to-reshape-a-data-table/6913151#6913151

And in particular there's the wonderful vignette on the data.table GitHub page:

https://github.com/Rdatatable/data.table/wiki/Getting-started
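As a rough sketch of what the data.table versions look like, the example below melts and recasts a tiny made-up table. It assumes the data.table package is installed; DT, long and wide are illustrative names and the data are invented.

```r
library(data.table)

# Hypothetical wide table: one row per id, two measured columns
DT <- data.table(id = c("a", "b"), x = c(1, 2), y = c(3, 4))

# Wide to long with data.table's melt method
long <- melt(DT, id.vars = "id", measure.vars = c("x", "y"))

# Long back to wide with data.table's dcast method
wide <- dcast(long, id ~ variable, value.var = "value")

long
wide
```

The interface mirrors reshape2, so code written with melt() and dcast() usually carries over with few changes.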

answered May 10, 2016 at 18:01
$\endgroup$
