I want to sort a data frame by multiple columns. For example, with the data frame below I would like to sort by column 'z' (descending) then by column 'b' (ascending):
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
levels = c("Low", "Med", "Hi"), ordered = TRUE),
x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
z = c(1, 1, 1, 2))
dd
b x y z
1 Hi A 8 1
2 Med D 3 1
3 Hi A 9 1
4 Low C 9 2
You can use the order() function directly without resorting to add-on tools. See this simpler answer, which uses a trick right from the top of the example(order) code:
R> dd[with(dd, order(-z, b)), ]
b x y z
4 Low C 9 2
2 Med D 3 1
1 Hi A 8 1
3 Hi A 9 1
Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:
R> dd[order(-dd[,4], dd[,1]), ]
b x y z
4 Low C 9 2
2 Med D 3 1
1 Hi A 8 1
3 Hi A 9 1
Here the columns are passed by position rather than by name (where with() gives easier, more direct access).
- Should work the same way, but you can't use with. Try M <- matrix(c(1,2,2,2,3,6,4,5), 4, 2, byrow=FALSE, dimnames=list(NULL, c("a","b"))) to create a matrix M, then use M[order(M[,"a"],-M[,"b"]),] to order it on two columns. (Dirk is no longer here, Mar 27, 2012 at 12:41)
- Easy enough: dd[ order(-dd[,4], dd[,1]), ], but can't use with for name-based subsetting. (Dirk is no longer here, Oct 21, 2012 at 14:34)
- Why is dd[ order(-dd[,4],, ] not valid, or 'dd[ order(-dd[,4], ]'? Basically, why is dd[,1] required? Is -dd[,4] not enough if you just want to sort by 1 column? (HattrickNZ, Jul 30, 2014 at 22:11)
- The "invalid argument to unary operator" error occurs when you use minus with a character column. Solve it by wrapping the column in xtfrm, for example dd[ order(-xtfrm(dd[,4]), dd[,1]), ]. (Richie Cotton, Mar 24, 2015 at 11:40)
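Following up on the xtfrm comment, here is a minimal sketch of a descending sort on a character column, using the dd data frame from the question. The second call relies on order() accepting one decreasing flag per key when method = "radix". Note that the radix method compares strings in the C locale, which may differ from your locale's collation for mixed-case or accented text.

```r
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))

# -dd$x would fail for a character column, so convert it to ranks first
idx_xtfrm <- order(-xtfrm(dd$x), dd$y)

# Same result without negation: one decreasing flag per sort key
idx_radix <- order(dd$x, dd$y, decreasing = c(TRUE, FALSE), method = "radix")

# x descending, then y ascending within ties
dd[idx_xtfrm, ]
```

Both index vectors select the same rows here; which form reads better is mostly a matter of taste.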
Your choices:
order from base
arrange from dplyr
setorder and setorderv from data.table
arrange from plyr
sort from taRifx
orderBy from doBy
sortData from Deducer
Most of the time you should use the dplyr or data.table solutions, unless having no dependencies is important, in which case use base::order.
I recently added sort.data.frame to a CRAN package, making it class compatible as discussed here: Best way to create generic/method consistency for sort.data.frame?
Therefore, given the data.frame dd, you can sort as follows:
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
levels = c("Low", "Med", "Hi"), ordered = TRUE),
x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
z = c(1, 1, 1, 2))
library(taRifx)
sort(dd, f= ~ -z + b )
If you are one of the original authors of this function, please contact me. Discussion as to public domaininess is here: https://chat.stackoverflow.com/transcript/message/1094290#1094290
You can also use the arrange() function from plyr, as Hadley pointed out in the above thread:
library(plyr)
arrange(dd,desc(z),b)
Benchmarks: Note that I loaded each package in a new R session since there were a lot of conflicts. In particular, loading the doBy package causes sort to return "The following object(s) are masked from 'x (position 17)': b, x, y, z", and loading the Deducer package overwrites sort.data.frame from Kevin Wright or the taRifx package.
#Load each time
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
levels = c("Low", "Med", "Hi"), ordered = TRUE),
x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
z = c(1, 1, 1, 2))
library(microbenchmark)
# Reload R between benchmarks
microbenchmark(dd[with(dd, order(-z, b)), ] ,
dd[order(-dd$z, dd$b),],
times=1000
)
Median times:
dd[with(dd, order(-z, b)), ]
778
dd[order(-dd$z, dd$b),]
788
library(taRifx)
microbenchmark(sort(dd, f= ~-z+b ),times=1000)
Median time: 1,567
library(plyr)
microbenchmark(arrange(dd,desc(z),b),times=1000)
Median time: 862
library(doBy)
microbenchmark(orderBy(~-z+b, data=dd),times=1000)
Median time: 1,694
Note that doBy takes a good bit of time to load the package.
library(Deducer)
microbenchmark(sortData(dd,c("z","b"),increasing= c(FALSE,TRUE)),times=1000)
Couldn't make Deducer load. Needs JGR console.
esort <- function(x, sortvar, ...) {
attach(x)
x <- x[with(x,order(sortvar,...)),]
return(x)
detach(x)
}
microbenchmark(esort(dd, -z, b),times=1000)
Doesn't appear to be compatible with microbenchmark due to the attach/detach.
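For what it's worth, the attach/detach pair can be avoided altogether. Below is a minimal sketch of a hypothetical esort2() (my name, not part of any package) that captures the unevaluated sort keys and evaluates them with the columns of x in scope. I have not benchmarked it, so take it as an illustration of the idea only.

```r
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))

# Hypothetical replacement for esort(): no attach(), no detach()
esort2 <- function(x, ...) {
  caller <- parent.frame()
  # Capture the unevaluated key expressions, e.g. -z and b
  keys <- as.list(substitute(list(...)))[-1]
  # Evaluate each key with the columns of x visible, falling back to the caller
  cols <- lapply(keys, function(k) eval(k, x, caller))
  x[do.call(order, unname(cols)), , drop = FALSE]
}

sorted <- esort2(dd, -z, b)
sorted
```

Because nothing is attached, the search path is left untouched and the function has no side effects to clean up.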
m <- microbenchmark(
arrange(dd,desc(z),b),
sort(dd, f= ~-z+b ),
dd[with(dd, order(-z, b)), ] ,
dd[order(-dd$z, dd$b),],
times=1000
)
library(ggplot2)
# Upper and lower hinges (quartiles) of the timings
uq <- function(x) { fivenum(x)[4]}
lq <- function(x) { fivenum(x)[2]}
y_min <- 0 # min(by(m$time,m$expr,lq))
y_max <- max(by(m$time,m$expr,uq)) * 1.05
p <- ggplot(m,aes(x=expr,y=time)) + coord_cartesian(ylim = c( y_min , y_max ))
# fun.y/fun.ymin/fun.ymax were renamed to fun/fun.min/fun.max in ggplot2 3.3.0
p + stat_summary(fun=median,fun.min = lq, fun.max = uq, aes(fill=expr))
(lines extend from lower quartile to upper quartile, dot is the median)
Given these results and weighing simplicity vs. speed, I'd have to give the nod to arrange in the plyr package. It has a simple syntax and yet is almost as speedy as the base R commands with their convoluted machinations. Typically brilliant Hadley Wickham work. My only gripe with it is that it breaks the standard R nomenclature where sorting objects get called by sort(object), but I understand why Hadley did it that way due to issues discussed in the question linked above.
- The ggplot2 microbenchmark function above is now available as taRifx::autoplot.microbenchmark. (Ari B. Friedman, Jun 1, 2012 at 1:23)
- @AriB.Friedman using 'arrange', how do we sort by ascending? I never see examples sorting in ascending order. I tried 'asc' instead of 'desc' and it doesn't work. Thanks. (AME, Oct 12, 2013 at 6:37)
- @AME look at how b is sorted in the sample. The default is sort by ascending, so you just don't wrap it in desc. Ascending in both: arrange(dd,z,b). Descending in both: arrange(dd,desc(z),desc(b)). (Ari B. Friedman, Oct 12, 2013 at 10:16)
- As per ?arrange: "# NOTE: plyr functions do NOT preserve row.names". This makes the excellent arrange() function suboptimal if one wants to keep row.names. (landroni, Mar 10, 2014 at 16:31)
- Some of these that use order might be a bit faster if you use sort.list(x, method="radix") instead. (Ari B. Friedman, Jul 2, 2015 at 11:00)
Dirk's answer is great. It also highlights a key difference in the syntax used for indexing data.frames and data.tables:
## The data.frame way
dd[with(dd, order(-z, b)), ]
## The data.table way: (7 fewer characters, but that's not the important bit)
dd[order(-z, b)]
The difference between the two calls is small, but it can have important consequences. Especially if you write production code and/or are concerned with correctness in your research, it's best to avoid unnecessary repetition of variable names. data.table helps you do this.
Here's an example of how repetition of variable names might get you into trouble:
Let's change the context from Dirk's answer, and say this is part of a bigger project where there are a lot of object names and they are long and meaningful; instead of dd it's called quarterlyreport. It becomes:
quarterlyreport[with(quarterlyreport,order(-z,b)),]
Ok, fine. Nothing wrong with that. Next your boss asks you to include last quarter's report in the report. You go through your code, adding an object lastquarterlyreport in various places and somehow (how on earth?) you end up with this:
quarterlyreport[with(lastquarterlyreport,order(-z,b)),]
That isn't what you meant, but you didn't spot it because you did it fast and it's nestled on a page of similar code. The code doesn't fall over (no warning and no error) because R thinks it is what you meant. You'd hope whoever reads your report spots it, but maybe they don't. If you work with programming languages a lot then this situation may be all too familiar. It was a "typo", you'll say. I'll fix the "typo", you'll say to your boss.
In data.table we're concerned about tiny details like this. So we've done something simple to avoid typing variable names twice. Something very simple. i is evaluated within the frame of dd already, automatically. You don't need with() at all.
Instead of
dd[with(dd, order(-z, b)), ]
it's just
dd[order(-z, b)]
And instead of
quarterlyreport[with(lastquarterlyreport,order(-z,b)),]
it's just
quarterlyreport[order(-z,b)]
It's a very small difference, but it might just save your neck one day. When weighing up the different answers to this question, consider counting the repetitions of variable names as one of your criteria in deciding. Some answers have quite a few repeats, others have none.
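To make the failure mode concrete, here is a small base R sketch with two made-up reports (the values are invented purely for illustration). The mistyped call runs without any warning or error and simply returns the rows in the wrong order.

```r
# Two made-up reports with the same columns but different values
quarterlyreport     <- data.frame(z = c(1, 2, 3), b = c("a", "b", "c"))
lastquarterlyreport <- data.frame(z = c(3, 1, 2), b = c("a", "b", "c"))

# The "typo": the sort keys come from the wrong object, yet nothing complains
wrong <- quarterlyreport[with(lastquarterlyreport, order(-z, b)), ]

# What was actually meant
right <- quarterlyreport[with(quarterlyreport, order(-z, b)), ]

wrong$z  # not in descending order
right$z  # descending, as intended
```

The two objects only need to have the same number of rows for the mistake to go unnoticed.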
- +1 This is a great point, and gets at a detail of R's syntax that has often irritated me. I sometimes use subset() just to avoid having to repeatedly refer to the same object within a single call. (Josh O'Brien, May 25, 2012 at 20:45)
- I guess you could add the new setorder function too here, as this thread is where we send all the order type dupes. (David Arenburg, Jan 8, 2015 at 19:18)
There are a lot of excellent answers here, but dplyr gives the only syntax that I can quickly and easily remember (and so now use very often):
library(dplyr)
# sort mtcars by mpg, ascending... use desc(mpg) for descending
arrange(mtcars, mpg)
# sort mtcars first by mpg, then by cyl, then by wt
arrange(mtcars , mpg, cyl, wt)
For the OP's problem:
arrange(dd, desc(z), b)
b x y z
1 Low C 9 2
2 Med D 3 1
3 Hi A 8 1
4 Hi A 9 1
- The accepted answer does not work when my columns are of type factor (or something like that) and I want to sort in descending fashion for this factor column followed by an integer column in ascending fashion. But this works just fine! Thank you! (Saheel Godhane, Feb 22, 2014 at 18:36)
- Why "only"? I find data.table's dd[order(-z, b)] pretty easy to use and remember. (Matt Dowle, Mar 19, 2014 at 11:11)
- Agreed, there's not much between those two methods, and data.table is a huge contribution to R in many other ways also. I suppose for me, it might be that having one less set of brackets (or one less type of brackets) in this instance reduces the cognitive load by a just barely perceivable amount. (Ben, Mar 19, 2014 at 17:13)
- For me it comes down to the fact that arrange() is completely declarative, dd[order(-z, b)] is not. (Mullefa, May 29, 2015 at 13:12)
The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There have been quite a lot of improvements, and also a new function setorder(), since then. From v1.9.5+, setorder() also works with data.frames.
First, we'll create a dataset big enough and benchmark the different methods mentioned from other answers and then list the features of data.table.
Data:
require(plyr)
require(doBy)
require(data.table)
require(dplyr)
require(taRifx)
set.seed(45L)
dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)),
x = sample(c("A", "D", "C"), 1e8, TRUE),
y = sample(100, 1e8, TRUE),
z = sample(5, 1e8, TRUE),
stringsAsFactors = FALSE)
Benchmarks:
The timings reported are from running system.time(...) on the functions shown below. The timings are tabulated below (in order from slowest to fastest).
orderBy( ~ -z + b, data = dat) ## doBy
plyr::arrange(dat, desc(z), b) ## plyr
arrange(dat, desc(z), b) ## dplyr
sort(dat, f = ~ -z + b) ## taRifx
dat[with(dat, order(-z, b)), ] ## base R
# convert to data.table, by reference
setDT(dat)
dat[order(-z, b)] ## data.table, base R like syntax
setorder(dat, -z, b) ## data.table, using setorder()
## setorder() now also works with data.frames
# R-session memory usage (BEFORE) = ~2GB (size of 'dat')
# ------------------------------------------------------------
# Package function Time (s) Peak memory Memory used
# ------------------------------------------------------------
# doBy orderBy 409.7 6.7 GB 4.7 GB
# taRifx sort 400.8 6.7 GB 4.7 GB
# plyr arrange 318.8 5.6 GB 3.6 GB
# base R order 299.0 5.6 GB 3.6 GB
# dplyr arrange 62.7 4.2 GB 2.2 GB
# ------------------------------------------------------------
# data.table order 6.2 4.2 GB 2.2 GB
# data.table setorder 4.5 2.4 GB 0.4 GB
# ------------------------------------------------------------
- data.table's DT[order(...)] syntax was ~10x faster than the fastest of the other methods (dplyr), while consuming the same amount of memory as dplyr.
- data.table's setorder() was ~14x faster than the fastest of the other methods (dplyr), while taking just 0.4 GB of extra memory. dat is now in the order we require (as it is updated by reference).
data.table features:
Speed:
- data.table's ordering is extremely fast because it implements radix ordering.
- The syntax DT[order(...)] is optimised internally to use data.table's fast ordering as well. You can keep using the familiar base R syntax but speed up the process (and use less memory).
Memory:
- Most of the time, we don't require the original data.frame or data.table after reordering. That is, we usually assign the result back to the same object, for example:
DF <- DF[order(...)]
The issue is that this requires at least twice (2x) the memory of the original object.
- To be memory efficient, data.table therefore also provides a function setorder(). setorder() reorders data.tables by reference (in-place), without making any additional copies. It only uses extra memory equal to the size of one column.
Other features:
- It supports integer, logical, numeric, character and even bit64::integer64 types. Note that factor, Date, POSIXct etc. classes are all integer/numeric types underneath with additional attributes and are therefore supported as well.
- In base R, we cannot use - on a character vector to sort by that column in decreasing order. Instead we have to use -xtfrm(.). However, in data.table, we can just do, for example, dat[order(-x)] or setorder(dat, -x).
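As a small illustration of the last two points, here is a sketch on a four-row table (a cut-down version of the question's data, with b stored as character rather than as an ordered factor, so the ordering of b is alphabetical). It assumes the data.table package is installed.

```r
library(data.table)

dt <- data.table(b = c("Hi", "Med", "Hi", "Low"),
                 z = c(1, 1, 1, 2))

# Reorders dt in place: no assignment back to dt is needed
setorder(dt, -z, b)

# Negating a character column works inside DT[order(...)];
# this returns a new table and leaves dt as it is
by_b_desc <- dt[order(-b)]

dt
by_b_desc
```

Note that setorder() changes dt itself, so any other name bound to the same table sees the new order too.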
- Thanks for this very instructive answer about data.table. Though, I don't understand what "peak memory" is and how you calculated it. Could you explain please? Thank you! (Julien Navarre, Jun 30, 2015 at 14:32)
- I used Instruments -> allocations and reported the "All heap and allocation VM" size. (Arun, Jun 30, 2015 at 14:55)
- @Arun the Instruments link in your comment is dead. Care to post an update? (MichaelChirico, Mar 30, 2016 at 15:03)
- @MichaelChirico Here is a link to information about Instruments made by Apple: developer.apple.com/library/content/documentation/… (n1k31t4, Jul 17, 2017 at 9:25)
With this (very helpful) function by Kevin Wright, posted in the tips section of the R wiki, this is easily achieved.
sort(dd,by = ~ -z + b)
# b x y z
# 4 Low C 9 2
# 2 Med D 3 1
# 1 Hi A 8 1
# 3 Hi A 9 1
Suppose you have a data.frame A and you want to sort it by a column called x in descending order. Call the sorted data.frame newdata:
newdata <- A[order(-A$x),]
If you want ascending order then replace "-" with nothing. You can have something like
newdata <- A[order(-A$x, A$y, -A$z),]
where x and z are some columns in data.frame A. This sorts data.frame A by x descending, y ascending and z descending.
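Here is a tiny worked example of that three-key sort, with made-up numbers chosen so that every key is needed to break a tie:

```r
# Made-up data: rows 2 and 3 tie on x and y, so z decides between them
A <- data.frame(x = c(1, 2, 2, 1, 2),
                y = c(10, 10, 10, 20, 20),
                z = c(7, 1, 9, 3, 5))

# x descending, then y ascending, then z descending
idx <- order(-A$x, A$y, -A$z)
newdata <- A[idx, ]
newdata
```

Note that the minus sign only works on numeric columns; for character columns see the xtfrm() trick mentioned elsewhere in this thread.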
or you can use package doBy
library(doBy)
dd <- orderBy(~-z+b, data=dd)
If SQL comes naturally to you, the sqldf package handles ORDER BY as Codd intended.
- MJM, thanks for pointing out this package. It's incredibly flexible and because half of my work is already done by pulling from sql databases it's easier than learning much of R's less than intuitive syntax. (Brandon Bertelsen, Jul 29, 2010 at 5:31)
Alternatively, using the package Deducer
library(Deducer)
dd<- sortData(dd,c("z","b"),increasing= c(FALSE,TRUE))
The arrange() in dplyr is my favorite option. Use the pipe operator and go from least important to most important aspect
dd1 <- dd %>%
arrange(z) %>%
arrange(desc(x))
In response to a comment added in the OP for how to sort programmatically:
Using dplyr and data.table
library(dplyr)
library(data.table)
dplyr
Just use arrange_, which is the Standard Evaluation version of arrange. (Note that arrange_ has been deprecated since dplyr 0.7.0.)
df1 <- tbl_df(iris)
#using strings or formula
arrange_(df1, c('Petal.Length', 'Petal.Width'))
arrange_(df1, ~Petal.Length, ~Petal.Width)
Source: local data frame [150 x 5]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
(dbl) (dbl) (dbl) (dbl) (fctr)
1 4.6 3.6 1.0 0.2 setosa
2 4.3 3.0 1.1 0.1 setosa
3 5.8 4.0 1.2 0.2 setosa
4 5.0 3.2 1.2 0.2 setosa
5 4.7 3.2 1.3 0.2 setosa
6 5.4 3.9 1.3 0.4 setosa
7 5.5 3.5 1.3 0.2 setosa
8 4.4 3.0 1.3 0.2 setosa
9 5.0 3.5 1.3 0.3 setosa
10 4.5 2.3 1.3 0.3 setosa
.. ... ... ... ... ...
#Or using a variable
sortBy <- c('Petal.Length', 'Petal.Width')
arrange_(df1, .dots = sortBy)
Source: local data frame [150 x 5]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
(dbl) (dbl) (dbl) (dbl) (fctr)
1 4.6 3.6 1.0 0.2 setosa
2 4.3 3.0 1.1 0.1 setosa
3 5.8 4.0 1.2 0.2 setosa
4 5.0 3.2 1.2 0.2 setosa
5 4.7 3.2 1.3 0.2 setosa
6 5.5 3.5 1.3 0.2 setosa
7 4.4 3.0 1.3 0.2 setosa
8 4.4 3.2 1.3 0.2 setosa
9 5.0 3.5 1.3 0.3 setosa
10 4.5 2.3 1.3 0.3 setosa
.. ... ... ... ... ...
#Doing the same operation except sorting Petal.Length in descending order
sortByDesc <- c('desc(Petal.Length)', 'Petal.Width')
arrange_(df1, .dots = sortByDesc)
More info here: https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html
It is better to use a formula, as it also captures the environment in which to evaluate the expression.
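Since the underscore verbs were deprecated in dplyr 0.7.0, here is a hedged sketch of how the same programmatic sort might look with tidy evaluation. It assumes a reasonably recent dplyr (with rlang, which dplyr depends on) and reuses the sortBy vector from above.

```r
library(dplyr)

sortBy <- c("Petal.Length", "Petal.Width")

# Ascending on every column named in sortBy:
# turn the strings into symbols and splice them into arrange()
asc <- arrange(iris, !!!rlang::syms(sortBy))

# Descending on the first column, ascending on the second,
# looking the columns up by name through the .data pronoun
mixed <- arrange(iris, desc(.data[[sortBy[1]]]), .data[[sortBy[2]]])

head(asc)
head(mixed)
```

The .data[[...]] form is handy when each column needs its own direction; the spliced symbols are shorter when every key is ascending.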
data.table
dt1 <- data.table(iris) #not really required, as you can work directly on your data.frame
sortBy <- c('Petal.Length', 'Petal.Width')
sortType <- c(-1, 1)
setorderv(dt1, sortBy, sortType)
dt1
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1: 7.7 2.6 6.9 2.3 virginica
2: 7.7 2.8 6.7 2.0 virginica
3: 7.7 3.8 6.7 2.2 virginica
4: 7.6 3.0 6.6 2.1 virginica
5: 7.9 3.8 6.4 2.0 virginica
---
146: 5.4 3.9 1.3 0.4 setosa
147: 5.8 4.0 1.2 0.2 setosa
148: 5.0 3.2 1.2 0.2 setosa
149: 4.3 3.0 1.1 0.1 setosa
150: 4.6 3.6 1.0 0.2 setosa
I learned about order with the following example, which then confused me for a long time:
set.seed(1234)
ID = 1:10
Age = round(rnorm(10, 50, 1))
diag = c("Depression", "Bipolar")
Diagnosis = sample(diag, 10, replace=TRUE)
data = data.frame(ID, Age, Diagnosis)
databyAge = data[order(Age),]
databyAge
The only reason this example works is because order is sorting by the vector Age, not by the column named Age in the data frame data.
To see this, create an identical data frame using read.table with slightly different column names and without making use of any of the above vectors:
my.data <- read.table(text = '
id age diagnosis
1 49 Depression
2 50 Depression
3 51 Depression
4 48 Depression
5 50 Depression
6 51 Bipolar
7 49 Bipolar
8 49 Bipolar
9 49 Bipolar
10 49 Depression
', header = TRUE)
The above line structure for order no longer works because there is no vector named age:
databyage = my.data[order(age),]
The following line works because order sorts on the column age in my.data.
databyage = my.data[order(my.data$age),]
I thought this was worth posting given how confused I was by this example for so long. If this post is not deemed appropriate for the thread I can remove it.
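A compact way to see the difference is sketched below on a four-row stand-in for my.data (the values are made up). It assumes no object called age exists in your workspace, which is exactly the situation described above.

```r
# Small stand-in for my.data; values are made up
my.data <- data.frame(id = 1:4, age = c(49, 51, 48, 50))

# Fails: order() looks for a free-standing vector called age, not a column
attempt <- tryCatch(my.data[order(age), ], error = function(e) e)
failed <- inherits(attempt, "error")

# Works: name the data frame explicitly ...
by_dollar <- my.data[order(my.data$age), ]

# ... or let with() make the columns visible by name
by_with <- my.data[with(my.data, order(age)), ]

by_with
```

Both working forms return the same rows; with() just saves repeating the data frame name for each key.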
EDIT: May 13, 2014
Below is a generalized way of sorting a data frame by every column without specifying column names. The code below shows how to sort from left to right or by right to left. This works if every column is numeric. I have not tried with a character column added.
I found the do.call code a month or two ago in an old post on a different site, but only after extensive and difficult searching. I am not sure I could relocate that post now. The present thread is the first hit for ordering a data.frame in R. So, I thought my expanded version of that original do.call code might be useful.
set.seed(1234)
v1 <- c(0,0,0,0, 0,0,0,0, 1,1,1,1, 1,1,1,1)
v2 <- c(0,0,0,0, 1,1,1,1, 0,0,0,0, 1,1,1,1)
v3 <- c(0,0,1,1, 0,0,1,1, 0,0,1,1, 0,0,1,1)
v4 <- c(0,1,0,1, 0,1,0,1, 0,1,0,1, 0,1,0,1)
df.1 <- data.frame(v1, v2, v3, v4)
df.1
rdf.1 <- df.1[sample(nrow(df.1), nrow(df.1), replace = FALSE),]
rdf.1
order.rdf.1 <- rdf.1[do.call(order, as.list(rdf.1)),]
order.rdf.1
order.rdf.2 <- rdf.1[do.call(order, rev(as.list(rdf.1))),]
order.rdf.2
rdf.3 <- data.frame(rdf.1$v2, rdf.1$v4, rdf.1$v3, rdf.1$v1)
rdf.3
order.rdf.3 <- rdf.1[do.call(order, as.list(rdf.3)),]
order.rdf.3
- That syntax does work if you store your data in a data.table, instead of a data.frame: require(data.table); my.dt <- data.table(my.data); my.dt[order(age)]. This works because the column names are made available inside the [] brackets. (Frank, Sep 2, 2013 at 19:34)
- I don't think the downvote is necessary here, but neither do I think this adds much to the question at hand, particularly considering the existing set of answers, some of which already capture the requirement with data.frames to either use with or $. (A5C1D2H2I1M1N2O1R2T1, Feb 14, 2014 at 11:16)
- Upvote for do.call, this makes short work of sorting a multicolumn data frame. Simply do.call(sort, mydf.obj) and a beautiful cascade sort will be had. (AdamO, May 25, 2016 at 4:28)
Dirk's answer is good but if you need the sort to persist you'll want to apply the sort back onto the name of that data frame. Using the example code:
dd <- dd[with(dd, order(-z, b)), ]
Just for the sake of completeness, since not much has been said about sorting by column numbers... It can surely be argued that it is often not desirable (because the order of the columns could change, paving the way to errors), but in some specific situations (when for instance you need a quick job done and there is no such risk of columns changing orders), it might be the most sensible thing to do, especially when dealing with large numbers of columns.
In that case, do.call() comes to the rescue:
ind <- do.call(what = "order", args = iris[,c(5,1,2,3)])
iris[ind, ]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 14 4.3 3.0 1.1 0.1 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## (...)
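If some of the columns need to be sorted in descending order, the same do.call() idea can be wrapped in a small helper. The sketch below uses a hypothetical sort_by_index() (my own name, not from any package). It drops the column names before calling order() so that a column that happens to be called decreasing or method cannot be mistaken for an argument, and it relies on method = "radix" accepting one decreasing flag per key.

```r
# Hypothetical helper: sort df by the columns at positions `cols`;
# `decreasing` is a single flag or one flag per column
sort_by_index <- function(df, cols, decreasing = FALSE) {
  keys <- unname(as.list(df[cols]))
  idx <- do.call(order, c(keys, list(decreasing = decreasing, method = "radix")))
  df[idx, , drop = FALSE]
}

dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))

# Column 4 (z) descending, then column 1 (b) ascending
res <- sort_by_index(dd, c(4, 1), decreasing = c(TRUE, FALSE))
res
```

Keep in mind that the radix method sorts character columns in the C locale, so results on text keys can differ from the default collation.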
For the sake of completeness: you can also use the sortByCol() function from the BBmisc package:
library(BBmisc)
sortByCol(dd, c("z", "b"), asc = c(FALSE, TRUE))
b x y z
4 Low C 9 2
2 Med D 3 1
1 Hi A 8 1
3 Hi A 9 1
Performance comparison:
library(microbenchmark)
microbenchmark(sortByCol(dd, c("z", "b"), asc = c(FALSE, TRUE)), times = 100000)
median 202.878
library(plyr)
microbenchmark(arrange(dd,desc(z),b),times=100000)
median 148.758
microbenchmark(dd[with(dd, order(-z, b)), ], times = 100000)
median 115.872
- Strange to add a performance comparison when your method is the slowest... anyway, dubious the value of using a benchmark on a 4-row data.frame. (MichaelChirico, Mar 30, 2016 at 14:58)
Just like the mechanical card sorters of long ago, first sort by the least significant key, then the next most significant, etc. No library required, works with any number of keys and any combination of ascending and descending keys.
dd <- dd[order(dd$b, decreasing = FALSE),]
Now we're ready to do the most significant key. The sort is stable, and any ties in the most significant key have already been resolved.
dd <- dd[order(dd$z, decreasing = TRUE),]
This may not be the fastest, but it is certainly simple and reliable.
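As a quick sanity check of the card-sorter approach, the sketch below runs the two passes on the question's dd and compares the result with a single multi-key order() call. It keeps the intermediate results in separate objects (pass1, pass2) instead of overwriting dd, purely so the comparison is possible.

```r
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))

# Pass 1: least significant key (b, ascending)
pass1 <- dd[order(dd$b, decreasing = FALSE), ]

# Pass 2: most significant key (z, descending); ties keep their pass-1 order
pass2 <- pass1[order(pass1$z, decreasing = TRUE), ]

# The same sort expressed as one multi-key call
single <- dd[order(-dd$z, dd$b), ]

identical(rownames(pass2), rownames(single))
```

The approach depends on order() leaving tied rows in their existing order, which is why the keys must be processed from least to most significant.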
For even more completeness, R 4.4.0 (see here) now includes the function sort_by() (so has the advantage of not needing an external package):
New generic function sort_by(), primarily useful for the data.frame method which can be used to sort rows of a data frame by one or more columns.
dd |>
sort_by(~ list(-z, b))
# b x y z
# 4 Low C 9 2
# 2 Med D 3 1
# 1 Hi A 8 1
# 3 Hi A 9 1
Or:
sort_by(dd, list(-dd$z, dd$b))
Another alternative, using the rgr package:
> library(rgr)
> gx.sort.df(dd, ~ -z+b)
b x y z
4 Low C 9 2
2 Med D 3 1
1 Hi A 8 1
3 Hi A 9 1
I was struggling with the above solutions when I wanted to automate my ordering process for n columns, whose column names could be different each time. I found a super helpful function from the psych package to do this in a straightforward manner:
dfOrder(myDf, columnIndices)
where columnIndices are indices of one or more columns, in the order in which you want to sort them. More information here:
I would recommend using arrange from dplyr:
install.packages("dplyr")
library(dplyr)
You would want to sort by 'z' (descending) and then by 'b' (ascending)
df <- df %>%
arrange(desc(z), b)
To summarize: the rows are sorted by the z column in descending order, and rows that have the same value for z are then sorted by the b column in ascending order.