R: Using the sort function in a dataframe based on multiple columns

Question 1

I am a cardiologist and love coding in R. I am having a real issue sorting a data frame, and I suspect the solution is really easy!

I have a data frame with summary values from multiple studies (df$study). Most studies have only one summary value (df$summary). However, as you can see below, study A has three summary values, numbered in df$no.of.estimate.

study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6, 7, 8, 9)
df <- data.frame(study, no.of.estimate, summary)

So I want to sort the data frame by df$summary, which is easy. However, if a study has more than one estimate, I want its rows grouped together and ordered by the no.of.estimate column.

So essentially the desired output is

study <- c("E", "A", "A", "A", "F", "B", "C", "D")
no.of.estimate <- c(1, 1, 2, 3, 1, 1, 1, 1)
summary <- c(1, 7, 2, 5, 3, 6, 8, 9)
df <- data.frame(study, no.of.estimate, summary)

Question 2

You may have noticed that by using cbind you have created a matrix whose columns are all of class character. Use data.frame(study, no.of.estimate...) instead.
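To make the cbind point concrete, here is a minimal sketch using a shortened copy of the question's vectors (the names m and d are illustrative, not from the original post). cbind on a character vector and a numeric vector coerces everything to character, whereas data.frame keeps the numeric column numeric.

```r
study <- c("E", "A", "F")
no.of.estimate <- c(1, 2, 1)

# cbind builds a matrix, and a matrix can hold only one type,
# so the numbers are coerced to character strings
m <- cbind(study, no.of.estimate)
is.matrix(m)
typeof(m)

# data.frame keeps each column's own type
d <- data.frame(study, no.of.estimate)
is.numeric(d$no.of.estimate)
```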

Question 3

You don't want to sort your whole data set by study and no.of.estimate, but only where a study has more than one no.of.estimate value? It seems like you are overcomplicating this a bit. It seems like you could just do df[with(df, order(study, no.of.estimate)), ], though take a look at @akrun's comment first.
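As a hedged illustration of what this one-liner does on the question's data: a plain order() on study sorts the studies alphabetically, so study A moves to the top instead of staying next to E as in the desired output. The name alphabetical is illustrative, and stringsAsFactors = FALSE is set here only to make the column type explicit.

```r
df <- data.frame(
  study = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary = c(1, 2, 3, 5, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)

# Sort by study name first, then by estimate number within each study
alphabetical <- df[with(df, order(study, no.of.estimate)), ]

alphabetical$study
# "A" "A" "A" "B" "C" "D" "E" "F"
alphabetical$summary
# 7 2 5 6 8 9 1 3
```

The rows of A are grouped and ordered by no.of.estimate as requested, but the overall order by summary is lost, which is why the study order has to be fixed separately.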

Question 4

You could try

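library(dplyr)
# Fix the factor levels to the order in which studies first appear,
# then sort by study and by estimate number within each study
df %>%
  mutate(study = factor(study, levels = unique(study))) %>%
  arrange(study, no.of.estimate)
#   study no.of.estimate summary
# 1     E              1       1
# 2     A              1       7
# 3     A              2       2
# 4     A              3       5
# 5     F              1       3
# 6     B              1       6
# 7     C              1       8
# 8     D              1       9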

Or a base R approach

# Levels follow order of first appearance, so order() sorts studies that way
df$study <- factor(df$study, levels = unique(df$study))
df[with(df, order(study, no.of.estimate)), ]

data

df <- structure(list(study = structure(c(5L, 1L, 6L, 1L, 2L, 1L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1), summary = c(1, 
2, 3, 5, 6, 7, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

The expected dataset is

df1 <- structure(list(study = structure(c(5L, 1L, 1L, 1L, 6L, 2L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 1, 2, 3, 1, 1, 1, 1), summary = c(1, 
7, 2, 5, 3, 6, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

Question 5

Here's my data.table attempt, which leaves your columns as they are and creates a new index (though see my comment first). Its main advantage is that it updates your data set by reference rather than creating new copies.

library(data.table)
setorder(setDT(df)[, indx := .GRP, study], indx, no.of.estimate)[]
#    study no.of.estimate summary indx
# 1:     E              1       1    1
# 2:     A              1       7    2
# 3:     A              2       2    2
# 4:     A              3       5    2
# 5:     F              1       3    3
# 6:     B              1       6    4
# 7:     C              1       8    5
# 8:     D              1       9    6
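For readers without data.table, the same kind of index can be sketched in base R. This is only an approximation of the approach above: match() against unique() numbers the studies in order of first appearance, much as .GRP does with by, but the result is an ordinary copy and nothing is updated by reference. The names indx and sorted are illustrative.

```r
df <- data.frame(
  study = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary = c(1, 2, 3, 5, 6, 7, 8, 9),
  stringsAsFactors = FALSE
)

# Position of each study among the distinct studies, in order of appearance
indx <- match(df$study, unique(df$study))
indx
# 1 2 3 2 4 2 5 6

# Sort by the group index, then by estimate number within each study
sorted <- df[order(indx, df$no.of.estimate), ]
sorted$summary
# 1 7 2 5 3 6 8 9
```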

Accepted answer by akrun (2015-01-04 15:05:10Z): the dplyr and base R approach given above.

