
I am a cardiologist and love coding in R. I am having a real issue with sorting a data frame, and I suspect the solution is really easy!

I have a data frame of summary values from multiple studies (df$study). Most studies have only one summary value (df$summary). However, as you can see below, study A has three summary values, numbered in df$no.of.estimate.

study <- c("E", "A", "F", "A", "B", "A", "C", "D")
no.of.estimate <- c(1, 2, 1, 3, 1, 1, 1, 1)
summary <- c(1, 2, 3, 5, 6 ,7 ,8 ,9)
df <- data.frame(study, no.of.estimate, summary)

I want to sort the data frame by df$summary, which is easy. However, if a study has more than one estimate, I want that study's rows grouped together and ordered within the group by the no.of.estimate column.

So essentially the desired output is

study <- c("E", "A", "A", "A", "F", "B", "C", "D")
no.of.estimate <- c(1, 1, 2, 3, 1, 1, 1, 1)
summary <- c(1, 7, 2, 5, 3 ,6 ,8 ,9)
df <- data.frame(study, no.of.estimate, summary)
David Arenburg
asked Jan 4, 2015 at 14:52
  • You must have noticed that by using cbind you have created a matrix whose columns are all of class character. Use data.frame(study, no.of.estimate...) instead. Commented Jan 4, 2015 at 14:55
  • So you don't want to sort your whole data set by study and no.of.estimate, but only where a study has more than one estimate? It seems like you are overcomplicating this a bit. You could probably just do df[with(df, order(study, no.of.estimate)), ], though take a look at @akrun's comment first. Commented Jan 4, 2015 at 15:03

2 Answers


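You could convert study to a factor whose levels follow the order of first appearance, then sort by study and no.of.estimate. This relies on df already being ordered by summary, as it is in the example: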

library(dplyr)
df %>% 
 mutate(study=factor(study, levels=unique(study))) %>%
 arrange(study,no.of.estimate)
 #  study no.of.estimate summary
 #1     E              1       1
 #2     A              1       7
 #3     A              2       2
 #4     A              3       5
 #5     F              1       3
 #6     B              1       6
 #7     C              1       8
 #8     D              1       9

Or a base R approach

df$study <- factor(df$study, levels=unique(df$study))
df[with(df, order(study, no.of.estimate)), ]
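If the rows are not already in summary order, first-appearance factor levels will not give the grouping you want. Here is a minimal base R sketch for that case. It assumes each study should be placed according to its smallest summary value (my reading of the desired output, not something stated in the question); the names group_key, shuffled and sorted are just illustrative.

```r
# Example data from the question
df <- data.frame(
  study          = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary        = c(1, 2, 3, 5, 6, 7, 8, 9)
)

# Reverse the rows so the result cannot depend on the input order
shuffled <- df[rev(seq_len(nrow(df))), ]

# Assumption: a study is positioned by its smallest summary value
group_key <- ave(shuffled$summary, shuffled$study, FUN = min)

# Sort by group position, then study (tie-break), then estimate number
sorted <- shuffled[order(group_key, shuffled$study, shuffled$no.of.estimate), ]
rownames(sorted) <- NULL
sorted
```

On the example data this gives the same row order as the two approaches above, without converting study to a factor.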

data

df <- structure(list(study = structure(c(5L, 1L, 6L, 1L, 2L, 1L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1), summary = c(1, 
2, 3, 5, 6, 7, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")

The expected dataset is

df1 <- structure(list(study = structure(c(5L, 1L, 1L, 1L, 6L, 2L, 3L, 
4L), .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
no.of.estimate = c(1, 1, 2, 3, 1, 1, 1, 1), summary = c(1, 
7, 2, 5, 3, 6, 8, 9)), .Names = c("study", "no.of.estimate", 
"summary"), row.names = c(NA, -8L), class = "data.frame")
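A quick way to confirm that the base R result matches the expected ordering is to compare it column by column. This is only a sketch: res and expected are illustrative names, and study is compared as character because the factor levels differ after the reordering.

```r
# Example data from the question
df <- data.frame(
  study          = c("E", "A", "F", "A", "B", "A", "C", "D"),
  no.of.estimate = c(1, 2, 1, 3, 1, 1, 1, 1),
  summary        = c(1, 2, 3, 5, 6, 7, 8, 9)
)

# Expected ordering, as given in the question
expected <- data.frame(
  study          = c("E", "A", "A", "A", "F", "B", "C", "D"),
  no.of.estimate = c(1, 1, 2, 3, 1, 1, 1, 1),
  summary        = c(1, 7, 2, 5, 3, 6, 8, 9)
)

# Base R approach: levels in order of first appearance, then sort
df$study <- factor(df$study, levels = unique(df$study))
res <- df[with(df, order(study, no.of.estimate)), ]

# Compare values only; row names and factor levels are ignored on purpose
matches <- identical(as.character(res$study), as.character(expected$study)) &&
  identical(res$no.of.estimate, expected$no.of.estimate) &&
  identical(res$summary, expected$summary)
matches
```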
David Arenburg
answered Jan 4, 2015 at 15:05

Here's my data.table attempt, which leaves your columns as they are and creates a new index column (though see my comment first). Its main advantage is that it updates your data set by reference rather than creating new copies.

library(data.table)
setorder(setDT(df)[, indx := .GRP, study], indx, no.of.estimate)[]
#    study no.of.estimate summary indx
# 1:     E              1       1    1
# 2:     A              1       7    2
# 3:     A              2       2    2
# 4:     A              3       5    2
# 5:     F              1       3    3
# 6:     B              1       6    4
# 7:     C              1       8    5
# 8:     D              1       9    6
answered Jan 4, 2015 at 15:15