How can I simplify the code using just the tidyverse?

Question

I have the data frame below, and I'm trying to create a new data frame that shows the names of the departments with the highest and the lowest employee turnover. My code works, but it is too long, so I am wondering how I can simplify it. Thanks!

My data:

```r
df = data.frame(
  department = c("admin", "engineering", "finance", "IT", "logistics", "marketing", "operations", "retail", "sales", "support", "admin", "engineering", "finance", "IT", "logistics", "marketing", "admin", "retail", "admin", "engineering"),
  promoted = c(0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1),
  review = c(0.4, 0.5, 0.4, 0.8, 0.4, 0.1, 0.9, 0.2, 0.1, 0.1, 0.1, 0.7, 0.1, 0.55, 0.4, 0.33, 0.11, 0.1, 0.11, 0.1),
  projects = c(1, 2, 1, 3, 4, 1, 5, 0, 1, 1, 2, 1, 3, 4, 1, 5, 0, 1, 0, 1),
  salary = c("low", "medium", "high", "low", "medium", "low", "medium", "low", "low", "low", "medium", "high", "low", "medium", "low", "medium", "low", "low", "low", "medium"),
  tenure = c(1, 2, 1, 3, 4, 1, 5, 0, 1, 1, 2, 1, 3, 4, 1, 5, 0, 1, 0, 1),
  satisfaction = c(0.4, 0.5, 0.4, 0.8, 0.4, 0.1, 0.9, 0.2, 0.1, 0.1, 0.1, 0.7, 0.1, 0.55, 0.4, 0.33, 0.11, 0.1, 0.11, 0.1),
  bonus = c(0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1),
  left = c("yes", "no", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "no", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "no"))
```

My code:

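```r
library(tidyverse)

# Total number of employees per department
df1 <- df %>%
  count(department)
colnames(df1) <- c('department', 'Total')

# Number of employees who left, per department
df2 <- df %>%
  filter(left == "yes") %>%
  count(department)
colnames(df2) <- c('department', 'Yes')
df2$Yes <- as.numeric(df2$Yes)
df1$Total <- as.numeric(df1$Total)

# Combine the two summaries (an inner join keeps only departments present in both)
df3 <- inner_join(df1, df2)
head(df3, 10)

# Department with the highest turnover
df3Max <- df3 %>%
  mutate(turnover = Yes / Total) %>%
  arrange(desc(turnover))
df3Max <- head(df3Max, 1)

# Department with the lowest turnover
df3Min <- df3 %>%
  mutate(turnover = Yes / Total) %>%
  arrange(turnover)
df3Min <- head(df3Min, 1)

Turnover <- rbind(df3Max, df3Min)
```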

Comment

At the moment your code removes any departments with no turnover, so it's returning the department with the smallest non-zero turnover. Is this the intended behavior?

Comment

The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. The site standard is for the title to simply state the task accomplished by the code. Please see "How do I ask a good question?"

Answer

Right now your code separately computes the total number in each department (df1) and the number who left (df2) and then joins these results to get department summaries (df3). However, this would be more efficient as a grouped operation:

```r
df %>%
  group_by(department) %>%
  summarize(Total = n(), Yes = sum(left == "yes"))
# # A tibble: 10 × 3
#    department  Total   Yes
#    <fct>       <int> <int>
#  1 admin           4     3
#  2 engineering     3     1
#  3 finance         2     1
#  4 IT              2     0
#  5 logistics       2     0
#  6 marketing       2     1
#  7 operations      1     1
#  8 retail          2     1
#  9 sales           1     0
# 10 support         1     1
```

Beyond being more compact, this spares you from having to think about details such as whether to use an inner or an outer join when combining df1 and df2.
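The grouped summary can be condensed a little further. The sketch below is a variation, not part of the original answer: because `left == "yes"` is a logical vector, its mean is the share of employees who left, so turnover can be computed in the same call. It assumes dplyr 1.1.0 or later, where `summarize()` accepts a per-operation `.by` argument that groups only for that call (so no `ungroup()` is needed). Only the two columns the calculation uses are copied from the question's data, and the name `dept_summary` is illustrative.

```r
library(dplyr)

# The two columns used here, copied from the question's data
df <- data.frame(
  department = c("admin", "engineering", "finance", "IT", "logistics", "marketing",
                 "operations", "retail", "sales", "support", "admin", "engineering",
                 "finance", "IT", "logistics", "marketing", "admin", "retail",
                 "admin", "engineering"),
  left = c("yes", "no", "yes", "no", "no", "no", "yes", "yes", "no", "yes",
           "no", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "no")
)

dept_summary <- df %>%
  summarize(
    Total = n(),                     # employees per department
    Yes = sum(left == "yes"),        # employees who left
    turnover = mean(left == "yes"),  # share who left, equal to Yes / Total
    .by = department                 # requires dplyr >= 1.1.0
  )

dept_summary
```

With `.by`, rows come back in the order in which each department first appears rather than sorted by department; that doesn't matter here because the result is sorted by turnover afterwards. On older dplyr versions, keep `group_by(department)` as in the answer.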

Now you just need to create your turnover variable, sort by it, and take the first and last rows. Taking both at once can be handled with slice(), which saves you from separately extracting the highest (df3Max) and lowest (df3Min) rows and then combining them:

```r
df %>%
  group_by(department) %>%
  summarize(Total = n(), Yes = sum(left == "yes")) %>%
  ungroup() %>%
  mutate(turnover = Yes / Total) %>%
  arrange(turnover) %>%
  slice(c(1, n()))
# # A tibble: 2 × 4
#   department Total   Yes turnover
#   <fct>      <int> <int>    <dbl>
# 1 IT             2     0        0
# 2 support        1     1        1
```

Note that grabbing the top and bottom in this way also avoids code repetition in defining turnover and in sorting.
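One caveat worth checking: in this data several departments tie at the extremes (IT, logistics and sales all have a turnover of 0, while operations and support both have 1), and `slice(c(1, n()))` keeps only one row from each end. If you want every tied department, one possible sketch uses dplyr's `slice_min()` and `slice_max()` (dplyr 1.0.0 or later), which keep ties by default. The names `dept_turnover`, `lowest`, `highest` and `extremes` are illustrative only.

```r
library(dplyr)

# The two columns used here, copied from the question's data
df <- data.frame(
  department = c("admin", "engineering", "finance", "IT", "logistics", "marketing",
                 "operations", "retail", "sales", "support", "admin", "engineering",
                 "finance", "IT", "logistics", "marketing", "admin", "retail",
                 "admin", "engineering"),
  left = c("yes", "no", "yes", "no", "no", "no", "yes", "yes", "no", "yes",
           "no", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "no")
)

dept_turnover <- df %>%
  group_by(department) %>%
  summarize(Total = n(), Yes = sum(left == "yes")) %>%
  ungroup() %>%
  mutate(turnover = Yes / Total)

# with_ties = TRUE is the default, so every department tied at the extreme is kept
lowest  <- slice_min(dept_turnover, turnover, n = 1)
highest <- slice_max(dept_turnover, turnover, n = 1)

# Stack both results; the "which" column records where each row came from
extremes <- bind_rows(lowest = lowest, highest = highest, .id = "which")
extremes
```

If you prefer exactly one row per end, pass `with_ties = FALSE` to both calls.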

Your code in the question drops every department with no turnover: df2 only contains departments where at least one employee left, so the inner join between df1 and df2 discards the rest. If that's the desired behavior, you can add filter(Yes > 0) after summarize() to replicate it.
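A sketch of where that filter would go: it is placed after `summarize()`, so departments with `Yes == 0` are dropped before sorting. The variable name `turnover_nonzero` is only illustrative. With the question's data the lowest non-zero turnover is engineering (1 of 3); when departments tie, the row that `slice()` picks may differ from the one your original code picks.

```r
library(dplyr)

# The two columns used here, copied from the question's data
df <- data.frame(
  department = c("admin", "engineering", "finance", "IT", "logistics", "marketing",
                 "operations", "retail", "sales", "support", "admin", "engineering",
                 "finance", "IT", "logistics", "marketing", "admin", "retail",
                 "admin", "engineering"),
  left = c("yes", "no", "yes", "no", "no", "no", "yes", "yes", "no", "yes",
           "no", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "no")
)

turnover_nonzero <- df %>%
  group_by(department) %>%
  summarize(Total = n(), Yes = sum(left == "yes")) %>%
  ungroup() %>%
  filter(Yes > 0) %>%               # mimic the inner join: drop departments where nobody left
  mutate(turnover = Yes / Total) %>%
  arrange(turnover) %>%
  slice(c(1, n()))                  # first row = lowest turnover, last row = highest

turnover_nonzero
```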

Answered by josliber · 2022-02-09 17:09:55Z
