2
\$\begingroup\$

I am trying to fill values based on group, in my case id. I would like to fill the missing values according to the available date info for each id.

 id date
1 1 23-04
2 1 23-04
3 1 <NA>
4 1 <NA>
5 2 24-04
6 2 <NA>
7 2 <NA>
8 2 <NA>
9 3 23-04
10 3 <NA>
11 3 <NA>
12 3 <NA>
13 4 <NA>
14 4 <NA>
15 4 <NA>
16 4 <NA>

What I need is:

 id date
1 1 23-04
2 1 23-04
3 1 23-04
4 1 23-04
5 2 24-04
6 2 24-04
7 2 24-04
8 2 24-04
9 3 23-04
10 3 23-04
11 3 23-04
12 3 23-04
13 4 <NA>
14 4 <NA>
15 4 <NA>
16 4 <NA>

I figured out a loop, but I would like to avoid it because my data has 23 millions rows:

for(i in 2:nrow(dta)){
 if(dta$id[i-1] == dta$id[i])
 {
 dta$date[i] = dta$date[i-1]
 }
}

I cannot figure out how to translate this into dplyr syntax:

dta = structure(list(id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"
), class = "factor"), date = structure(c(1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, NA, NA, NA, NA), .Label = c("23-04", 
"24-04"), class = "factor")), .Names = c("id", "date"), row.names = c(NA, 
-16L), class = "data.frame")
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Jul 27, 2015 at 14:45
\$\endgroup\$

2 Answers 2

3
\$\begingroup\$

In R this is usually solved using the na.locf (Last Observation Carried Forward) function from the zoo package.

See also here:


# test data
x <- read.table(text="id;date
1;23-04
1;23-04
1;NA
1;NA
2;24-04
2;NA
2;NA
2;NA
3;23-04
3;NA
3;NA
3;NA
4;NA
4;NA
4;NA
4;NA", header=TRUE, sep=";")
library("dplyr")
library("zoo")
x %>% group_by(id) %>% transmute(date=na.locf(date, na.rm=FALSE))
Source: local data frame [16 x 2]
Groups: id
 id date
1 1 23-04
2 1 23-04
3 1 23-04
4 1 23-04
5 2 24-04
6 2 24-04
7 2 24-04
8 2 24-04
9 3 23-04
10 3 23-04
11 3 23-04
12 3 23-04
13 4 NA
14 4 NA
15 4 NA
16 4 NA

Another option are rolling self-joins supported by the data.table package (see here).

answered Jul 27, 2015 at 15:42
\$\endgroup\$
2
  • \$\begingroup\$ interesting I will try it on the whole sample \$\endgroup\$ Commented Jul 27, 2015 at 15:52
  • \$\begingroup\$ It would be great to see how we can do it with data.table and nomatch=roll or roll=Inf. I'm not able to find any example. \$\endgroup\$ Commented Jul 11, 2017 at 10:03
1
\$\begingroup\$

There is no need for packages. This is easily and efficiently done with basic syntax.

# Create data.table
dtk <- data.table( var=c(NA,NA,1,NA,NA,2,3,NA,NA,NA,NA,4,5,NA,NA,NA,NA,NA,6,7,NA,NA), group=c("A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B"))
# Backward fill
dtk[,order:= (1:.N)];
setorder(dtk,-order);
dtk[, var_backward_fill := var[1], by= .( group , cumsum(!is.na(var))) ];
setorder(dtk, order);
dtk[,order:= NULL];
# Forward fill
dtk[, var_forward_fill := var[1], by= .( group , cumsum(!is.na(var))) ]
Sᴀᴍ Onᴇᴌᴀ
29.5k16 gold badges45 silver badges201 bronze badges
answered May 24, 2019 at 15:17
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.