increase speed for 'for loop'

Question 1

I have files, each containing a data set of 11,000 rows. I have to loop through each row, and this takes about 25 minutes per file. I am using the following code:

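# std_d is the threshold and must already be defined (e.g. std_d=sd(x) as below)
z <- read.zoo("Title.txt", tz = "")
for(i in seq_along(z[,1])) {
  # skip values that an earlier pass already set to NA
  if(is.na(coredata(z[i,1]))) next
  ctr <- i
  # set each following value to NA while it differs from element i by more than std_d
  while(ctr < length(z[,1])) {
    if(abs(coredata(z[i,1]) - coredata(z[ctr+1,1])) > std_d) {
      z[ctr+1,1] <- NA
      ctr <- ctr + 1
    } else {
      break
    }
  }
}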

where "Title.txt" is the file containing the 11,000 rows. Its first five rows look like this:

"timestamp" "mesured_distance" "IFC_Code" "from_sensor_to_river_bottom"
 "1" "2012-06-03 12:30:07-05" 3188 1005 3500
 "2" "2012-06-03 12:15:16-05" 3189 1005 3500
 "3" "2012-06-03 12:00:08-05" 3185 1005 3500
 "4" "2012-06-03 11:45:11-05" 3191 1005 3500
 "5" "2012-06-03 11:30:15-05" 3188 1005 3500

How can I speed up this code?

Here is how the code works:

 set.seed(100)
 m=rnorm(15)
 std_d=sd(m)

After running the loop on m, it gives this:

 m
 [1] -0.50219235 NA -0.07891709 NA 0.11697127 0.31863009 NA
 [8] 0.71453271 NA NA NA NA NA 0.73984050
 [15] NA

It replaces the next element (e2) with NA if |e1 - e2| > std_d, and then compares e1 with e3: if |e1 - e3| > std_d it replaces e3 as well; if instead |e1 - e3| <= std_d, it keeps e3 and moves on to compare e3 with e4, and so on.
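A minimal sketch of the rule just described, assuming the values have already been pulled out of the zoo object into a plain numeric vector with no NA values. Because every comparison is against the last value that was kept, a single pass that tracks one running reference value should be enough. The helper name drop_jumps is hypothetical, not from the question.

```r
# Hypothetical helper: single-pass version of the rule described above.
# Assumes x is a plain numeric vector without NA values and std_d is the threshold.
drop_jumps <- function(x, std_d) {
  out <- x
  if (length(x) < 2) return(out)
  ref <- x[1]                      # last value that was kept
  for (j in 2:length(x)) {
    if (abs(ref - x[j]) > std_d) {
      out[j] <- NA                 # too far from the last kept value
    } else {
      ref <- x[j]                  # keep it; it becomes the new reference
    }
  }
  out
}

set.seed(100)
m <- rnorm(15)
drop_jumps(m, sd(m))
# NA at positions 2, 4, 7, 9 to 13 and 15, as in the output above
```

Only the running reference value is carried between iterations, so no zoo indexing or coredata() call happens inside the loop.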

Question 2

Is coredata an array lookup or a function call?

Question 3

It's a function call, from the zoo package.

Question 4

Have you profiled your code? Since you are creating new zoo objects countless times, I wouldn't be surprised if most of the time were spent in [.zoo or coredata: if that is the case, extracting the array once before the loop may speed things up.
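A sketch of the "extract the array once" suggestion, keeping the question's nested-loop logic unchanged but running it on a plain numeric vector instead of indexing the zoo object with z[i,1] and coredata() on every step. The helper name drop_jumps_loop is hypothetical. Like the original loop, it assumes the column has no NA values to begin with: an NA ahead of element i would make the if() condition NA and stop with an error.

```r
# Hypothetical helper: the question's loop, same logic, but on a plain vector.
# Assumes x has no NA values on input and std_d is the threshold.
drop_jumps_loop <- function(x, std_d) {
  n <- length(x)
  for (i in seq_len(n)) {
    if (is.na(x[i])) next          # already dropped while processing an earlier i
    ctr <- i
    # drop following values while they differ from x[i] by more than std_d
    while (ctr < n) {
      if (abs(x[i] - x[ctr + 1]) > std_d) {
        x[ctr + 1] <- NA
        ctr <- ctr + 1
      } else {
        break
      }
    }
  }
  x
}

# With the zoo object from the question (requires the zoo package, not run here):
# x <- as.numeric(coredata(z[, 1]))
# coredata(z[, 1]) <- drop_jumps_loop(x, std_d)
```

Each element is set to NA at most once and each inner loop breaks at most once per i, so the total work stays roughly linear in the number of rows; the remaining cost in the original is the repeated zoo subsetting.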

Question 5

Okay, I'll try that... What if this is not the case?

Question 6

Possible duplicate: stackoverflow.com/questions/10823971/clean-up-the-dataset. This also looks like a homework problem, with two different people asking about the same problem.

Question 7

Revised: ...I wonder if this could replace those two loops? Here are the results with the test vector offered in your comments:

set.seed(100); m=rnorm(15); std_d <- sd(m)
# y: result built so far; xx: next element of m
Reduce(function(y, xx) {
  if (abs(tail(y[!is.na(y)], 1) - xx) > std_d) c(y, NA) else c(y, xx)
}, m)
 [1] -0.50219235 NA -0.07891709 NA 0.11697127 0.31863009 NA 0.71453271
 [9] NA NA NA NA NA 0.73984050 NA

Here's the original test case with revised results. The function now compares the last non-NA value with the current one and returns NA if the difference exceeds std_d. That other solution used a running sd(), but your code did not, so I just used a global sd value:
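The Reduce() above carries the whole result vector from step to step: c(y, xx) copies it each time and tail(y[!is.na(y)], 1) rescans it, so the work grows roughly quadratically with the number of rows. A possible variation, sketched under the same assumptions (a single global std_d, no NA values on input), carries only the last kept value and uses accumulate = TRUE; a position is then kept exactly when the accumulated value equals the input there. The wrapper name drop_jumps_reduce is hypothetical.

```r
# Hypothetical wrapper: same rule as the Reduce() answer, but the accumulator
# is only the last kept value, not the whole result vector.
# Assumes x has no NA values and std_d is a single threshold.
drop_jumps_reduce <- function(x, std_d) {
  # ref: last kept value; xx: current element.
  # xx becomes the new ref unless it is more than std_d away from ref.
  acc <- Reduce(function(ref, xx) if (abs(ref - xx) > std_d) ref else xx,
                x, accumulate = TRUE)
  # A rejected xx differs from ref by more than std_d >= 0,
  # so acc equals x only where the element was kept.
  x[acc != x] <- NA
  x
}

set.seed(100)
m <- rnorm(15)
drop_jumps_reduce(m, sd(m))
# NA at positions 2, 4, 7, 9 to 13 and 15, as in the output above
```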

z <- read.zoo(text=' "timestamp" "mesured_distance" "IFC_Code" "from_sensor_to_river_bottom"
 "1" "2012-06-03 12:30:07-05" 3188 1005 3500
 "2" "2012-06-03 12:15:16-05" 3189 1005 3500
 "3" "2012-06-03 12:00:08-05" 3185 1005 3500
 "4" "2012-06-03 11:45:11-05" 3191 1005 3500
 "5" "2012-06-03 11:30:15-05" 3188 1005 3500', tz = "")
std_d <- sd(coredata(z[,1]))
zx <- as.numeric(coredata(z[,1]))
coredata(z[,1]) <- Reduce(function(y, xx) {
  if (abs(tail(y[!is.na(y)], 1) - xx) > std_d) {
    c(y, NA)
  } else {
    c(y, xx)
  }
}, zx)
z
                    mesured_distance IFC_Code from_sensor_to_river_bottom
2012-06-03 11:30:15             3188     1005                        3500
2012-06-03 11:45:11               NA     1005                        3500
2012-06-03 12:00:08               NA     1005                        3500
2012-06-03 12:15:16             3189     1005                        3500
2012-06-03 12:30:07             3188     1005                        3500

Question 8

Sorry, but it doesn't. Your code detects whether there are any NA values, but my code converts the z[ctr+1,1] values that satisfy the condition to NA. I'd be grateful if you could provide something this short for my entire code.

Question 9

> set.seed(100)
> m=rnorm(15)
> m
 [1] -0.50219235  0.13153117 -0.07891709  0.88678481  0.11697127  0.31863009 -0.58179068
 [8]  0.71453271 -0.82525943 -0.35986213  0.08988614  0.09627446 -0.20163395  0.73984050
[15]  0.12337950

This is what my code does:

> m
 [1] -0.50219235 NA -0.07891709 NA 0.11697127 0.31863009 NA
 [8] 0.71453271 NA NA NA NA NA 0.73984050
[15] NA

See this link: stackoverflow.com/questions/10823971/clean-up-the-dataset. The code is explained in detail there.

Question 10

@DWin: I have edited the question to explain the code in more detail.

Question 11

Sorry, but the solution you provided gives me incorrect output. I tried using an array instead of coredata(z[,1]), but it still takes a long time (around 12 minutes).

Question 12

That's great. It's working, and it takes very little time. Thanks!

42- · Accepted Answer · 2012-06-04 03:28:20Z
