I have files that each contain a data set of 11,000 rows. I have to loop over every row, and this takes about 25 minutes per file. I am using the following code:
z <- read.zoo("Title.txt", tz = "")
for (i in seq_along(z[,1])) {
  if (is.na(coredata(z[i,1]))) next
  ctr <- i
  while (ctr < length(z[,1])) {
    if (abs(coredata(z[i,1]) - coredata(z[ctr+1,1])) > std_d) {
      z[ctr+1,1] <- NA
      ctr <- ctr + 1
    } else {
      break
    }
  }
}
Here "Title.txt" is a file containing 11,000 rows. Its first five rows look like this:
"timestamp" "mesured_distance" "IFC_Code" "from_sensor_to_river_bottom"
"1" "2012-06-03 12:30:07-05" 3188 1005 3500
"2" "2012-06-03 12:15:16-05" 3189 1005 3500
"3" "2012-06-03 12:00:08-05" 3185 1005 3500
"4" "2012-06-03 11:45:11-05" 3191 1005 3500
"5" "2012-06-03 11:30:15-05" 3188 1005 3500
I would appreciate help on how to speed up this code.
Here is how the code works:
set.seed(100)
m=rnorm(15)
std_d=sd(m)
After running the code on m, it gives this:
m
[1] -0.50219235 NA -0.07891709 NA 0.11697127 0.31863009 NA
[8] 0.71453271 NA NA NA NA NA 0.73984050
[15] NA
It replaces the next element (e2) with NA if the absolute difference |e1 - e2| is > std_d, and then compares e1 with e3: if |e1 - e3| is > std_d, it replaces e3 as well. If the difference is not > std_d, e3 is kept and the code moves on to compare e3 with e4, and so on.
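The rule described above can be sketched as a single pass over a plain numeric vector, which avoids indexing the zoo object inside the loop. This is only a sketch under assumptions: `clean_jumps`, `threshold`, and `ref` are names made up for illustration, and skipping values that are already NA is an assumption about the intended behaviour.

```r
# Hypothetical helper, not part of the original post: one pass over a plain
# numeric vector, comparing each value with the last value that was kept.
clean_jumps <- function(x, threshold) {
  ref <- NA_real_                  # last kept (non-NA) value seen so far
  for (i in seq_along(x)) {
    if (is.na(x[i])) next          # assumption: values already NA are skipped
    if (is.na(ref) || abs(ref - x[i]) <= threshold) {
      ref <- x[i]                  # keep it; it becomes the new reference
    } else {
      x[i] <- NA                   # too far from the reference: blank it out
    }
  }
  x
}

# The 15 values produced by set.seed(100); rnorm(15), typed in literally
m <- c(-0.50219235,  0.13153117, -0.07891709,  0.88678481,  0.11697127,
        0.31863009, -0.58179068,  0.71453271, -0.82525943, -0.35986213,
        0.08988614,  0.09627446, -0.20163395,  0.73984050,  0.12337950)
std_d <- sd(m)
clean_jumps(m, std_d)
```

On this vector the NA positions should be 2, 4, 7, 9 to 13, and 15, which is the pattern shown in the output above.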
1 Answer
Revised: ...I wonder if this could replace those two loops? Here are the results with the test vector offered in your comments:
set.seed(100); m=rnorm(15); std_d <- sd(m)
Reduce(function(y, xx) {
  if (abs(tail(y[!is.na(y)], 1) - xx) > std_d) {
    c(y, NA)
  } else {
    c(y, xx)
  }
}, m)
[1] -0.50219235 NA -0.07891709 NA 0.11697127 0.31863009 NA 0.71453271
[9] NA NA NA NA NA 0.73984050 NA
Here's the original test case with revised results. The code now compares the last non-NA value to the current one and sets the current one to NA if the difference exceeds the threshold. That other solution looked at the running sd(), but your code did not do that, so I just used a global sd value:
z <- read.zoo(text=' "timestamp" "mesured_distance" "IFC_Code" "from_sensor_to_river_bottom"
"1" "2012-06-03 12:30:07-05" 3188 1005 3500
"2" "2012-06-03 12:15:16-05" 3189 1005 3500
"3" "2012-06-03 12:00:08-05" 3185 1005 3500
"4" "2012-06-03 11:45:11-05" 3191 1005 3500
"5" "2012-06-03 11:30:15-05" 3188 1005 3500', tz = "")
std_d <- sd(coredata(z[,1]))
zx <- as.numeric(coredata(z[,1]))
coredata(z[,1]) <- Reduce(function(y, xx) {
  if (abs(tail(y[!is.na(y)], 1) - xx) > std_d) {
    c(y, NA)
  } else {
    c(y, xx)
  }
}, zx)
z
                    mesured_distance IFC_Code from_sensor_to_river_bottom
2012-06-03 11:30:15             3188     1005                        3500
2012-06-03 11:45:11               NA     1005                        3500
2012-06-03 12:00:08               NA     1005                        3500
2012-06-03 12:15:16             3189     1005                        3500
2012-06-03 12:30:07             3188     1005                        3500
Sorry, but it doesn't. Your code detects whether there are any NA values, but my code converts the z[ctr+1,1] values that satisfy the condition to NA. I'd be grateful if you could provide something this short for my entire code. (rockswap, Jun 4, 2012 at 3:56)
> set.seed(100)
> m=rnorm(15)
> m
 [1] -0.50219235  0.13153117 -0.07891709  0.88678481  0.11697127  0.31863009 -0.58179068
 [8]  0.71453271 -0.82525943 -0.35986213  0.08988614  0.09627446 -0.20163395  0.73984050
[15]  0.12337950
This is what my code does:
m
 [1] -0.50219235 NA -0.07891709 NA 0.11697127 0.31863009 NA
 [8] 0.71453271 NA NA NA NA NA 0.73984050
[15] NA
See this link: stackoverflow.com/questions/10823971/clean-up-the-dataset. The code is explained in detail there. (rockswap, Jun 4, 2012 at 4:22)
@DWin: I have edited the question to explain the code in more detail. (rockswap, Jun 4, 2012 at 4:28)
Sorry, but the solution you provided gives me incorrect output. I tried using an array instead of coredata(z[,1]), but it still takes a long time, around 12 minutes. (rockswap, Jun 4, 2012 at 4:39)
That's great. It's working and takes very little time. Thanks. (rockswap, Jun 4, 2012 at 8:07)
Is coredata an array lookup or a function call? ... [.zoo or coredata: if it is the case, extracting the array once before the loop may speed things up.
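The suggestion to extract the array once before the loop might look roughly like the sketch below. It assumes the zoo package is installed. The toy series stands in for the real file, and `times`, `v`, and `ref` are illustrative names, not taken from the original code.

```r
library(zoo)  # assumes the zoo package is installed

# Hypothetical toy series: the five sample readings, already in time order.
times <- as.POSIXct(c("2012-06-03 11:30:15", "2012-06-03 11:45:11",
                      "2012-06-03 12:00:08", "2012-06-03 12:15:16",
                      "2012-06-03 12:30:07"), tz = "UTC")
z <- zoo(cbind(mesured_distance = c(3188, 3191, 3185, 3189, 3188),
               IFC_Code = 1005),
         order.by = times)

v <- as.numeric(coredata(z)[, 1])  # extract the column once, outside the loop
std_d <- sd(v)

ref <- v[1]                        # assumes the first value is not NA
for (i in seq_along(v)[-1]) {
  if (abs(ref - v[i]) > std_d) {
    v[i] <- NA                     # too far from the last kept value
  } else {
    ref <- v[i]                    # kept; becomes the new reference
  }
}

coredata(z)[, 1] <- v              # write the cleaned column back in one step
z
```

The loop itself only touches the plain vector `v`. The zoo object is read once and written once, so the per-element cost of `[.zoo` and `coredata` is paid outside the loop rather than on every iteration.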