The data file is here.
Here is my code:
> ftable=read.table(file.choose())
> start.time=Sys.time()
> 1-length(which(ftable==1))/sum(ftable)
[1] 0.12
> end.time=Sys.time()
> end.time-start.time
Time difference of 0.004880905 secs
I understand that 0.00488 secs is not a lot. But I have to repeat this calculation over different and larger tables, so I am wondering whether the function `which` can be replaced by a more efficient one.
Thanks in advance!
Note: This piece of code calculates the percentage of singletons in ftable. If there is a more efficient way, please let me know. Thank you!
1 Answer
Try:
1-sum(ftable==1L)/sum(ftable)
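Why this is faster: `which()` first materializes an integer vector of every matching position, and `length()` then counts it, whereas `sum()` on the logical comparison counts the `TRUE`s directly (they coerce to `1L`) without that intermediate allocation. A minimal sketch on toy data (variable names are illustrative):

```r
# Toy data: a small integer matrix with values 0..3, like the table above
set.seed(1)
x <- matrix(sample.int(4, 20, replace = TRUE) - 1L, nrow = 4)

# which() materializes the indices of the matches, then length() counts them
n_which <- length(which(x == 1L))

# sum() coerces each TRUE to 1L, counting matches without the index vector
n_sum <- sum(x == 1L)

stopifnot(n_which == n_sum)  # both count the same number of 1s
```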
Test on larger data:
n <- 1000000
set.seed(21)
ftable <- data.frame(replicate(3, sample.int(4, n, replace = TRUE)) - 1L)
start.time=Sys.time()
1-length(which(ftable==1))/sum(ftable)
end.time=Sys.time()
end.time-start.time
# Time difference of 0.1981359 secs
start.time=Sys.time()
1-sum(ftable==1L)/sum(ftable)
end.time=Sys.time()
end.time-start.time
# Time difference of 0.06704712 secs
Benchmarks:
n <- 1000000
set.seed(21)
ftable <- data.frame(replicate(3, sample.int(4, n, replace = TRUE)) - 1L)
jz <- function() 1-length(which(ftable==1))/sum(ftable)
minem <- function() 1-sum(ftable==1L)/sum(ftable)
br <- bench::mark(jz(), minem(), iterations = 50)
br[, 1:7]
# A tibble: 2 x 7
# expression min mean median max `itr/sec` mem_alloc
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt>
# 1 jz() 51.2ms 53.8ms 52.6ms 66.3ms 18.6 60.1MB
# 2 minem() 37.7ms 39.9ms 38.5ms 67ms 25.1 45.8MB
# minem() is only around 36% faster
Comments:

As it's a percentage, `1-mean(ftable==1L)` will be enough. – moodymudskipper, Sep 13, 2018 at 0:49

Is `sum(df)` really what you want to calculate the "percentage of singletons"? Or do you rather need something along the lines of `prop.table(table(unlist(df, use.names = FALSE)))`?

… `system.time({})`. However, when measuring computation times as short as yours, neither that approach nor yours is good at all: instead of one execution, you want to run the code many, many times and look at the median computation time. There is a package that does that well: `microbenchmark`.
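A minimal sketch of the timing approach that last comment suggests, assuming the `microbenchmark` package is available (the data setup mirrors the answer's; the `requireNamespace` guard is only so the sketch still runs where the package is not installed):

```r
set.seed(21)
n <- 1e5
ftable <- data.frame(replicate(3, sample.int(4, n, replace = TRUE)) - 1L)

# Both expressions compute the same percentage, so their timings are comparable
stopifnot(isTRUE(all.equal(
  1 - length(which(ftable == 1)) / sum(ftable),
  1 - sum(ftable == 1L) / sum(ftable)
)))

# microbenchmark runs each expression many times; compare the median column
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    jz    = 1 - length(which(ftable == 1)) / sum(ftable),
    minem = 1 - sum(ftable == 1L) / sum(ftable),
    times = 50
  ))
}
```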