
The data file is here.

Here is my code:

> ftable=read.table(file.choose())
> start.time=Sys.time()
> 1-length(which(ftable==1))/sum(ftable)
[1] 0.12
> end.time=Sys.time()
> end.time-start.time
Time difference of 0.004880905 secs

I understand that 0.00488 seconds is not a lot, but I have to repeat this calculation over different and larger tables. I am wondering whether the function `which` can be replaced by a more efficient one.

Thanks in advance!

Note: this code calculates the percentage of singletons in ftable. If there is a more efficient way to do that, please let me know. Thank you!

asked Aug 22, 2018 at 3:12
  • Is dividing by sum(df) really what you want to calculate the "percentage of singletons"? Or do you rather need something along the lines of prop.table(table(unlist(df, use.names = FALSE)))? – Commented Aug 22, 2018 at 8:24
  • For measuring computation times, you can wrap some code inside system.time({}). However, for computation times as short as yours, neither that approach nor yours is reliable: instead of one execution, you want to run the code many, many times and look at the median computation time. There is a package that does that well: microbenchmark. – Commented Aug 23, 2018 at 1:55
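The two timing approaches mentioned in the comment above can be sketched as follows. This is a minimal illustration on a small, made-up data frame (not the question's data file), and it assumes the microbenchmark package is installed:

```r
# Toy stand-in for the question's ftable (made-up values).
ftable <- data.frame(a = c(0L, 1L, 2L, 1L), b = c(1L, 0L, 1L, 3L))

# One-shot timing with base R's system.time(); too noisy for
# sub-millisecond expressions, but requires no extra packages.
system.time({
  res <- 1 - length(which(ftable == 1)) / sum(ftable)
})

# Repeated timing: run the expression many times and compare medians.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  microbenchmark::microbenchmark(
    which_based = 1 - length(which(ftable == 1)) / sum(ftable),
    times = 100L
  )
}
```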

1 Answer


Try:

1-sum(ftable==1L)/sum(ftable)
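The saving comes from skipping `which()`: it materializes an integer vector of matching positions, while `sum()` on the logical matrix counts the TRUEs directly. A small sketch on made-up data:

```r
# which() allocates an index vector of the matching positions;
# sum() on the logical matrix counts TRUEs without that allocation.
m <- matrix(c(0L, 1L, 2L, 1L, 3L, 1L), nrow = 2)
idx <- which(m == 1)   # integer positions of the matches: c(2, 4, 6)
n1  <- sum(m == 1L)    # same count, no index vector: 3
stopifnot(length(idx) == n1)
```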

Test on larger data:

n <- 1000000
set.seed(21)
ftable <- data.frame(replicate(3, sample.int(4, n, replace = TRUE))-1L)
start.time=Sys.time()
1-length(which(ftable==1))/sum(ftable)
end.time=Sys.time()
end.time-start.time
# Time difference of 0.1981359 secs
start.time=Sys.time()
1-sum(ftable==1L)/sum(ftable)
end.time=Sys.time()
end.time-start.time
# Time difference of 0.06704712 secs

Benchmarks:

n <- 1000000
set.seed(21)
ftable <- data.frame(replicate(3, sample.int(4, n, replace = TRUE))-1L)
jz <- function() 1-length(which(ftable==1))/sum(ftable) 
minem <- function() 1-sum(ftable==1L)/sum(ftable)
br <- bench::mark(jz(), minem(), iterations = 50)
br[, 1:7]
# # A tibble: 2 x 7
#   expression      min     mean   median      max `itr/sec` mem_alloc
#   <chr>      <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt>
# 1 jz()         51.2ms   53.8ms   52.6ms   66.3ms      18.6    60.1MB
# 2 minem()      37.7ms   39.9ms   38.5ms     67ms      25.1    45.8MB
# only around 36 % faster
answered Aug 22, 2018 at 6:07
  • as it's a percentage, 1-mean(ftable==1L) will be enough – Commented Sep 13, 2018 at 0:49
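One caveat about the `mean()` suggestion above: `1 - mean(ftable == 1L)` divides by the number of cells, whereas the original expression divides by `sum(ftable)`, so the two agree only when the table's values happen to sum to the cell count. A quick check on made-up data where they differ:

```r
# Made-up data: 8 cells, values summing to 9, four of them equal to 1.
ftable <- data.frame(a = c(0L, 1L, 2L, 1L), b = c(1L, 0L, 1L, 3L))
by_sum  <- 1 - sum(ftable == 1L) / sum(ftable)  # original: denominator is sum of values
by_mean <- 1 - mean(ftable == 1L)               # comment:  denominator is number of cells
# Here by_sum is 1 - 4/9 but by_mean is 1 - 4/8, so the results differ.
```

Which one is correct depends on what "percentage of singletons" is meant to be, as the first comment under the question already points out.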
