The data file is here.
Here is my code:
> ftable=read.table(file.choose())
> start.time=Sys.time()
> 1-length(which(ftable==1))/sum(ftable)
[1] 0.12
> end.time=Sys.time()
> end.time-start.time
Time difference of 0.004880905 secs
I understand that 0.00488 secs is not a lot. But I have to repeat this calculation over different and larger tables, so I am wondering whether the function `which` can be replaced by a more efficient one.
Thanks in advance!
Note: This piece of code calculates the percentage of singletons in ftable. If there is a more efficient way, please let me know. Thank you!
1 Answer
Try:
1-sum(ftable==1L)/sum(ftable)
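Why this is faster: `which()` first materializes an integer vector of every matching position, and `length()` then counts it, whereas `sum()` on the logical comparison counts the `TRUE`s directly (they coerce to `1L`) without that intermediate allocation. A minimal sketch on toy data (variable names are illustrative):

```r
# Toy data: a small integer matrix with values 0..3, like the table above
set.seed(1)
x <- matrix(sample.int(4, 20, replace = TRUE) - 1L, nrow = 4)

# which() materializes the indices of the matches, then length() counts them
n_which <- length(which(x == 1L))

# sum() coerces each TRUE to 1L, counting matches without the index vector
n_sum <- sum(x == 1L)

stopifnot(n_which == n_sum)  # both count the same number of 1s
```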
Test on larger data:
n <- 1000000
set.seed(21)
ftable <- data.frame(replicate(3, sample.int(4, n, replace = TRUE)) - 1L)
start.time=Sys.time()
1-length(which(ftable==1))/sum(ftable)
end.time=Sys.time()
end.time-start.time
# Time difference of 0.1981359 secs
start.time=Sys.time()
1-sum(ftable==1L)/sum(ftable)
end.time=Sys.time()
end.time-start.time
# Time difference of 0.06704712 secs
Benchmarks:
n <- 1000000
set.seed(21)
ftable <- data.frame(replicate(3, sample.int(4, n, replace = TRUE)) - 1L)
jz <- function() 1-length(which(ftable==1))/sum(ftable)
minem <- function() 1-sum(ftable==1L)/sum(ftable)
br <- bench::mark(jz(), minem(), iterations = 50)
br[, 1:7]
# A tibble: 2 x 7
# expression min mean median max `itr/sec` mem_alloc
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt>
# 1 jz() 51.2ms 53.8ms 52.6ms 66.3ms 18.6 60.1MB
# 2 minem() 37.7ms 39.9ms 38.5ms 67ms 25.1 45.8MB
# minem() is only around 36% faster
Comments:

As it's a percentage, `1-mean(ftable==1L)` will be enough. – moodymudskipper, Sep 13, 2018 at 0:49

Is `sum(df)` really what you want to calculate the "percentage of singletons"? Or do you rather need something along the lines of `prop.table(table(unlist(df, use.names = FALSE)))`?

… `system.time({})`. However, when measuring computation times as short as yours, neither that approach nor yours is good at all: instead of one execution, you want to run the code many, many times and look at the median computation time. There is a package that does that well: `microbenchmark`.
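A minimal sketch of the timing approach that last comment suggests, assuming the `microbenchmark` package is available (the data setup mirrors the answer's; the `requireNamespace` guard is only so the sketch still runs where the package is not installed):

```r
set.seed(21)
n <- 1e5
ftable <- data.frame(replicate(3, sample.int(4, n, replace = TRUE)) - 1L)

# Both expressions compute the same percentage, so their timings are comparable
stopifnot(isTRUE(all.equal(
  1 - length(which(ftable == 1)) / sum(ftable),
  1 - sum(ftable == 1L) / sum(ftable)
)))

# microbenchmark runs each expression many times; compare the median column
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    jz    = 1 - length(which(ftable == 1)) / sum(ftable),
    minem = 1 - sum(ftable == 1L) / sum(ftable),
    times = 50
  ))
}
```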