In this post, I asked a question regarding how to create a matrix of integers where elements in each row satisfy a particular property.
The notation in the original post is a bit cumbersome so I will try to explain the question in a nutshell. I have a vector of integers cvec
, which has length l-1
, and I enumerate all the possibilities in a matrix such that every integer in each of the l
columns is less than or equal to cvec[1]
(first three lines of code). Then I iteratively reduce that large set possibilities into a smaller matrix, such that for a given row:
- Consecutive elements taken 2,3,...l-1 at a time must be no larger than
cvec[2]
,cvec[3]
, ...,cvec[l-1]
, respectively - The total sum of the
l
elements is equal tox
.
By first subsetting mat
to only those rows with sum x
, and then running the nested for loops, I've been able to reduce the runtime by over 1/10th. For the above example, it now runs 7.8 seconds
. However, I still think this could be faster if I somehow eliminate the nested loops. Any suggestions?
valid.counts2 <- function(x,l,cvec){
#l>1
vec = seq(from=0,to=cvec[1])
lst = lapply(numeric(l), function(i) vec)
mat = as.matrix(expand.grid(lst))
mat = mat[which(rowSums(mat)==x),]
if(l>2){
#k=1 is redundant since all must be less than or equal to cvec[1]
for (k in seq(from=2, to=l-1)){
row.indx = NULL
for (r in seq(l-k+1)){
#pick out the index of the row(s) that satisfy constraint
row.indx = which(rowSums(mat[,r:(r+k-1),drop=FALSE])<=cvec[k])
#filter rows of mat
mat = mat[unique(row.indx),]
}
}
}
return(mat)
}
1 Answer 1
Instead of a top-down (very memory intensive) approach, I would recommend a somewhat bottom-up approach, where you add one column at a time then do the reductions, add another column, etc. Try this:
expand.mat <- function(mat, vec) {
out <- matrix(0L, nrow = nrow(mat) * length(vec),
ncol = ncol(mat) + 1)
for (i in 1:ncol(mat)) out[, i] <- mat[, i]
out[, ncol(mat) + 1] <- rep(vec, each = nrow(mat))
return(out)
}
valid.counts3 <- function(x, cvec) {
l <- length(cvec) + 1
vec <- seq(from = 0, to = cvec[1])
mat <- matrix(vec, ncol = 1)
for (j in 2:l) {
mat <- expand.mat(mat, vec)
rsum <- rowSums(mat)
mat <- mat[rsum <= x & rsum + cvec[1] * (l - j) >= x, ]
for (i in 1:(j-1)) {
k <- j - i + 1
if (k == l) next
rsum <- rowSums(mat[, i:j])
mat <- mat[rsum <= cvec[k], ]
}
}
return(mat)
}
system.time({
res <- valid.counts3(x = 148, cvec = c(36, 67, 92, 119))
})
# user system elapsed
# 0.279 0.036 0.319
Edit: And to handle a vector of x
as input, make these slight changes. It will return a list of matrices.
valid.counts3 <- function(x, cvec) {
l <- length(cvec) + 1
vec <- seq(from = 0, to = cvec[1])
mat <- matrix(vec, ncol = 1)
for (j in 2:l) {
mat <- expand.mat(mat, vec)
rsum <- rowSums(mat)
mat <- mat[rsum <= max(x) & rsum + cvec[1] * (l - j) >= min(x), ]
for (i in 1:(j-1)) {
k <- j - i + 1
if (k == l) next
rsum <- rowSums(mat[, i:j])
mat <- mat[rsum <= cvec[k], ]
}
}
mat <- mat[rowSums(mat) %in% x, ]
idx <- split(seq(nrow(mat)), rowSums(mat))
return(lapply(idx, function(i, x) x[i, ], mat))
}
system.time({
res <- valid.counts3(x = seq(from = 148, to = 155), cvec = c(36, 67, 92, 119))
})
# user system elapsed
# 0.333 0.048 0.381
sapply(res, nrow)
# 148 149 150 151 152 153 154 155
# 8649 5776 3600 2025 1024 441 144 25
==
by%in%
so you can now use a whole vector likex=seq(from=148,to=155)
as input. For the output, you couldreturn(table(rowSums(mat))
if all you care about is the number of solutions. Also, your use ofunique()
is useless and you do not need to initalizerow.indx = NULL
. \$\endgroup\$