I have this code to generate symmetric matrices for testing how the design of the canonical correlation analysis I am performing works out. This is an extension of this solution.
Each row of the matrix represents a dataset (the matrix is symmetric): a value of 0 means no interaction between the datasets, and a higher value means a stronger interaction. The end goal of this code is to run a grid search for the design that best explains my data.
However, since I need to come up with different designs, adding more or fewer datasets, I would like to know how to turn this into a more general function, especially the nested for loops part (if I test it with 5 datasets I have to add more for loops and then more unlist calls at the end).
Initial matrix (I usually work with 4 datasets):
C <- matrix(0,ncol = 4, nrow = 4)
Weights for the interactions of each dataset (4 to avoid too many combinations):
nweight <- 4
weight <- seq(from = 0, to = 1, length.out = nweight)
Initialize the list that will contain the matrices:
C_list <- vector("list", nweight)
cweight <- as.character(weight)
names(C_list) <- cweight
Loop for each position I want to change to obtain all the combinations of weights I want to test.
for (i1 in cweight) {
  C_list[[i1]] <- vector("list", nweight)
  names(C_list[[i1]]) <- cweight
  for (i2 in cweight) {
    C_list[[i1]][[i2]] <- vector("list", nweight)
    names(C_list[[i1]][[i2]]) <- cweight
    for (i3 in cweight) {
      C_list[[i1]][[i2]][[i3]] <- vector("list", nweight)
      names(C_list[[i1]][[i2]][[i3]]) <- cweight
      for (i4 in cweight) {
        C_list[[i1]][[i2]][[i3]][[i4]] <- vector("list", nweight)
        names(C_list[[i1]][[i2]][[i3]][[i4]]) <- cweight
        for (i5 in cweight) {
          C_list[[i1]][[i2]][[i3]][[i4]][[i5]] <- vector("list", nweight)
          names(C_list[[i1]][[i2]][[i3]][[i4]][[i5]]) <- cweight
          for (i6 in cweight) {
            C[1, 2] <- as.numeric(i1)
            C[2, 1] <- as.numeric(i1)
            C[1, 3] <- as.numeric(i2)
            C[3, 1] <- as.numeric(i2)
            C[1, 4] <- as.numeric(i3)
            C[4, 1] <- as.numeric(i3)
            C[2, 3] <- as.numeric(i4)
            C[3, 2] <- as.numeric(i4)
            C[2, 4] <- as.numeric(i5)
            C[4, 2] <- as.numeric(i5)
            C[3, 4] <- as.numeric(i6)
            C[4, 3] <- as.numeric(i6)
            C_list[[i1]][[i2]][[i3]][[i4]][[i5]][[i6]] <- C
          }
        }
      }
    }
  }
}
Unlist the list of list of list of ... nested matrices to end up with a long list of matrices with the weights for each dataset
C_list2 <- unlist(unlist(unlist(unlist(unlist(C_list, FALSE, FALSE),
FALSE, FALSE), FALSE, FALSE),
FALSE, FALSE), FALSE, FALSE)
Answer:
Here you want to move away from for loops for two reasons:
- your number of for loops depends on your number of datasets, so using for loops prevents you from generalizing your code to any number of datasets.
- many for loops will likely slow down your code execution when working with a larger number of datasets.
I think the key to vectorizing your for loops is to use the expand.grid function. If you have
n <- 4
datasets, then you have
p <- n * (n - 1) / 2 # 6
degrees of freedom (the number of for loops in your code, or the number of items on the lower triangle of each matrix). If for each of these you can pick among
w <- seq(from = 0, to = 1, length.out = n)
then you can build the matrix of all possible combinations by doing:
W <- as.matrix(expand.grid(rep(list(w), p)))
Here W is a big matrix with 4096 rows, each row representing a different combination of your (i1, i2, i3, i4, i5, i6) variables:
> head(W)
Var1 Var2 Var3 Var4 Var5 Var6
[1,] 0.0000000 0.0000000 0 0 0 0
[2,] 0.3333333 0.0000000 0 0 0 0
[3,] 0.6666667 0.0000000 0 0 0 0
[4,] 1.0000000 0.0000000 0 0 0 0
[5,] 0.0000000 0.3333333 0 0 0 0
[6,] 0.3333333 0.3333333 0 0 0 0
These 6 columns are only part of the n * n = 16 values needed in each matrix. We can expand using the following:
X <- matrix(1:(n*n), n, n) # pattern matrix of indices
A <- matrix(0, nrow(W), n * n)
A[, X[lower.tri(X)]] <- W
A[, t(X)[lower.tri(X)]] <- W
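To see why these two assignments cover the whole matrix, it can help to print the pattern matrix X and the linear indices it selects (using the same n <- 4 as above):

```r
n <- 4
X <- matrix(1:(n * n), n, n)  # linear indices in column-major order
X
#      [,1] [,2] [,3] [,4]
# [1,]    1    5    9   13
# [2,]    2    6   10   14
# [3,]    3    7   11   15
# [4,]    4    8   12   16
X[lower.tri(X)]     # lower-triangle positions: 2 3 4 7 8 12
t(X)[lower.tri(X)]  # mirrored upper-triangle positions: 5 9 13 10 14 15
```

So the i-th column of W lands in one lower-triangle cell and in its mirror across the diagonal, which is what makes each row of A a symmetric matrix.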
A is similar to W in that it is a matrix with 4096 rows, but each row now has the n * n = 16 values of a symmetric matrix.
From there, you can reshape A into a 3D array:
dim(A) <- c(nrow(W), n, n)
and your 4096 matrices can be accessed as follows:
A[1, , ]
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 0
# [2,] 0 0 0 0
# [3,] 0 0 0 0
# [4,] 0 0 0 0
A[10, , ]
# [,1] [,2] [,3] [,4]
# [1,] 0.0000000 0.3333333 0.6666667 0
# [2,] 0.3333333 0.0000000 0.0000000 0
# [3,] 0.6666667 0.0000000 0.0000000 0
# [4,] 0.0000000 0.0000000 0.0000000 0
A[4096, , ]
# [,1] [,2] [,3] [,4]
# [1,] 0 1 1 1
# [2,] 1 0 1 1
# [3,] 1 1 0 1
# [4,] 1 1 1 0
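As a quick sanity check (a self-contained sketch that rebuilds A with the same code as above), you can confirm that every slice is symmetric with a zero diagonal:

```r
# Rebuild A exactly as above so the check stands on its own
n <- 4
p <- n * (n - 1) / 2
w <- seq(from = 0, to = 1, length.out = n)
W <- as.matrix(expand.grid(rep(list(w), p)))
X <- matrix(1:(n * n), n, n)
A <- matrix(0, nrow(W), n * n)
A[, X[lower.tri(X)]] <- W
A[, t(X)[lower.tri(X)]] <- W
dim(A) <- c(nrow(W), n, n)

# every slice should equal its transpose and have zeros on the diagonal
all(apply(A, 1, function(m) isTRUE(all.equal(m, t(m)))))  # TRUE
all(apply(A, 1, function(m) all(diag(m) == 0)))           # TRUE
```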
If I were you I would probably stop here, i.e., keep the data in this form. A 3D array might allow you to continue writing vectorized code if the rest of your analysis allows for it. However, if you absolutely want a list of matrices, you can do:
C_list2 <- lapply(seq(nrow(A)), function(i) A[i, , ])
(Note that the order of the matrices in my data and yours does not match. Let me know if this is a concern; it's probably just a matter of reorganizing the rows and/or columns of the W matrix.)
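Putting the pieces together, the whole approach can be wrapped into a single function of the number of datasets, which is what the question asked for. The name make_designs is mine, and I have separated nweight from n (the answer above happened to use 4 for both):

```r
# Generate all symmetric zero-diagonal design matrices for n datasets,
# with each off-diagonal weight drawn from a grid of `nweight` values in [0, 1].
# Returns a 3D array: designs[i, , ] is the i-th candidate matrix.
make_designs <- function(n, nweight = 4) {
  p <- n * (n - 1) / 2                          # free lower-triangle entries
  w <- seq(from = 0, to = 1, length.out = nweight)
  W <- as.matrix(expand.grid(rep(list(w), p)))  # nweight^p combinations
  X <- matrix(1:(n * n), n, n)                  # pattern of linear indices
  A <- matrix(0, nrow(W), n * n)
  A[, X[lower.tri(X)]] <- W                     # fill lower triangle
  A[, t(X)[lower.tri(X)]] <- W                  # mirror to upper triangle
  dim(A) <- c(nrow(W), n, n)
  A
}

designs <- make_designs(4)      # 4^6 = 4096 matrices
dim(designs)                    # 4096 4 4
designs5 <- make_designs(5, 2)  # 2^10 = 1024 matrices for 5 datasets
dim(designs5)                   # 1024 5 5
```

This replaces both the nested for loops and the chain of unlist calls, and scales to any number of datasets without touching the function body.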