I have this code to generate symmetric matrices for testing how the design of the canonical correlation analysis I am performing works out. This is an extension of this solution.
Each row of the matrix represents a dataset (the matrix is symmetric): a value of 0 means no interaction between the datasets, and a higher value means a stronger interaction. The end goal of this code is to run a grid search for the design that best explains my data.
However, since I need to come up with different designs, adding more or fewer datasets, I would like to know how to turn this into a more general function, especially the nested for loops part (if I test it with 5 datasets I have to add more for loops and then more unlist calls at the end).
Initial matrix (I usually work with 4 datasets):
C <- matrix(0,ncol = 4, nrow = 4)
Weights for the interactions of each dataset (4 to avoid too many combinations):
nweight <- 4
weight <- seq(from = 0, to = 1, length.out = nweight)
Initialize the list that will contain the matrices:
C_list <- vector("list", nweight)
cweight <- as.character(weight)
names(C_list) <- cweight
Loop for each position I want to change to obtain all the combinations of weights I want to test.
for (i1 in cweight) {
  C_list[[i1]] <- vector("list", nweight)
  names(C_list[[i1]]) <- cweight
  for (i2 in cweight) {
    C_list[[i1]][[i2]] <- vector("list", nweight)
    names(C_list[[i1]][[i2]]) <- cweight
    for (i3 in cweight) {
      C_list[[i1]][[i2]][[i3]] <- vector("list", nweight)
      names(C_list[[i1]][[i2]][[i3]]) <- cweight
      for (i4 in cweight) {
        C_list[[i1]][[i2]][[i3]][[i4]] <- vector("list", nweight)
        names(C_list[[i1]][[i2]][[i3]][[i4]]) <- cweight
        for (i5 in cweight) {
          C_list[[i1]][[i2]][[i3]][[i4]][[i5]] <- vector("list", nweight)
          names(C_list[[i1]][[i2]][[i3]][[i4]][[i5]]) <- cweight
          for (i6 in cweight) {
            C[1, 2] <- as.numeric(i1)
            C[2, 1] <- as.numeric(i1)
            C[1, 3] <- as.numeric(i2)
            C[3, 1] <- as.numeric(i2)
            C[1, 4] <- as.numeric(i3)
            C[4, 1] <- as.numeric(i3)
            C[2, 3] <- as.numeric(i4)
            C[3, 2] <- as.numeric(i4)
            C[2, 4] <- as.numeric(i5)
            C[4, 2] <- as.numeric(i5)
            C[3, 4] <- as.numeric(i6)
            C[4, 3] <- as.numeric(i6)
            C_list[[i1]][[i2]][[i3]][[i4]][[i5]][[i6]] <- C
          }
        }
      }
    }
  }
}
Unlist the list of list of list of ... nested matrices to end up with a long list of matrices with the weights for each dataset
C_list2 <- unlist(unlist(unlist(unlist(unlist(C_list, FALSE, FALSE),
FALSE, FALSE), FALSE, FALSE),
FALSE, FALSE), FALSE, FALSE)
Answer:
Here you want to move away from for loops for two reasons:
- your number of for loops depends on your number of datasets, so using for loops prevents you from generalizing your code to any number of datasets.
- many for loops will likely slow down your code execution when working with a larger number of datasets.
I think the key to vectorizing your for loops is to use the expand.grid function. If you have
n <- 4
datasets, then you have
p <- n * (n - 1) / 2 # 6
degrees of freedom (the number of for loops in your code, or the number of items on the lower triangle of each matrix). If for each of these you can pick among
w <- seq(from = 0, to = 1, length.out = n)
then you can build the matrix of all possible combinations by doing:
W <- as.matrix(expand.grid(rep(list(w), p)))
Here W is a big matrix with 4096 rows, each row representing a different combination of your (i1, i2, i3, i4, i5, i6) variables:
> head(W)
Var1 Var2 Var3 Var4 Var5 Var6
[1,] 0.0000000 0.0000000 0 0 0 0
[2,] 0.3333333 0.0000000 0 0 0 0
[3,] 0.6666667 0.0000000 0 0 0 0
[4,] 1.0000000 0.0000000 0 0 0 0
[5,] 0.0000000 0.3333333 0 0 0 0
[6,] 0.3333333 0.3333333 0 0 0 0
These 6 columns are only part of the n * n = 16 values needed in each matrix. We can expand using the following:
X <- matrix(1:(n*n), n, n) # pattern matrix of indices
A <- matrix(0, nrow(W), n * n)
A[, X[lower.tri(X)]] <- W
A[, t(X)[lower.tri(X)]] <- W
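To see why these two assignments cover the whole matrix, it can help to print the pattern matrix X and the linear indices it selects (using the same n <- 4 as above):

```r
n <- 4
X <- matrix(1:(n * n), n, n)  # linear indices in column-major order
X
#      [,1] [,2] [,3] [,4]
# [1,]    1    5    9   13
# [2,]    2    6   10   14
# [3,]    3    7   11   15
# [4,]    4    8   12   16
X[lower.tri(X)]     # lower-triangle positions: 2 3 4 7 8 12
t(X)[lower.tri(X)]  # mirrored upper-triangle positions: 5 9 13 10 14 15
```

So the i-th column of W lands in one lower-triangle cell and in its mirror across the diagonal, which is what makes each row of A a symmetric matrix.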
A is similar to W in that it is a matrix with 4096 rows, but each row now has the n * n = 16 values of a symmetric matrix.
From there, you can reshape A into a 3D array:
dim(A) <- c(nrow(W), n, n)
and your 4096 matrices can be accessed as follows:
A[1, , ]
# [,1] [,2] [,3] [,4]
# [1,] 0 0 0 0
# [2,] 0 0 0 0
# [3,] 0 0 0 0
# [4,] 0 0 0 0
A[10, , ]
# [,1] [,2] [,3] [,4]
# [1,] 0.0000000 0.3333333 0.6666667 0
# [2,] 0.3333333 0.0000000 0.0000000 0
# [3,] 0.6666667 0.0000000 0.0000000 0
# [4,] 0.0000000 0.0000000 0.0000000 0
A[4096, , ]
# [,1] [,2] [,3] [,4]
# [1,] 0 1 1 1
# [2,] 1 0 1 1
# [3,] 1 1 0 1
# [4,] 1 1 1 0
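As a quick sanity check (a self-contained sketch that rebuilds A with the same code as above), you can confirm that every slice is symmetric with a zero diagonal:

```r
# Rebuild A exactly as above so the check stands on its own
n <- 4
p <- n * (n - 1) / 2
w <- seq(from = 0, to = 1, length.out = n)
W <- as.matrix(expand.grid(rep(list(w), p)))
X <- matrix(1:(n * n), n, n)
A <- matrix(0, nrow(W), n * n)
A[, X[lower.tri(X)]] <- W
A[, t(X)[lower.tri(X)]] <- W
dim(A) <- c(nrow(W), n, n)

# every slice should equal its transpose and have zeros on the diagonal
all(apply(A, 1, function(m) isTRUE(all.equal(m, t(m)))))  # TRUE
all(apply(A, 1, function(m) all(diag(m) == 0)))           # TRUE
```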
If I were you I would probably stop here, i.e., keep the data in this form. A 3D array might allow you to continue writing vectorized code if the rest of your analysis allows for it. However, if you absolutely want a list of matrices, you can do:
C_list2 <- lapply(seq(nrow(A)), function(i) A[i, , ])
(Note that the order of the matrices in my data and yours does not match. Let me know if this is a concern; it's probably just a matter of reorganizing the rows and/or columns of the W matrix.)
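Putting the pieces together, the whole approach can be wrapped into a single function of the number of datasets, which is what the question asked for. The name make_designs is mine, and I have separated nweight from n (the answer above happened to use 4 for both):

```r
# Generate all symmetric zero-diagonal design matrices for n datasets,
# with each off-diagonal weight drawn from a grid of `nweight` values in [0, 1].
# Returns a 3D array: designs[i, , ] is the i-th candidate matrix.
make_designs <- function(n, nweight = 4) {
  p <- n * (n - 1) / 2                          # free lower-triangle entries
  w <- seq(from = 0, to = 1, length.out = nweight)
  W <- as.matrix(expand.grid(rep(list(w), p)))  # nweight^p combinations
  X <- matrix(1:(n * n), n, n)                  # pattern of linear indices
  A <- matrix(0, nrow(W), n * n)
  A[, X[lower.tri(X)]] <- W                     # fill lower triangle
  A[, t(X)[lower.tri(X)]] <- W                  # mirror to upper triangle
  dim(A) <- c(nrow(W), n, n)
  A
}

designs <- make_designs(4)      # 4^6 = 4096 matrices
dim(designs)                    # 4096 4 4
designs5 <- make_designs(5, 2)  # 2^10 = 1024 matrices for 5 datasets
dim(designs5)                   # 1024 5 5
```

This replaces both the nested for loops and the chain of unlist calls, and scales to any number of datasets without touching the function body.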