The task is to reshape a 3D array from [row, col, slice] to [slice, row, col]. I tried to implement a base::aperm analog with Rcpp:
// [[Rcpp::export]]
Rcpp::NumericVector array_perm(const Rcpp::NumericVector& x) {
  if (Rf_isNull(x.attr("dim"))) {
    throw std::runtime_error("'x' does not have a 'dim' attribute.");
  }
  Rcpp::Dimension d = x.attr("dim");
  if (d.size() != 3) {
    throw std::runtime_error("'x' must have 3 dimensions.");
  }
  std::size_t n = d[2];
  std::size_t n_vec = d[0] * d[1];
  std::size_t n_out = n_vec * n;
  Rcpp::NumericVector res = Rcpp::no_init(n_out);
  std::size_t ind_from = 0;
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t ind_to = i;
    for (std::size_t j = 0; j < n_vec; ++j) {
      res[ind_to] = x[ind_from];
      ind_to += n;
      ind_from += 1;
    }
  }
  res.attr("dim") = Rcpp::Dimension(d[2], d[0], d[1]);
  return res;
}
How can I improve it?
Code for testing:
# Observations
n <- 1000
# Dimension
d <- 100
# Array
a <- replicate(n, matrix(seq_len(d * d), nrow = d, ncol = d))
# Desired result
res <- aperm(a, c(3, 1, 2))
res
A small benchmark of the current variant:
b <- bench::mark(aperm(a, c(3, 1, 2)), array_perm(a))
b[, c("expression", "min", "median", "max", "itr/sec")]
#>   expression               min   median      max `itr/sec`
#>   <chr>               <bch:tm> <bch:tm> <bch:tm>     <dbl>
#> 1 aperm(a, c(3, 1, 2))  84.9ms   85.1ms   85.5ms     11.7
#> 2 array_perm(a)        124.8ms  125.2ms  127.2ms      7.96
An RcppArmadillo solution:
// [[Rcpp::export]]
arma::Cube<double> array_perm2(const arma::Cube<double>& x) {
  std::size_t rows = x.n_rows;
  std::size_t cols = x.n_cols;
  std::size_t slices = x.n_slices;
  arma::Cube<double> res(slices, rows, cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      for (std::size_t k = 0; k < slices; ++k) {
        res(k, i, j) = x(i, j, k);
      }
    }
  }
  return res;
}
Benchmark:
  expression               min   median      max `itr/sec`
  <chr>               <bch:tm> <bch:tm> <bch:tm>     <dbl>
1 aperm(a, c(3, 1, 2))  85.8ms   86.4ms   87.7ms     11.6
2 array_perm(a)        116.1ms  127.3ms  129.6ms      8.08
3 array_perm2(a)       217.4ms  219.7ms  222.1ms      4.55
- Which version of C++ do you use or have access to? (Calak, Nov 3, 2018)
- @Calak Any is possible. (Artem Klevtsov, Nov 3, 2018)
1 Answer
If you can change the signature to take the rows, cols, and slices counts directly, you won't have to check their validity:
Rcpp::NumericVector array_perm(const Rcpp::NumericVector& input,
                               std::size_t rows, std::size_t cols, std::size_t slices);
Otherwise, if you can change the error management, I think you'll get a speed boost: exception handling comes with a cost. Maybe return an empty vector? I don't know R, so I don't know what the possibilities are.
You can also try flattening your loops; here you have multiple options.
With the computation in the output index:
Rcpp::NumericVector array_perm(const Rcpp::NumericVector& input, std::size_t rows, std::size_t cols, std::size_t slices) {
  // Think about the error management here...
  auto rc = rows * cols;
  auto size = rc * slices;
  auto output = Rcpp::NumericVector(Rcpp::no_init(size));
  for (std::size_t i = 0; i < size; ++i) {
    output[i / rc + i % rc * slices] = input[i];
  }
  return output;
}
With the computation in the input lookup:
//...
for (std::size_t i = 0; i < size; ++i) output[i] = input[i / slices + i % slices * rc];
//...
Or in reverse order
//...
while (size--) output[size] = input[size / slices + size % slices * rc];
//...
Or a mix
//...
while (size--) output[size / rc + size % rc * slices] = input[size];
//...
Or even a range-based for loop:
//...
std::size_t i = 0;
for (auto e : input) {
  output[i / rc + i % rc * slices] = e;
  ++i;
}
//...
PS: Did you try another contiguous storage type (std::vector, plain old C array, ...)?
PPS: I don't have an R environment, so I only tested the transposition algorithms in C++.
Edit:
Two other ways:
//...
for (std::size_t i = 0, j = 0; i < size; ++i, j += rc) {
  if (size <= j) j -= size - 1;
  output[i] = input[j];
}
//...
// or
//...
for (std::size_t i = size, j = i - 1; i--; j -= rc) {
  if (size < j) j += size - 1;
  output[i] = input[j];
}
//...
- Thank you for the reply. res[i] = vec[i / slices + i % slices * rc]; is much faster than output[i / rc + i % rc * slices] = input[i];. (Artem Klevtsov, Nov 4, 2018)
- Surely because of the constness of input. And the "reverse order" version? Did you also try different values (overall size and rows, cols, or slices counts)? Different values can lead to different results. And the range-based version was just for completeness; I suspected it to be the slowest. (Calak, Nov 4, 2018)
- Reverse order is the same (a little bit faster). (Artem Klevtsov, Nov 4, 2018)
- R base::aperm C implementation: github.com/wch/r-source/blob/… (Artem Klevtsov, Nov 4, 2018)
- Here are, in my edit, two new tries with less calculation (but I'm not certain of the performance gain). (Calak, Nov 4, 2018)