3
\$\begingroup\$

Task is reshape 3d array from [row, col, slice] to [slice,row,col]. I tried implement base::aperm analog with Rcpp.

// [[Rcpp::export]]
Rcpp::NumericVector array_perm(const Rcpp::NumericVector& x) {
 if (Rf_isNull(x.attr("dim"))) {
 throw std::runtime_error("'x' does not have 'dim' attibute.");
 }
 Rcpp::Dimension d = x.attr("dim");
 if (d.size() != 3) {
 throw std::runtime_error("'x' must have 3 dimensions.");
 }
 std::size_t n = d[2];
 std::size_t n_vec = d[0] * d[1];
 std::size_t n_out = n_vec * n;
 Rcpp::NumericVector res = Rcpp::no_init(n_out);
 std::size_t ind_from = 0;
 for (std::size_t i = 0; i < n; ++i) {
 std::size_t ind_to = i;
 for (std::size_t j = 0; j < n_vec; ++j) {
 res[ind_to] = x[ind_from];
 ind_to += n;
 ind_from += 1;
 }
 }
 res.attr("dim") = Rcpp::Dimension(d[2], d[0], d[1]);
 return res;
}

How can I improve it?

For testing code:

# Observations
n <- 1000
# Dimension
d <- 100
# Array
a <- replicate(n, matrix(seq_len(d * d), nrow = d, ncol = d))
# Desired result
res <- aperm(a, c(3, 1, 2))
res

Small benchmark current variant of the code:

b <- bench::mark(aperm(a, c(3, 1, 2)), array_perm(a))
b[, c("expression", "min", "median", "max", "itr/sec")]
#> expression min median max `itr/sec`
#> <chr> <bch:tm> <bch:tm> <bch:tm> <dbl>
#> 1 aperm(a, c(3, 1, 2)) 84.9ms 85.1ms 85.5ms 11.7 
#> 2 array_perm(a) 124.8ms 125.2ms 127.2ms 7.96

RcppArmadillo solution

// [[Rcpp::export]]
arma::Cube<double> array_perm2(const arma::Cube<double>& x) {
 std::size_t rows = x.n_rows;
 std::size_t cols = x.n_cols;
 std::size_t slices = x.n_slices;
 arma::Cube<double> res(slices, rows, cols);
 for(std::size_t i = 0; i < rows; ++i) {
 for(std::size_t j = 0; j < cols; ++j) {
 for(std::size_t k = 0; k < slices; ++k) {
 res(k, i, j) = x(i, j, k);
 }
 }
 }
 return res;
}

Benchmark:

 expression min median max `itr/sec`
 <chr> <bch:tm> <bch:tm> <bch:tm> <dbl>
1 aperm(a, c(3, 1, 2)) 85.8ms 86.4ms 87.7ms 11.6 
2 array_perm(a) 116.1ms 127.3ms 129.6ms 8.08
3 array_perm2(a) 217.4ms 219.7ms 222.1ms 4.55
asked Nov 3, 2018 at 14:11
\$\endgroup\$
2
  • \$\begingroup\$ Which version of c++ do you use or have access? \$\endgroup\$ Commented Nov 3, 2018 at 15:02
  • \$\begingroup\$ @Calak Any possible. \$\endgroup\$ Commented Nov 3, 2018 at 15:03

1 Answer 1

4
\$\begingroup\$

If you can change the signature, to directly get cols, rows and slices count, you will not have to check for their validity.

Rcpp::NumericVector array_perm(const Rcpp::NumericVector& input, 
 std::size_t rows, std::size_t cols, std::size_t slices);

Otherwise, if you can change the error management, I think you'll get a speed boost. Exceptions handling come with a cost. Maybe return an empty vector? I don't know R , so I don't know possibilities.

You can also try to flattening your loops, here you have multiples options:

With the computations into output indexing

Rcpp::NumericVector array_perm(const Rcpp::NumericVector& input, std::size_t rows, std::size_t cols, std::size_t slices) {
 // Think about the error management here...
 auto output = Rcpp::NumericVector(Rcpp::no_init(10));
 auto rc = rows * cols;
 auto size = rc * slices;
 for (std::size_t i = 0; i < size; ++i) {
 output[i / rc + i % rc * slices] = input[i];
 }
 return output;
}

With the computations into intput looking

//...
 for (std::size_t i = 0; i < size; ++i) output[i] = input[i/slices + i % slices * rc];
//...

Or in reverse order

//...
 while (size--) output[size] = input[size/slices + size % slices * rc];
//...

Or a mix

//...
 while (size--) output[size / rc + size % rc * slices] = input[size];
//...

Or even a range-based for loop

//...
 std::size_t i = 0;
 for (auto e : input) {
 output[i / rc + i % rc * slices] = e;
 ++i;
 }
//...

PS: Did you tried with another contiguous storage type? (std::vector, plain old C array, ...)

PPS: I don't have R environment, so I only tested transposition algorithms with c++

Edit:

Two other way:

//...
 for(std::size_t i = 0, j = 0; i < size; ++i, j+=rc) {
 if (size <= j) j -= size - 1;
 output[i] = input[j];
 }
//...
// or
//...
 for(std::size_t i = size, j = i-1; i--; j -= rc) {
 if (size < j) j += size-1;
 output[i] = input[j];
 }
//...
answered Nov 3, 2018 at 18:46
\$\endgroup\$
7
  • \$\begingroup\$ Thank you for the reply. res[i] = vec[i/slices + i % slices * rc]; much faster than output[i / rc + i % rc * slices] = input[i];. \$\endgroup\$ Commented Nov 4, 2018 at 5:31
  • \$\begingroup\$ Surely because of the constness of input. And the "reverse order" version? Did you tried also with differents values (overall size and rows, cols or slices count)? Differents values can lead to differents results. And, the range-based version was just for completeness, I suspected it slowest. \$\endgroup\$ Commented Nov 4, 2018 at 5:47
  • \$\begingroup\$ Reverse order the same (a little bit faster). \$\endgroup\$ Commented Nov 4, 2018 at 6:29
  • 2
    \$\begingroup\$ R base::aperm C implementation: github.com/wch/r-source/blob/… \$\endgroup\$ Commented Nov 4, 2018 at 6:31
  • 1
    \$\begingroup\$ Here are, in my edit, two new tries, with less calculation (but not certain of the performance gain) \$\endgroup\$ Commented Nov 4, 2018 at 8:26

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.