The task is to reshape a 3D array from [row, col, slice] to [slice, row, col]. I tried to implement a base::aperm analog with Rcpp:
// [[Rcpp::export]]
Rcpp::NumericVector array_perm(const Rcpp::NumericVector& x) {
  if (Rf_isNull(x.attr("dim"))) {
    throw std::runtime_error("'x' does not have a 'dim' attribute.");
  }
  Rcpp::Dimension d = x.attr("dim");
  if (d.size() != 3) {
    throw std::runtime_error("'x' must have 3 dimensions.");
  }
  std::size_t n = d[2];
  std::size_t n_vec = d[0] * d[1];
  std::size_t n_out = n_vec * n;
  Rcpp::NumericVector res = Rcpp::no_init(n_out);
  std::size_t ind_from = 0;
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t ind_to = i;
    for (std::size_t j = 0; j < n_vec; ++j) {
      res[ind_to] = x[ind_from];
      ind_to += n;
      ind_from += 1;
    }
  }
  res.attr("dim") = Rcpp::Dimension(d[2], d[0], d[1]);
  return res;
}
How can I improve it?
Code for testing:
# Observations
n <- 1000
# Dimension
d <- 100
# Array
a <- replicate(n, matrix(seq_len(d * d), nrow = d, ncol = d))
# Desired result
res <- aperm(a, c(3, 1, 2))
res
A small benchmark of the current variant:
b <- bench::mark(aperm(a, c(3, 1, 2)), array_perm(a))
b[, c("expression", "min", "median", "max", "itr/sec")]
#>   expression               min   median      max `itr/sec`
#>   <chr>               <bch:tm> <bch:tm> <bch:tm>     <dbl>
#> 1 aperm(a, c(3, 1, 2))  84.9ms   85.1ms   85.5ms     11.7
#> 2 array_perm(a)        124.8ms  125.2ms  127.2ms      7.96
An RcppArmadillo solution:
// [[Rcpp::export]]
arma::Cube<double> array_perm2(const arma::Cube<double>& x) {
  std::size_t rows = x.n_rows;
  std::size_t cols = x.n_cols;
  std::size_t slices = x.n_slices;
  arma::Cube<double> res(slices, rows, cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      for (std::size_t k = 0; k < slices; ++k) {
        res(k, i, j) = x(i, j, k);
      }
    }
  }
  return res;
}
Benchmark:
  expression               min   median      max `itr/sec`
  <chr>               <bch:tm> <bch:tm> <bch:tm>     <dbl>
1 aperm(a, c(3, 1, 2))  85.8ms   86.4ms   87.7ms     11.6
2 array_perm(a)        116.1ms  127.3ms  129.6ms      8.08
3 array_perm2(a)       217.4ms  219.7ms  222.1ms      4.55
- Which version of C++ do you use or have access to? (Calak, Nov 3, 2018)
- @Calak Any is possible. (Artem Klevtsov, Nov 3, 2018)
1 Answer
If you can change the signature to take the rows, cols, and slices counts directly, you won't have to check their validity:
Rcpp::NumericVector array_perm(const Rcpp::NumericVector& input,
                               std::size_t rows, std::size_t cols, std::size_t slices);
Otherwise, if you can change the error management, I think you'll get a speed boost: exception handling comes with a cost. Maybe return an empty vector? I don't know R, so I don't know what the possibilities are.
You can also try flattening your loops; here you have multiple options.
With the computation in the output index:
Rcpp::NumericVector array_perm(const Rcpp::NumericVector& input, std::size_t rows, std::size_t cols, std::size_t slices) {
  // Think about the error management here...
  auto rc = rows * cols;
  auto size = rc * slices;
  auto output = Rcpp::NumericVector(Rcpp::no_init(size));
  for (std::size_t i = 0; i < size; ++i) {
    output[i / rc + i % rc * slices] = input[i];
  }
  return output;
}
With the computation in the input lookup:
//...
for (std::size_t i = 0; i < size; ++i) output[i] = input[i / slices + i % slices * rc];
//...
Or in reverse order
//...
while (size--) output[size] = input[size / slices + size % slices * rc];
//...
Or a mix
//...
while (size--) output[size / rc + size % rc * slices] = input[size];
//...
Or even a range-based for loop:
//...
std::size_t i = 0;
for (auto e : input) {
  output[i / rc + i % rc * slices] = e;
  ++i;
}
//...
PS: Did you try another contiguous storage type (std::vector, plain old C array, ...)?
PPS: I don't have an R environment, so I only tested the transposition algorithms in C++.
Edit:
Two other ways:
//...
for (std::size_t i = 0, j = 0; i < size; ++i, j += rc) {
  if (size <= j) j -= size - 1;
  output[i] = input[j];
}
//...
// or
//...
for (std::size_t i = size, j = i - 1; i--; j -= rc) {
  if (size < j) j += size - 1;
  output[i] = input[j];
}
//...
- Thank you for the reply. res[i] = vec[i / slices + i % slices * rc]; is much faster than output[i / rc + i % rc * slices] = input[i];. (Artem Klevtsov, Nov 4, 2018)
- Surely because of the constness of input. And the "reverse order" version? Did you also try different values (overall size and rows, cols, or slices counts)? Different values can lead to different results. And the range-based version was just for completeness; I suspected it to be the slowest. (Calak, Nov 4, 2018)
- Reverse order is the same (a little bit faster). (Artem Klevtsov, Nov 4, 2018)
- R base::aperm C implementation: github.com/wch/r-source/blob/… (Artem Klevtsov, Nov 4, 2018)
- Here are, in my edit, two new tries with less calculation (but I'm not certain of the performance gain). (Calak, Nov 4, 2018)