I wrote the following code in Rcpp, i.e. C++ code that is compiled in R for faster calculations. The Rcpp code computes, in a recursive way, the likelihood function of a hidden Markov model; the likelihood function in this case is `Like_Fun_acoustic_FAST`. First I provide the Rcpp code:
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(openmp)]]
#include <RcppArmadilloExtensions/sample.h>
#include <omp.h>
#include <random>
#include <iostream>
#include <cmath>
#include <vector>
#include <numeric>
#include <algorithm>
#include <stdio.h>
#include <math.h>
#include <Rmath.h>
using namespace Rcpp;

static double const log2pi = std::log(2.0 * M_PI); // used by dmvnrm_arma_mc()
// Row-wise log-sum-exp: for each row, max + log(sum(exp(row - max))).
// [[Rcpp::export]]
arma::vec log_sum_exp2_cpp_apply(arma::mat x){
  int n = x.n_rows;
  arma::vec res(n);
  double M;
  for(int j = 0; j < n; ++j){
    M = x.row(j).max();
    res[j] = M + log(sum(exp(x.row(j) - M)));
  }
  return res;
}
// Collects the last column of each slice of the cube into an n x k matrix.
// [[Rcpp::export]]
arma::mat SummarizeLastCol(arma::cube Q){
  int n = Q.n_rows;
  int k = Q.n_slices;
  arma::mat cd(n, k);
  for(int j = 0; j < k; ++j){
    cd.col(j) = Q.slice(j).tail_cols(1);
  }
  return cd;
}
// Multiplies the row vector x in place by the upper-triangular matrix trimat,
// exploiting the triangular structure to halve the work.
void inplace_tri_mat_mult(arma::rowvec &x, arma::mat const &trimat){
  arma::uword const n = trimat.n_cols;
  for(unsigned j = n; j-- > 0;){
    double tmp(0.);
    for(unsigned i = 0; i <= j; ++i)
      tmp += trimat.at(i, j) * x[i];
    x[j] = tmp;
  }
}
// Multivariate normal (log-)density for each row of x, parallelized with OpenMP.
// [[Rcpp::export]]
arma::vec dmvnrm_arma_mc(arma::mat const &x,
                         arma::rowvec const &mean,
                         arma::mat const &sigma,
                         bool const logd = false,
                         int const cores = 1) {
  using arma::uword;
  omp_set_num_threads(cores);
  uword const n = x.n_rows,
           xdim = x.n_cols;
  arma::vec out(n);
  arma::mat const rooti = arma::inv(trimatu(arma::chol(sigma)));
  double const rootisum = arma::sum(log(rooti.diag())),
             constants = -(double)xdim / 2.0 * log2pi,
           other_terms = rootisum + constants;
  arma::rowvec z;
  #pragma omp parallel for schedule(static) private(z)
  for (uword i = 0; i < n; i++) {
    z = x.row(i) - mean;
    inplace_tri_mat_mult(z, rooti);
    out(i) = other_terms - 0.5 * arma::dot(z, z);
  }
  if (logd)
    return out;
  return exp(out);
}
// Forward-algorithm evaluation of the HMM log-likelihood, in log space.
// [[Rcpp::export]]
double Like_Fun_acoustic_FAST(int n, int TT, int k, List Data, arma::mat mu,
                              arma::cube Sigma, arma::vec pi, arma::mat PI,
                              int num_feat){
  arma::mat cb(n, 2);
  arma::cube q(n, TT, k);
  arma::vec res(n);
  // time point 1: log emission density plus log initial probability
  for(int j = 0; j < k; ++j){
    q.slice(j).col(0) = dmvnrm_arma_mc(as<arma::mat>(Data[0]), mu.row(j),
                                       Sigma.slice(j), true, 4) + log(pi[j]);
  }
  // later time points
  for(int t = 1; t < TT; ++t){
    for(int j = 0; j < k; ++j){
      if(k > 1){
        // log-sum-exp over the k previous states, two columns at a time
        q.slice(j).col(t) = q.slice(0).col(t - 1) + log(PI(0, j));
        for(int d = 1; d < k; ++d){
          cb.col(0) = q.slice(j).col(t);
          cb.col(1) = q.slice(d).col(t - 1) + log(PI(d, j));
          q.slice(j).col(t) = log_sum_exp2_cpp_apply(cb);
        }
        q.slice(j).col(t) +=
          dmvnrm_arma_mc(as<arma::mat>(Data[t]), mu.row(j),
                         Sigma.slice(j), true, 4);
      } else { // k == 1: only one previous state to carry forward
        q.slice(j).col(t) = q.slice(0).col(t - 1) + log(PI(0, j)) +
          dmvnrm_arma_mc(as<arma::mat>(Data[t]), mu.submat(j, 0, j, num_feat - 1),
                         Sigma.slice(j), true, 4);
      }
    }
  }
  if(k > 1){
    cb = SummarizeLastCol(q);
    res = log_sum_exp2_cpp_apply(cb);
    return accu(res);
  }
  // k == 1: the last column of the single slice holds the per-observation log-likelihoods
  return accu(q.slice(0).submat(0, TT - 1, n - 1, TT - 1));
}
I tried to optimize all the functions shown by avoiding any cumbersome operations; however, the time needed for `Like_Fun_acoustic_FAST` to run is still too long. The first function, `log_sum_exp2_cpp_apply`, takes a matrix as input and calculates the log-sum-exp of each row. The function `SummarizeLastCol` takes a cube and creates a matrix whose columns are the last columns of the matrices that make up the cube. `dmvnrm_arma_mc` calculates the multivariate normal density, using parallelization.
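For reference, the identity `log_sum_exp2_cpp_apply` implements is the standard max-shift stabilization of log-sum-exp: with \$M = \max_i x_i\$,
\$\$\log \sum_i e^{x_i} = M + \log \sum_i e^{x_i - M},\$\$
which avoids overflow and underflow when the \$x_i\$ are log-densities of large magnitude.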
The likelihood function `Like_Fun_acoustic_FAST` recursively integrates out the latent states of a hidden Markov chain. It does so using the cube `q`, which stores `K` slices corresponding to `K` latent states; for each latent state there is a matrix with `n` rows and `TT` columns, corresponding to observations and time respectively. In particular, we have a time series of `TT` points, and at each point we calculate the multivariate normal density conditional on each possible latent state.
For the number of cores used in the parallelization I chose the number that gave me the fastest calculations. So I am looking for any possible way to make the algorithm faster, because to my eyes all the functions are already written as tightly as possible in terms of time complexity.
Here are inputs, in case someone wants to reproduce the example in R:
library(MASS)
library(extraDistr)
# compile the C++ code above first, e.g. with Rcpp::sourceCpp() on the file containing it

n_dim = 10
TT = 15
n_feat = 3
K = 3

m = matrix(NA, K, n_feat)
for(i in 1:K){
  m[i,] = rnorm(n_feat, 0, 1)
}

Sigma = array(NA, c(n_feat, n_feat, K))
for(i in 1:K){
  # rWishart() from base stats; the degrees of freedom must be >= n_feat
  Sigma[,,i] = rWishart(1, 3, diag(1, n_feat))[,,1]
}

data = list()
for(i in 1:TT){
  data[[i]] = mvrnorm(n_dim, m[1,], Sigma[,,1])
}

prob_pi = rdirichlet(1, rep(1, K))

prob_P = matrix(NA, K, K)
for(i in 1:K){
  prob_P[i,] = rdirichlet(1, rep(1, K))
}

Like_Fun_acoustic_FAST(n_dim, TT, K, data, m, Sigma,
                       prob_pi, prob_P, n_feat)
1 Answer
I tried to look for performance issues, but unfortunately I can't, because your code has a major problem:
Naming things
Your code is very hard to read because of the inconsistent way you are naming things. Also, while you are apparently not afraid to use long names for certain things, in most cases you are unnecessarily abbreviating things.
Let's begin with `Like_Fun_acoustic_FAST()`. I also like fun! But I wouldn't have known it was a "likelihood function" if it wasn't for your description in the question. It's not mentioned anywhere in the code. But "likelihood" is just a value that you want to calculate; there is no "likelihood function" in the mathematical problem. Sure, you wrote a C++ function to calculate it, but why put `Fun` in this function's name? You didn't do that for the other functions. Why are `Like` and `Fun` capitalized, but not `acoustic`? Why is `FAST` in all caps? That makes it either look like an acronym, or like you are shouting.
Why did you add `_FAST` to the function name? Sure you want this to go fast, but typically you want all your functions to run fast. It might make sense to add this to the function if there is also a `_SLOW`, or perhaps an `_ACCURATE` if you have a trade-off between speed and accuracy. Otherwise, just leave it off.
What does `acoustic` mean here? Is there something like an "acoustic likelihood"? That doesn't sound very likely to me. Maybe the problem involves sound waves, but what exactly is it you want to calculate the likelihood of? If you are trying to detect the presence of some feature in a sound signal, and want to know the likelihood of it actually being there, then I would name this function `calculate_likelihood_of_feature()`, where `feature` is replaced by the actual name of the feature.
Go over all the function names, make sure they use the same style of naming (I recommend using snake_case, as that is the most commonly used style for function and variable names in C++ code), and ensure each name clearly explains what the function does.
Do the same for variable names: make sure they are consistent, and convey clearly what information they hold.
Some abbreviations are fine, but only if they are widely used ones, and are unambiguous in the context where they are used. For example, `i`, `j` and `k` are often used as loop indices, so that might be fine. However, compare:
for(int j=0; j<n; ++j){
With:
for (int row = 0; row < num_rows; ++row) {
Sure, it's a bit more typing, but now I instantly know what is being looped over, whereas with `j` and `n` I have to look elsewhere to see what is actually stored in those variables.
Another thing you can do is imagine you are talking with someone about your code and have to pronounce your variable and function names. Will they understand you if you say "dmvnrm_arma_mc"? What about "pi" and "PI"? Won't they think you are talking about the constant \$\pi\$?
Performance
I don't even know what exactly this code is doing, so I can't tell whether your algorithm can be improved or not. There are a few things that I did notice that you might want to look at:
- Do you really need `double` precision? Often you get a speedup of more than 2 by using `float` instead.
- Avoid unnecessary allocations. `dmvnrm_arma_mc()` allocates a vector and a matrix each time it is called, and those will have exactly the same size on every call. It might be better to ensure these are allocated once and then reused; see the sketch after this list.
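To make the second point concrete: `Like_Fun_acoustic_FAST()` calls `dmvnrm_arma_mc()` with the same `Sigma.slice(j)` at every time point, so the Cholesky factorization, the inversion and the normalizing constant are redone `TT` times per state. Here is a minimal sketch of hoisting them out; the names `MvnormConstants`, `precompute_mvnorm` and `dmvnrm_precomputed` are mine, the OpenMP loop is omitted for brevity, and it reuses the question's `inplace_tri_mat_mult()`:

// Per-state constants of the multivariate normal log-density.
struct MvnormConstants {
  arma::mat rooti;     // inv(trimatu(chol(sigma)))
  double other_terms;  // log normalizing constant
};

MvnormConstants precompute_mvnorm(arma::mat const &sigma){
  MvnormConstants c;
  c.rooti = arma::inv(arma::trimatu(arma::chol(sigma)));
  c.other_terms = arma::sum(log(c.rooti.diag()))
                - 0.5 * (double)sigma.n_cols * log2pi;
  return c;
}

// Same result as dmvnrm_arma_mc(x, mean, sigma, true, ...), but without
// refactorizing sigma on every call.
arma::vec dmvnrm_precomputed(arma::mat const &x, arma::rowvec const &mean,
                             MvnormConstants const &c){
  arma::vec out(x.n_rows);
  for(arma::uword i = 0; i < x.n_rows; ++i){
    arma::rowvec z = x.row(i) - mean;
    inplace_tri_mat_mult(z, c.rooti);
    out(i) = c.other_terms - 0.5 * arma::dot(z, z);
  }
  return out;
}

In the likelihood you would then fill a `std::vector<MvnormConstants>` of length `k` once, before the loop over `t`, so each covariance matrix is factorized once instead of `TT` times.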
- Do you have any references for using `float` being more than twice as fast as using `double`? I've seen many instances where the opposite is true, and from what I can tell, it seems to be very hardware dependent. Here is the result of a lazy search: stackoverflow.com/a/4584707/4408538 – Joseph Wood, Mar 23, 2024 at 2:01
- There's throughput and latency. A single `float` operation might take just as long as a `double`, but since you can pack twice the amount of `float`s in an SSE register, usually you get double the throughput. There are also situations where you are more limited by memory bandwidth than by CPU power, in which case again `float`s could have twice the performance of a `double` by virtue of being half the size. Something that could slow `float`s down is if you have to convert between `float` and `double`, so make sure all your operations use and return the same type. – G. Sliepen, Mar 23, 2024 at 8:08
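To illustrate the single-precision suggestion concretely (a sketch of my own, not benchmarked on this workload): Armadillo's `float` types mirror the `double` ones, so the row-wise log-sum-exp could be written, for example, as

// arma::fmat / arma::fvec are the single-precision counterparts of
// arma::mat / arma::vec; the algorithm itself is unchanged.
arma::fvec log_sum_exp_rows_f(arma::fmat const &x){
  arma::fvec res(x.n_rows);
  for(arma::uword j = 0; j < x.n_rows; ++j){
    float const M = x.row(j).max();  // row maximum, stays in float
    res[j] = M + std::log(arma::accu(arma::exp(x.row(j) - M)));
  }
  return res;
}

As the comment above notes, the important part is that nothing in the pipeline silently promotes back to `double`; the data, `mu`, `Sigma` and the cube `q` would all need to switch to `fmat`/`fcube` as well.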
- "the time needed for the function `Like_Fun_acoustic_FAST` to run is still too much." How much is too much? How much should it be in your opinion? On what kind of device is your code running, and how often is it called with how much data each time? Did you enable all relevant compiler optimizations?
- … `K` matrices, where each matrix is `n` times `TT`, and those matrices are used 10 times on each iteration of my MCMC, i.e. they are used each time that `Like_Fun_acoustic_FAST` is used.
- … `-march` processor-specific optimizations would be relevant? For extra credit, mention a link to godbolt.org that shows how your source code is translated to assembly.