
I need to calculate a Bayesian likelihood in its negative logarithmic form:

$$-\ln L = N\ln M -\sum\limits_{i=1}^{N} \ln\left\{P_i \sum\limits_{j=1}^M \frac{\exp\left[ -\frac{1}{2}\left(\sum_{k=1}^D \frac{(f_{ik}-g_{jk})^2}{\sigma_{ik}^2 + \rho_{jk}^2}\right) \right]}{\prod_{k=1}^D \sqrt{\sigma_{ik}^2 + \rho_{jk}^2}} \right\}$$

where $P_i,ドル $f_{ik},ドル $\sigma_{ik}^2,ドル $g_{jk},ドル and $\rho_{jk}^2$ are all known floats. This is my code:

import numpy as np

# Define random data with correct shapes.
N, M = np.random.randint(10, 1000), np.random.randint(10, 1000)
D = np.random.randint(2, 5)
f_N, g_M = np.random.rand(N, D, 2), np.random.rand(M, D, 2)
P_i = np.random.uniform(0., 1., N)

# N summation.
# Store here each 'i' element from the M summation.
sum_Mi = []
for f_i in f_N:
    # M summation.
    sum_M_j = 0.
    for g_j in g_M:
        # Obtain the ijk element inside the M summation.
        D_sum, sigma_ijk = 0., 1.
        for k, f_ik in enumerate(f_i):
            # Where: f_ik[0] = f_{ik} ; g_j[k][0] = g_{jk}
            # and:   f_ik[1] = sigma_{ik}^2 ; g_j[k][1] = rho_{jk}^2
            D_sum += np.square(f_ik[0] - g_j[k][0]) / (f_ik[1] + g_j[k][1])
            sigma_ijk = sigma_ijk * (f_ik[1] + g_j[k][1])
        sum_M_j += np.exp(-0.5 * D_sum) / np.sqrt(sigma_ijk)
    sum_Mi.append(sum_M_j)

sum_Mi_Pi = P_i * np.array(sum_Mi)
# Remove 0. elements before applying the logarithm below.
sum_N = sum_Mi_Pi[sum_Mi_Pi != 0.]
# Final negative log-likelihood.
lik = len(f_N) * np.log(len(g_M)) - sum(np.log(sum_N))
print(lik)
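
For the timing comparisons below, a minimal wall-clock harness along these lines can be used (a sketch; run_likelihood is a hypothetical wrapper around whichever variant is being tested):

import time

def best_time(fn, repeats=3):
    """Best wall-clock time of fn() over several runs, to reduce noise."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

# e.g.: best_time(lambda: run_likelihood(f_N, g_M, P_i))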

I'm after an optimization at the code level, i.e. without using multi-threading.

I've tried numpy broadcasting on the inner loop:

# Obtain ijk element inside the M summatory
D_sum = sum(np.square((f_i[:, 0] - g_j[:, 0])) / (f_i[:, 1] + g_j[:, 1]))
sigma_ijk = np.prod(f_i[:, 1] + g_j[:, 1])

but that actually takes longer to complete than the plain for loop!
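
To see why logsumexp applies here, note that every term of the inner sum is positive, so the logarithm splits (spelled out here for clarity) as

$$-\ln L = N\ln M - \sum_{i=1}^{N}\ln P_i - \sum_{i=1}^{N} \ln \sum_{j=1}^{M} \exp\left[-\frac{1}{2}\sum_{k=1}^{D} \frac{(f_{ik}-g_{jk})^2}{\sigma_{ik}^2 + \rho_{jk}^2} - \frac{1}{2}\sum_{k=1}^{D} \ln\left(\sigma_{ik}^2 + \rho_{jk}^2\right)\right]$$

and the inner sum over $j$ is exactly the $\ln\sum_j\exp(\cdot)$ pattern that logsumexp evaluates stably.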

I also tried using the logsumexp function from scipy combined with the broadcasted arrays:

from scipy.special import logsumexp

# N summation; accumulate the log of each 'i' term directly.
sum_N = 0.
for f_i in f_N:
    # M summation.
    A, B = [], []
    for g_j in g_M:
        # Obtain the ijk element inside the M summation.
        A.append(sum(
            np.square(f_i[:, 0] - g_j[:, 0]) / (f_i[:, 1] + g_j[:, 1])))
        B.append(np.prod(f_i[:, 1] + g_j[:, 1]))
    sum_N += logsumexp(-.5 * np.array(A), b=1. / np.sqrt(B))

# Final negative log-likelihood.
lik = len(f_N) * np.log(len(g_M)) - sum(np.log(P_i[P_i != 0.])) - sum_N
print(lik)

This was a marginal improvement over plain broadcasting, but still slower than the original loop-based code.

asked Mar 17, 2017 at 15:27
  • There also seems to be a way to calculate the $\sigma$ from the $f,ドル which you do not show. – Commented Mar 17, 2017 at 18:19
  • $f_{ik}$ is not equal to $f_{jk}$; the notation implies it, but this is not true. I'll fix that. The $\sigma$s are completely independent of the $f$s. (Why is my LaTeX not rendering?) – Commented Mar 17, 2017 at 18:34
  • You need to add a \ in front of your $. – Commented Mar 17, 2017 at 18:42
  • Thank you @Graipher, I've fixed the notation above. Please let me know if something doesn't make sense to you. (testing $\alpha$) – Commented Mar 17, 2017 at 18:43
  • Vectorizing the inner loop doesn't help because D is so small compared to N and M, e.g. 3 vs. 500 and 500. It's just a few loops of a costly expression (see the sketch below this list). – Commented Mar 17, 2017 at 21:27
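
To make that last comment concrete, here is a small illustrative sketch (timings are machine-dependent) contrasting vectorization over the tiny D axis with vectorization over the large M axis:

import numpy as np
from timeit import timeit

D, M = 3, 500
a, b = np.random.rand(M, D), np.random.rand(M, D)

# Vectorizing only the k-loop: M separate numpy calls on length-3 arrays,
# so per-call overhead dominates the arithmetic.
t_inner = timeit(lambda: [np.square(a[j] - b[j]).sum() for j in range(M)],
                 number=100)
# Vectorizing over j and k at once: one call on the whole (M, D) array.
t_outer = timeit(lambda: np.square(a - b).sum(axis=1), number=100)
print(t_inner, t_outer)  # the single broadcast call is typically far faster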

1 Answer


This is much faster. I added the seed to better compare results across runs.

import numpy as np
# Define random data with correct shapes.
np.random.seed(0)
N, M = np.random.randint(10, 1000), np.random.randint(10, 1000)
D = np.random.randint(2, 5)
f_N, g_M = np.random.rand(N, D, 2), np.random.rand(M, D, 2)
P_i = np.random.uniform(0., 1., N)
# Replace the loops with one broadcasted array operation, followed by sums.
fgsum = f_N[:, None, :, 1] + g_M[None, :, :, 1]
fgdif = f_N[:, None, :, 0] - g_M[None, :, :, 0]
print(fgsum.shape)  # e.g. (694, 569, 3)
Dsum = (fgdif**2 / fgsum).sum(axis=-1)
sigma_ijk = np.prod(fgsum, axis=-1)
sum_M_j = np.exp(-0.5 * Dsum) / np.sqrt(sigma_ijk)
sum_Mi = np.sum(sum_M_j, axis=-1)
# as before
sum_Mi_Pi = P_i * sum_Mi
# Remove 0. elements before applying the logarithm below.
sum_N = sum_Mi_Pi[sum_Mi_Pi != 0.]
# Final negative logarithmic likelihood
lik = len(f_N) * np.log(len(g_M)) - sum(np.log(sum_N))
print(lik)
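
A possible follow-up, not part of the original answer: if exp(-0.5 * Dsum) ever underflows, the same broadcasted arrays can be fed to scipy's logsumexp to stay in log space throughout. A sketch reusing Dsum, sigma_ijk, and P_i from above:

from scipy.special import logsumexp

# ln of the inner M-sum, computed without forming the raw exponentials.
log_terms = -0.5 * Dsum - 0.5 * np.log(sigma_ijk)  # shape (N, M)
log_sum_Mi = logsumexp(log_terms, axis=-1)
# Drop i-terms with P_i == 0, mirroring the zero-filtering above.
mask = P_i != 0.
lik = N * np.log(M) - np.sum(np.log(P_i[mask]) + log_sum_Mi[mask])
print(lik)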
answered Mar 17, 2017 at 21:45
  • Excellent! This code is 80-100 times faster on my machine, thank you very much hpaulj! – Commented Mar 17, 2017 at 21:54
  • You can get another few percent faster by replacing sum(np.log(sum_N)) with np.log(sum_N).sum() or np.sum(np.log(sum_N)). – Commented Mar 18, 2017 at 9:59
  • Okay... nice that this is faster, but WHY? – Commented Mar 18, 2017 at 16:41
  • Because it replaces Python loops with compiled numpy code. Sorry, I normally answer on SO, where this kind of numpy 'vectorization' is an everyday question. – Commented Mar 18, 2017 at 16:48
  • More about increasing performance with numpy: codereview.stackexchange.com/questions/17702/… – Commented Mar 18, 2017 at 16:50
