3
\$\begingroup\$

Given:

  • An array c that contains the group assignment of every user (c[i]=4 indicates that user i belongs to group 4)

  • A matrix y of 0 and 1 that indicates whether user i interacted with user j (y[i][j] = 1 if i interacted with j, 0 otherwise)

What is the fastest way to count the number of interactions and lack of interactions between every pair of groups?

This is what I have:

Solution 1:

This is what I have:

import numpy as np
def test1(y, c):
 N = len(c) # number of users
 C = len(set(c)) # number of clusters 
 # Final counts matrices
 I = np.zeros((C,C)) 
 notI = np.zeros((C,C))
 # Loop every row i/ column j, look at the groups where the users belong
 # and update I accordingly
 for i in range(N):
 for j in range(N):
 if y[i][j] == 1:
 I[c[i]][c[j]] += 1
 else: 
 notI[c[i]][c[j]] += 1
 return I, notI 

timeit:

nusers = 100
nclusters = 4
y = np.random.random_integers(0, 1, (nusers,nusers))
c = np.random.choice(nclusters, nusers)
%timeit test1(y,c)

10 loops, best of 3: 42 ms per loop

Solution 2:

def test2(y, c):
 N = len(c)
 C = len(set(c))
 I = np.zeros((C,C))
 notI = np.zeros((C,C))
 for i, jj in enumerate(y):
 for j, value in enumerate(jj):
 if value:
 I[c[i]][c[j]] += 1
 else: 
 notI[c[i]][c[j]] += 1
 return I, notI

timeit: 10 loops, best of 3: 30.8 ms per loop

asked Oct 19, 2014 at 15:58
\$\endgroup\$
3
  • \$\begingroup\$ in Solution 2 you wrote len(z), that is, "z" instead of "c". Typo? z is not defined in the posted code. \$\endgroup\$ Commented Oct 19, 2014 at 17:09
  • \$\begingroup\$ also, will it make a difference if you change if y[i][j] == 1: to if y[i][j]: ? \$\endgroup\$ Commented Oct 19, 2014 at 17:11
  • \$\begingroup\$ @janos it is a typo, thanks! And no, it would make no difference (actually I tested after posting the question and y[i][j]: seems a bit faster) \$\endgroup\$ Commented Oct 19, 2014 at 18:02

1 Answer 1

3
\$\begingroup\$

Re-formulating the problem into a matrix multiplication one speeds up the algorithm very much.

Algebra:

Create a matrix G of n_groups x n_users such that every column is a group mask, that is, all users (columns) that belong to that group (row) are flagged are 1 and 0 otherwise. If Y is the matrix of interactions, then the interactions between every pair of groups can be obtained by the following equation:

$$I = GYG^T$$

Code:

The code is:

import numpy as np
from numpy import dot
def test3(y, c):
 N = len(c)
 C = len(set(c))
 noty = np.mod(y+1,2)
 # create C group masks of 1 (if user is belongs to group) and 0. 
 G = np.zeros((C,N))
 for group in range(C):
 G[group] = [1 if c_== group else 0 for c_ in c]
 # Groups (src) * interaction_matrix * Groups_transposed (dst)
 # Every row of Groups matrix is the mask of a group
 return dot(dot(G,y),G.T), dot(dot(G,noty),G.T)

1000 loops, best of 3: 810 μs per loop

answered Oct 19, 2014 at 19:07
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.