
I am moving a code base to run using PyTorch tensors on the GPU. For loops are terrible on the GPU, especially for small-sized data, so I am trying to vectorize the function below, i.e. to eliminate the for loop.

I need help with Ds[] specifically.

Once I have the vectorized numpy version, converting to tensors should be straightforward.

import numpy as np

data = np.array([11, 2, 3, 40, 5, 6, 7, 8, 9, 10], dtype=np.float64)
N = len(data)
log_data = np.log(data)
alphas = np.zeros(N - 1)
Ds = np.zeros(N - 1)

for i, xmin in enumerate(data[:-1]):
    n = float(N - i)
    alpha = 1 + n / (np.sum(log_data[i:]) - n * log_data[i])
    alphas[i] = alpha
    if alpha > 1:
        print(i, 1 - (data[i:] / xmin) ** (-alpha + 1))
        Ds[i] = np.max(np.abs(
            1 - (data[i:] / xmin) ** (-alpha + 1)  # Theoretical CDF
            - np.arange(n) / n                     # Actual (empirical) CDF
        ))

sigmas = (alphas - 1) / np.sqrt(N - np.arange(N - 1))

I was trying to implement this step by step, but it seems my replacement for the second term, np.arange(n) / n, is wrong.
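For what it's worth, the second term can be built for all starting indices i at once with plain broadcasting; row i of the matrix then reproduces np.arange(N - i) / (N - i), padded with NaN. This is a minimal sketch of that idea (the names ns, ecdf, and valid are mine, not from the question):

```python
import numpy as np

N = 10
idx = np.arange(N)
ns = (N - idx).astype(np.float64)                # n = N - i for each row i
ecdf = idx[np.newaxis, :] / ns[:, np.newaxis]    # entry (i, j) = j / (N - i)
valid = idx[np.newaxis, :] < ns[:, np.newaxis]   # keep only the first N - i slots
ecdf = np.where(valid, ecdf, np.nan)             # pad the ragged tail with NaN
```

Row 0 is then np.arange(10) / 10, row 3 starts with np.arange(7) / 7 followed by three NaNs, and so on.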

import numpy as np

def fit_power_law(data):
    # Assuming 'data' is already a NumPy array
    logdata = np.log(data)
    N = len(data)
    # Compute cumulative sums from the end
    sums = np.cumsum(logdata[::-1])[::-1]
    # Create array of n values
    ns = np.arange(N, 0, -1, dtype=np.float64)
    # Compute alphas
    alphas = 1 + ns / (sums - ns * logdata)
    # Filter alphas to only include those > 1
    valid_indices = alphas > 1
    filtered_alphas = alphas[valid_indices]
    filtered_data = data[valid_indices]
    M = len(filtered_data)
    # Create a matrix of NaNs
    matrix = np.full((M, M), np.nan)
    # Create an index array for broadcasting
    idx = np.arange(M)
    shifted_indices = idx[:, None] + idx
    # Mask to avoid out-of-bound indices
    mask = shifted_indices < M
    # Fill the matrix with the appropriate values
    matrix[mask] = filtered_data[shifted_indices[mask]]
    # Divide each row by the data array
    tt = matrix / filtered_data[:, np.newaxis]
    # Calculate the first term [1 - (data[i:] / xmin) ** (-alpha + 1)]
    first_term = 1 - tt ** (-filtered_alphas[:, np.newaxis] + 1)

    # Alias for the filtered data (no reversal actually happens here)
    data_reversed = filtered_data
    # Create a range array and a mask
    range_array = np.arange(int(data.shape[0]))
    mask = range_array < data_reversed[:, np.newaxis]
    # Create the result array
    result = np.where(mask, range_array, np.nan)
    # Calculate the length of each inner array (excluding NaNs)
    lengths = np.sum(~np.isnan(result), axis=1)
    # Ensure lengths are at least 1 to avoid division by zero
    lengths = np.maximum(lengths, 1)
    # Broadcast lengths to match the shape of result for division
    lengths_broadcast = lengths[:, np.newaxis]
    # np.arange(n) / n  # This is wrong
    second_term = np.where(~np.isnan(result), result / lengths_broadcast, np.nan)
    # Calculate the difference
    min_answer = first_term - second_term
    # Return the maximum absolute difference
    ds = np.nanmax(np.abs(min_answer), axis=1)[:-1]
    alphas = alphas[:-1]
    sigma = (alphas - 1) / np.sqrt(N - np.arange(N - 1, dtype=np.float64))
    ds_final = np.zeros(N - 1, dtype=np.float64)
    ds_final[valid_indices] = ds
    return ds_final, alphas, sigma
asked Jul 23, 2024 at 17:11
  • Can you modify the code so that it is reproducible? For example, initialize the required variables to random values (with the right distribution, which certainly matters for knowing how often the conditional if alpha > 1 is taken), using the right dtype and shape. Commented Jul 23, 2024 at 17:21
  • Note that for such code I would expect tools like Cython or Numba to be the way to go, as vectorizing this seems non-trivial and possibly inefficient in the end (for example, if the condition is often not taken). Can you use them, and/or did you try them? Commented Jul 23, 2024 at 17:27
  • @JérômeRichard I want to convert it to PyTorch and run it on GPU. Once I have the vectorized numpy version, converting to tensors should be straightforward. Commented Jul 23, 2024 at 17:29
  • You can't vectorize the max over the variable-size arrays produced by np.abs(1 - (data[i:] / xmin) ** (-alpha + 1) - np.arange(n) / n). ufuncs like np.max have an accumulate method, but that takes an array argument, not a function, so you can't rewrite that as a single call to a numpy method that handles all of data. You may speed up the loop a bit by precalculating pieces of it. Commented Jul 23, 2024 at 18:30
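The variable-size max can still be expressed loop-free if the ragged rows are padded out to a rectangular (N-1, N) array with NaN and reduced with np.nanmax. The following is my own sketch of that approach (the names Ds_loop and Ds_vec are mine); the vectorized version is checked against the original loop on the sample data:

```python
import numpy as np

def Ds_loop(data):
    """Reference implementation: the original per-i loop from the question."""
    data = np.asarray(data, dtype=np.float64)
    N = len(data)
    log_data = np.log(data)
    alphas = np.zeros(N - 1)
    Ds = np.zeros(N - 1)
    for i, xmin in enumerate(data[:-1]):
        n = float(N - i)
        alpha = 1 + n / (np.sum(log_data[i:]) - n * log_data[i])
        alphas[i] = alpha
        if alpha > 1:
            Ds[i] = np.max(np.abs(
                1 - (data[i:] / xmin) ** (-alpha + 1)
                - np.arange(n) / n))
    return Ds, alphas

def Ds_vec(data):
    """Loop-free version: pad the ragged rows out to (N-1, N) with NaN."""
    data = np.asarray(data, dtype=np.float64)
    N = len(data)
    log_data = np.log(data)
    idx = np.arange(N)
    ns = (N - idx[:-1]).astype(np.float64)            # n = N - i per row
    tail_sums = np.cumsum(log_data[::-1])[::-1][:-1]  # sum of log_data[i:]
    alphas = 1 + ns / (tail_sums - ns * log_data[:-1])
    # (N-1, N): entry (i, j) = data[j] / data[i]; only j >= i is meaningful
    ratio = data[np.newaxis, :] / data[:-1, np.newaxis]
    valid = idx[np.newaxis, :] >= idx[:-1, np.newaxis]
    theo = 1 - ratio ** (1 - alphas[:, np.newaxis])   # theoretical CDF
    # empirical CDF: entry (i, j) = (j - i) / (N - i)
    emp = (idx[np.newaxis, :] - idx[:-1, np.newaxis]) / ns[:, np.newaxis]
    diff = np.where(valid, np.abs(theo - emp), np.nan)
    Ds = np.where(alphas > 1, np.nanmax(diff, axis=1), 0.0)
    return Ds, alphas

sample = [11, 2, 3, 40, 5, 6, 7, 8, 9, 10]
Ds_a, alphas_a = Ds_loop(sample)
Ds_b, alphas_b = Ds_vec(sample)
assert np.allclose(Ds_a, Ds_b) and np.allclose(alphas_a, alphas_b)
```

Every operation in Ds_vec (broadcasting, np.where, nanmax) has a direct PyTorch equivalent, so this layout should carry over to tensors with little change.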
