
I am moving a code base to run using PyTorch tensors on the GPU. For loops are terrible on the GPU, especially for small-sized data, so I am trying to vectorize the function below, i.e. to eliminate the for loop.

I need help with Ds[] specifically.

Once I have the vectorized numpy version, converting to tensors should be straightforward.

import numpy as np

data = np.array([11, 2, 3, 40, 5, 6, 7, 8, 9, 10], dtype=np.float64)
N = len(data)
log_data = np.log(data)
alphas = np.zeros(N - 1)
Ds = np.zeros(N - 1)

for i, xmin in enumerate(data[:-1]):
    n = float(N - i)
    alpha = 1 + n / (np.sum(log_data[i:]) - n * log_data[i])
    alphas[i] = alpha
    if alpha > 1:
        print(i, 1 - (data[i:] / xmin) ** (-alpha + 1))
        Ds[i] = np.max(np.abs(
            1 - (data[i:] / xmin) ** (-alpha + 1)  # Theoretical CDF
            - np.arange(n) / n                     # Actual (empirical) CDF
        ))

sigmas = (alphas - 1) / np.sqrt(N - np.arange(N - 1))

I was trying to implement this step by step, but it seems my replacement for the second term, np.arange(n) / n, is wrong.
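For what it's worth, the second term can be built for all starting indices i at once with plain broadcasting; row i of the matrix then reproduces np.arange(N - i) / (N - i), padded with NaN. This is a minimal sketch of that idea (the names ns, ecdf, and valid are mine, not from the question):

```python
import numpy as np

N = 10
idx = np.arange(N)
ns = (N - idx).astype(np.float64)                # n = N - i for each row i
ecdf = idx[np.newaxis, :] / ns[:, np.newaxis]    # entry (i, j) = j / (N - i)
valid = idx[np.newaxis, :] < ns[:, np.newaxis]   # keep only the first N - i slots
ecdf = np.where(valid, ecdf, np.nan)             # pad the ragged tail with NaN
```

Row 0 is then np.arange(10) / 10, row 3 starts with np.arange(7) / 7 followed by three NaNs, and so on.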

import numpy as np

def fit_power_law(data):
    # Assuming 'data' is already a NumPy array
    logdata = np.log(data)
    N = len(data)
    # Compute cumulative sums from the end
    sums = np.cumsum(logdata[::-1])[::-1]
    # Create array of n values
    ns = np.arange(N, 0, -1, dtype=np.float64)
    # Compute alphas
    alphas = 1 + ns / (sums - ns * logdata)
    # Filter alphas to only include those > 1
    valid_indices = alphas > 1
    filtered_alphas = alphas[valid_indices]
    filtered_data = data[valid_indices]
    M = len(filtered_data)
    # Create a matrix of NaNs
    matrix = np.full((M, M), np.nan)
    # Create an index array for broadcasting
    idx = np.arange(M)
    shifted_indices = idx[:, None] + idx
    # Mask to avoid out-of-bound indices
    mask = shifted_indices < M
    # Fill the matrix with the appropriate values
    matrix[mask] = filtered_data[shifted_indices[mask]]
    # Divide each row by the data array
    tt = matrix / filtered_data[:, np.newaxis]
    # Calculate the first term [1 - (data[i:] / xmin) ** (-alpha + 1)]
    first_term = 1 - tt ** (-filtered_alphas[:, np.newaxis] + 1)

    # Alias for the filtered data (no reversal actually happens here)
    data_reversed = filtered_data
    # Create a range array and a mask
    range_array = np.arange(int(data.shape[0]))
    mask = range_array < data_reversed[:, np.newaxis]
    # Create the result array
    result = np.where(mask, range_array, np.nan)
    # Calculate the length of each inner array (excluding NaNs)
    lengths = np.sum(~np.isnan(result), axis=1)
    # Ensure lengths are at least 1 to avoid division by zero
    lengths = np.maximum(lengths, 1)
    # Broadcast lengths to match the shape of result for division
    lengths_broadcast = lengths[:, np.newaxis]
    # np.arange(n) / n  # This is wrong
    second_term = np.where(~np.isnan(result), result / lengths_broadcast, np.nan)
    # Calculate the difference
    min_answer = first_term - second_term
    # Return the maximum absolute difference
    ds = np.nanmax(np.abs(min_answer), axis=1)[:-1]
    alphas = alphas[:-1]
    sigma = (alphas - 1) / np.sqrt(N - np.arange(N - 1, dtype=np.float64))
    ds_final = np.zeros(N - 1, dtype=np.float64)
    ds_final[valid_indices] = ds
    return ds_final, alphas, sigma
asked Jul 23, 2024 at 17:11
  • Can you modify the code so that it is reproducible? For example, initialize the required variables to random values (with the right distribution, which certainly matters for knowing how often the conditional if alpha > 1 is taken), using the right dtype and shape. Commented Jul 23, 2024 at 17:21
  • Note that for such code I would expect tools like Cython or Numba to be the way to go, as vectorizing this seems non-trivial and possibly inefficient in the end (for example, if the condition is often not taken). Can you use them, and/or did you try them? Commented Jul 23, 2024 at 17:27
  • @JérômeRichard I want to convert it to PyTorch and run it on GPU. Once I have the vectorized numpy version, converting to tensors should be straightforward. Commented Jul 23, 2024 at 17:29
  • You can't vectorize the max over the variable-size arrays produced by np.abs(1 - (data[i:] / xmin) ** (-alpha + 1) - np.arange(n) / n). ufuncs like np.max have an accumulate method, but that takes an array argument, not a function, so you can't rewrite that as a single call to a numpy method that handles all of data. You may speed up the loop a bit by precalculating pieces of it. Commented Jul 23, 2024 at 18:30
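The variable-size max can still be expressed loop-free if the ragged rows are padded out to a rectangular (N-1, N) array with NaN and reduced with np.nanmax. The following is my own sketch of that approach (the names Ds_loop and Ds_vec are mine); the vectorized version is checked against the original loop on the sample data:

```python
import numpy as np

def Ds_loop(data):
    """Reference implementation: the original per-i loop from the question."""
    data = np.asarray(data, dtype=np.float64)
    N = len(data)
    log_data = np.log(data)
    alphas = np.zeros(N - 1)
    Ds = np.zeros(N - 1)
    for i, xmin in enumerate(data[:-1]):
        n = float(N - i)
        alpha = 1 + n / (np.sum(log_data[i:]) - n * log_data[i])
        alphas[i] = alpha
        if alpha > 1:
            Ds[i] = np.max(np.abs(
                1 - (data[i:] / xmin) ** (-alpha + 1)
                - np.arange(n) / n))
    return Ds, alphas

def Ds_vec(data):
    """Loop-free version: pad the ragged rows out to (N-1, N) with NaN."""
    data = np.asarray(data, dtype=np.float64)
    N = len(data)
    log_data = np.log(data)
    idx = np.arange(N)
    ns = (N - idx[:-1]).astype(np.float64)            # n = N - i per row
    tail_sums = np.cumsum(log_data[::-1])[::-1][:-1]  # sum of log_data[i:]
    alphas = 1 + ns / (tail_sums - ns * log_data[:-1])
    # (N-1, N): entry (i, j) = data[j] / data[i]; only j >= i is meaningful
    ratio = data[np.newaxis, :] / data[:-1, np.newaxis]
    valid = idx[np.newaxis, :] >= idx[:-1, np.newaxis]
    theo = 1 - ratio ** (1 - alphas[:, np.newaxis])   # theoretical CDF
    # empirical CDF: entry (i, j) = (j - i) / (N - i)
    emp = (idx[np.newaxis, :] - idx[:-1, np.newaxis]) / ns[:, np.newaxis]
    diff = np.where(valid, np.abs(theo - emp), np.nan)
    Ds = np.where(alphas > 1, np.nanmax(diff, axis=1), 0.0)
    return Ds, alphas

sample = [11, 2, 3, 40, 5, 6, 7, 8, 9, 10]
Ds_a, alphas_a = Ds_loop(sample)
Ds_b, alphas_b = Ds_vec(sample)
assert np.allclose(Ds_a, Ds_b) and np.allclose(alphas_a, alphas_b)
```

Every operation in Ds_vec (broadcasting, np.where, nanmax) has a direct PyTorch equivalent, so this layout should carry over to tensors with little change.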
