For this code challenge, I want to write a performant function that creates a rectangular N-dimensional array. Each dimension's size is drawn from the range (min_dim_size, max_dim_size), so rectangular_properties is a list holding the size of every dimension of the matrix. The matrix contains random integer values. I also want the matrix to be immutable at every level of nesting.
Now I have this code:
```python
from random import randint

def random_hyperrectangular_matrix(dimensions, min_dim_size, max_dim_size):
    # One random size per dimension.
    rectangular_properties = [randint(min_dim_size, max_dim_size) for _ in range(dimensions)]
    def recc2(dim):
        if dim == 0:
            return randint(0, 100)  # base case: a single random cell value
        # Recursive case: a tuple of (dim - 1)-dimensional sub-matrices.
        return tuple([recc2(dim - 1) for _ in range(rectangular_properties[dim - 1])])
    random_n_dimensional_matrix = recc2(dimensions)
    return random_n_dimensional_matrix
```
When I run my code, generating this matrix takes a huge amount of time. Is there a performance problem in it? Does building tuple([...]) from a list take much time?
1 Answer
For something like this, numpy is probably unbeatable in terms of performance, since its underlying functions are implemented in C.
In your case the implementation would be rather easy:
```python
import numpy as np

def random_hyperrectangular_matrix(dimensions, min_dim_size, max_dim_size):
    # Draw one random size per dimension, then fill an array of that shape.
    shape = np.random.randint(min_dim_size, max_dim_size + 1, dimensions)
    return np.random.randint(0, 101, shape)
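```
A quick sanity check (illustrative; the shape varies from run to run because each dimension's size is random):
```python
m = random_hyperrectangular_matrix(3, 2, 4)
print(m.shape)  # e.g. (2, 4, 3) -- three dimensions, each of size 2..4
print(m.dtype)  # a numpy integer dtype, platform dependent
```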
Note the +1 on the upper limits, since np.random.randint excludes the upper bound, whereas random.randint includes it.
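You can verify the two conventions directly (a small illustrative check):
```python
import random
import numpy as np

print(max(random.randint(0, 2) for _ in range(1000)))  # can be 2: upper bound included
print(np.random.randint(0, 2, 1000).max())             # never 2: upper bound excluded
```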
Note also that dimensions is in this case limited to at most 32 (because of limits in numpy's internal implementation). Your algorithm only fails once the recursion depth reaches Python's limit (1000 by default). But if you need more than 32 dimensions, you are probably doing something wrong anyway...
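For illustration, exceeding the cap fails immediately with a clear error (the exact limit depends on your numpy version; it was historically 32 and is higher in recent releases):
```python
import numpy as np

# Requesting far more dimensions than numpy supports raises ValueError
# up front, rather than recursing like the pure-Python version.
try:
    np.random.randint(0, 101, (1,) * 100)
except ValueError as e:
    print(e)
```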
This might also make the rest of your program (which you don't show) easier, because operations can act on every element of the array at once, or only on certain dimensions.
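For example (a sketch, assuming the numpy version of the function above):
```python
m = random_hyperrectangular_matrix(3, 2, 4)
doubled = m * 2                  # applies to every element, no explicit loops
first_axis_sums = m.sum(axis=0)  # reduces only the first dimension
print(doubled.shape, first_axis_sums.shape)
```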
With min_dim_size, max_dim_size = 2, 2 (the lowest practical sizes, and equal to each other to eliminate one source of randomness in the timings), this consistently improves the timing (note the logarithmic scale on the y-axis):

[timing plot: runtime vs. number of dimensions for both implementations, log-scale y-axis]
Especially noteworthy is that at low dimension counts, the numpy approach runs in essentially constant time. At higher dimension counts it scales the same way as your approach, but stays consistently faster.
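If you want to reproduce the comparison, something along these lines works (a sketch; random_hyperrectangular_matrix_orig is a hypothetical name standing in for your recursive version, and the exact numbers depend on the machine):
```python
import timeit

for dims in range(1, 16):
    # Time 100 calls of each implementation at this dimension count.
    t_orig = timeit.timeit(lambda: random_hyperrectangular_matrix_orig(dims, 2, 2), number=100)
    t_np = timeit.timeit(lambda: random_hyperrectangular_matrix(dims, 2, 2), number=100)
    print(f"{dims:2d} dims: original {t_orig:.4f} s, numpy {t_np:.4f} s")
```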