1
\$\begingroup\$

I need to make a function like these two, but at least an order of magnitude faster in python. Any tips or tricks?

These functions inputs are nested lists, like: request_parameters = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, [110, 111, 112, 113, 114]] Their output should be: [[1, 2, 3, 4, 5, 110], [1, 2, 3, 4, 5, 111], [1, 2, 3, 4, 5, 112], [1, 2, 3, 4, 5, 113], [1, 2, 3, 4, 5, 114]]

OR a numpy.ndarray:

[[ 1 2 3 4 5 110]
 [ 1 2 3 4 5 111]
 [ 1 2 3 4 5 112]
 [ 1 2 3 4 5 113]
 [ 1 2 3 4 5 114]]

A unit test is on the way...

input:

request_parameters = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, [110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180]]

Output:

[[1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 110]
...
...
...
[1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 180]]

import numpy as np
request_parameters = [1, 2, 3, 4, 5, [10, 11, 12, 13, 14]]
def transform_nested_list(request_parameters: list) -> np.ndarray: 
 """
 Generate np.ndarray from a nested list
 """
 abc_ids = np.asarray(request_parameters[-1])
 #print(abc_ids)
 all_except_id = np.asarray(request_parameters[0:-1])
 #print(all_except_id)
 all_subrequest = np.hstack([all_except_id, abc_ids[0]])
 for element in abc_ids[1:]:
 row = np.hstack([all_except_id, element])
 all_subrequest = np.vstack([all_subrequest, row])
 #print(all_subrequest)
 #print(type(all_subrequest))
 return (all_subrequest)
import copy
def old_transform_nested_list(request_parameters: list) -> list:
 abc_ids = np.asarray(request_parameters[-1])
 all_except_id = (request_parameters[0:-1])
 request_list = []
 subrequest = [all_except_id]
 for single_id in abc_ids:
 #print(single_id)
 subrequest = copy.deepcopy(all_except_id)
 subrequest.append(single_id.astype(int))
 request_list.append(subrequest)
 #print(request_list)
 #print(type(request_list))
 return request_list
import datetime
loop = 10000
start = datetime.datetime.now()
for i in range(loop):
 transform_nested_list(request_parameters)
end = datetime.datetime.now()
print((end-start).total_seconds())
start = datetime.datetime.now()
for i in range(loop):
 old_transform_nested_list(request_parameters)
end = datetime.datetime.now()
print((end-start).total_seconds())

> 1.111902, 0.886634 [Finished in 2.2s]

Got a bit better results, but still not good enough:

import itertools
def flatten(container):
 for i in container:
 if isinstance(i, (list,tuple)):
 for j in flatten(i):
 yield j
 else:
 yield i
def old_tuple_transform_nested_list(request_parameters: list) -> list:
 abc_ids = np.asarray(request_parameters[-1])
 all_except_id = (request_parameters[0:-1])
 request_list = []
 subrequest = tuple(all_except_id)
 for single_id in abc_ids:
 toupled_subrequest = tuple([subrequest, [single_id]])
 chain = list(itertools.chain(*toupled_subrequest))
 subrequest_list = list(flatten(chain))
 request_list.append(subrequest_list)
 #print(request_list)
 #print(type(request_list))
 return request_list

A better version for longer lists:

def longer_vector_transformrequest(requestparameters: list) -> list:
 ids = np.asarray(requestparameters[-1]) 
 allexceptid = np.asarray(requestparameters[0:-1])
 subrequest = np.append(allexceptid, 0)
 allsubrequest = np.empty((len(ids),len(requestparameters)))
 for i in range(len(ids)):
 subrequest[-1] = ids[i]
 allsubrequest[i] = subrequest 
 return allsubrequest
asked Jul 9, 2019 at 9:40
\$\endgroup\$
4
  • \$\begingroup\$ Could you explain what these methods do? Could you provide some unit tests? \$\endgroup\$ Commented Jul 9, 2019 at 10:47
  • 1
    \$\begingroup\$ Sure, They take a nested list - as above noted - request_parameters And return an other nested list: [[1, 2, 3, 4, 5, 10], [1, 2, 3, 4, 5, 11], [1, 2, 3, 4, 5, 12], [1, 2, 3, 4, 5, 13], [1, 2, 3, 4, 5, 14]] or an numpy.ndarray [[ 1 2 3 4 5 10] [ 1 2 3 4 5 11] [ 1 2 3 4 5 12] [ 1 2 3 4 5 13] [ 1 2 3 4 5 14]] This is why i left all the print functions in my function in comments so with the input you can easily test. However, I am going to cover this with a unite test in a minute. \$\endgroup\$ Commented Jul 9, 2019 at 10:49
  • \$\begingroup\$ To make it easier to understand that is why i usually use typehints \$\endgroup\$ Commented Jul 9, 2019 at 10:55
  • \$\begingroup\$ Yeh, going to update in a sec \$\endgroup\$ Commented Jul 9, 2019 at 12:10

1 Answer 1

1
\$\begingroup\$

First and foremost: Never use datetime to measure performance! There is timeit and profile/cProfile for that. With that out of the way, let's talk about NumPy.

Python in general does not like loops if you want to go fast. NumPy can help you here since a lot of the heavy lifting can be done by the C backend where loops are orders of magnitudes faster. But you have to allow it to play its strengths. Broadcasting and slicing to the rescue!

def slice_transformrequest(request_parameters: list):
 ids = request_parameters[-1]
 allexceptid = request_parameters[:-1]
 # create an empty array of the appropriate size
 transformed = np.empty((len(ids), len(allexceptid)+1))
 # fill all but the ID column
 transformed[:, :-1] = allexceptid
 # now fill the ID column
 transformed[:, -1] = ids
 return transformed

Now to the timing part:

import timeit
n_loops = 10000
request_parameters = [
 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15,
 [
 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122,
 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135,
 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148,
 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161,
 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174,
 175, 176, 177, 178, 179, 180
 ]
]
start = timeit.default_timer()
for i in range(n_loops):
 transform_nested_list(request_parameters)
end = timeit.default_timer()
print(f"transform_nested_list: {end-start:.3f}s")
start = timeit.default_timer()
for i in range(n_loops):
 old_transform_nested_list(request_parameters)
end = timeit.default_timer()
print(f"old_transform_nested_list: {end-start:.3f}s")
start = timeit.default_timer()
for i in range(n_loops):
 longer_vector_transformrequest(request_parameters)
end = timeit.default_timer()
print(f"longer_vector_transformrequest: {end-start:.3f}s")
start = timeit.default_timer()
for i in range(n_loops):
 slice_transformrequest(request_parameters)
end = timeit.default_timer()
print(f"slice_transformrequest: {end-start:.3f}s")

And the results:

transform_nested_list: 11.231s
old_transform_nested_list: 8.375s
longer_vector_transformrequest: 0.623s
slice_transformrequest: 0.140s

About 4x faster than your fastest solution and about 80x faster than transform_nested_list.

answered Jul 9, 2019 at 12:24
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.