I have a list of 9000 dictionaries and I am sending them to an API in batches of 100 (the API's limit). The API returns the same 100 dictionaries, each expanded with more key/value pairs. So the API receives something like this:
[
{Key1:Val1, Key2:Val2},
{Key1:Val3, Key2:Val4},
...
]
and returns:
[
{Key1:Val1, Key2:Val2, Key3:Val1, Key4:Val4},
{Key1:Val3, Key2:Val4, Key3:Val1, Key4:Val4},
...
]
Now, I have to create a list that contains all 9000 returned dictionaries, because the input arrives as one batch of 9000 and the output needs to be delivered all at once as well. I have accomplished this with the following code:
dict_list = [...]  # this is the list with 9000 dicts
batch_list = []
return_list = []
for i in dict_list:
    batch_list.append(i)
    if len(batch_list) == 100:
        api_batch = API_CALL_FUNCTION(batch_list)
        for j in api_batch:
            return_list.append(j)
        batch_list.clear()
    else:
        continue
if batch_list:
    api_batch = API_CALL_FUNCTION(batch_list)
    for k in api_batch:
        return_list.append(k)
This code does what I want it to, but I really don't like the nested for loop and I'm sure there's probably a more efficient way to do this. Any suggestions?
2 Answers
You should be able to add the returned API list directly to return_list. Use extend for this, because append would add each returned batch as a single nested list instead of adding its dictionaries:
dict_list = [...]  # this is the list with 9000 dicts
batch_list = []
return_list = []
for i in dict_list:
    batch_list.append(i)
    if len(batch_list) == 100:
        return_list.extend(API_CALL_FUNCTION(batch_list))
        batch_list.clear()
if batch_list:
    return_list.extend(API_CALL_FUNCTION(batch_list))
and your else clause is unnecessary.
You should also explore slicing dict_list instead of iterating through each item. dict_list[0:100] returns a list containing the first 100 elements, dict_list[100:200] returns the next chunk, and so on.
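The slicing idea could look roughly like the sketch below. The function fake_api_call is a hypothetical stand-in for the real API (it only adds one key so the example can run), and the sample data and BATCH_SIZE name are assumptions for illustration.

```python
def fake_api_call(batch):
    """Hypothetical stand-in for the real API: returns copies with one extra key."""
    return [{**record, "Key3": "extra"} for record in batch]


BATCH_SIZE = 100  # the API limit mentioned in the question

# Sample data standing in for the real list of dictionaries.
dict_list = [{"Key1": n, "Key2": n * 2} for n in range(250)]
return_list = []

# Step through the list 100 items at a time; the last slice may be shorter.
for start in range(0, len(dict_list), BATCH_SIZE):
    batch = dict_list[start:start + BATCH_SIZE]
    return_list.extend(fake_api_call(batch))
```

Slicing past the end of a list is safe in Python, so the final partial batch needs no special handling.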
Hope this helped! Good luck.
There is a pretty definitive post by Ned Batchelder on how to chunk a list over on SO: https://stackoverflow.com/a/312464/4029014
The Python3 version looks like this:
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
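That version relies on len() and slicing, so it needs a real list (or another sequence). If the records ever arrive as a generator or other one-pass iterable, a variant built on itertools.islice is one possible alternative. This is only a sketch, and iter_chunks is a hypothetical name, not something from the linked post:

```python
from itertools import islice


def iter_chunks(iterable, n):
    """Yield lists of up to n items from any iterable (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        # Pull the next n items; an empty list means the input is exhausted.
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk


sizes = [len(chunk) for chunk in iter_chunks(range(250), 100)]
print(sizes)  # [100, 100, 50]
```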
So you could process your list using this structure:
MAX_API_BATCH_SIZE = 100
for batch in chunks(dict_list, MAX_API_BATCH_SIZE):
    batch_done = API_CALL_FUNCTION(batch)
Note that there is already a method on lists for concatenating a second list: it's extend. So you can say:
return_list.extend(batch_done)
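To see why extend is the right call here rather than append, compare the two on a small made-up result (the batch_done value below is invented sample data, not real API output):

```python
# Pretend this came back from the API for one batch.
batch_done = [{"Key1": "Val1"}, {"Key1": "Val3"}]

appended = []
appended.append(batch_done)  # adds the whole list as ONE nested element

extended = []
extended.extend(batch_done)  # adds each dictionary individually

print(len(appended))  # 1
print(len(extended))  # 2
```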
Your code is obviously example code, which is a violation of how CodeReview works (so this question probably should have been asked on SO directly). Regardless, it should be in a function either way:
MAX_API_BATCH_SIZE = 100

def process_records_through_api(records, batch_size=None):
    """ Process records through the XYZ api. Return resulting records. """
    # Fall back to the API limit when no usable batch size is given.
    batch_size = (MAX_API_BATCH_SIZE if batch_size is None or batch_size < 1
                  else batch_size)
    result = []
    for batch in chunks(records, batch_size):
        result.extend(API_CALL_FUNCTION(batch))
    return result
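Putting the pieces together, here is a self-contained sketch you can run without the real API. Everything about fake_api is an assumption: it is a hypothetical stand-in that rejects oversized batches and adds two keys, and the extra api_function parameter exists only so the stub can be swapped in.

```python
MAX_API_BATCH_SIZE = 100
api_calls = []  # records the size of each batch sent, for demonstration only


def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]


def fake_api(batch):
    """Hypothetical stand-in for the real API call."""
    if len(batch) > MAX_API_BATCH_SIZE:
        raise ValueError("batch exceeds the API limit")
    api_calls.append(len(batch))
    return [{**record, "Key3": "Val1", "Key4": "Val4"} for record in batch]


def process_records_through_api(records, batch_size=None, api_function=fake_api):
    """Send records through api_function in batches and return all results."""
    batch_size = (MAX_API_BATCH_SIZE if batch_size is None or batch_size < 1
                  else batch_size)
    result = []
    for batch in chunks(records, batch_size):
        result.extend(api_function(batch))
    return result


records = [{"Key1": n, "Key2": n} for n in range(9000)]
result = process_records_through_api(records)
print(len(result), len(api_calls))  # 9000 90
```

The results come back in the same order as the input, since the batches are processed sequentially and each one is extended onto the end of result.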