I generally find functions that modify a mutable object in place and return nothing confusing. It's usually a better idea to return the list object from the function and re-assign it to the variable at the call site:
def replace_matched_items(word_list, dictionary):
    for lst in word_list:
        for ind, item in enumerate(lst):
            # Fall back to the item itself when it has no replacement.
            lst[ind] = dictionary.get(item, item)
    return word_list

list_ = replace_matched_items(list_, dictionary)
But the above approach is still flawed, at least when the object has multiple references: even though we re-assigned only a single variable, every other reference to the same list object is affected as well, which can be very confusing to users. For example:
def func(seq):
    for lst in seq:
        for i, x in enumerate(lst):
            if x == 1:
                lst[i] = 100
    return seq

foo = [[1, 1, 2], [1, 2, 3]]
dct = {'key': foo}
foo = func(foo)
print(foo)
print(dct)
Outputs:
[[100, 100, 2], [100, 2, 3]]
{'key': [[100, 100, 2], [100, 2, 3]]}
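One way to see why both names change is to check object identity. The following is a minimal sketch that redefines func from the example above so it runs on its own; the names result, same_outer and same_inner are only illustrative.

```python
def func(seq):
    # Mutates each inner list in place, then returns the very same outer list.
    for lst in seq:
        for i, x in enumerate(lst):
            if x == 1:
                lst[i] = 100
    return seq

foo = [[1, 1, 2], [1, 2, 3]]
dct = {'key': foo}
result = func(foo)

# The returned object is the one stored in the dict, not a copy.
same_outer = result is dct['key']
# The inner lists are shared as well.
same_inner = result[0] is dct['key'][0]
print(same_outer, same_inner)  # True True
```

Re-assigning foo only rebinds a name; it does not create a second list, so every holder of the old reference sees the mutation.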
To fix such cases we can either pass a deep copy of the list to the function (a shallow copy is enough for a flat list containing only immutable items), or create a new list inside the function. But as we are dealing with 1 million items, creating another list will result in extra memory consumption. Here's an example of how to do this by creating a new list using dict.get:
def replace_matched_items(word_list, dictionary):
    new_list = [[dictionary.get(item, item) for item in lst] for lst in word_list]
    return new_list

list_ = replace_matched_items(list_, dictionary)
You're looping in the wrong order. Take advantage of the fact that a dictionary provides O(1) average-case lookups: loop over the list and replace each item with its value from the dictionary. This way you loop over the list only once.
We can either use dict.get here and avoid an if condition:
def replace_matched_items(word_list, dictionary):
    for lst in word_list:
        for ind, item in enumerate(lst):
            lst[ind] = dictionary.get(item, item)
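The second argument to dict.get is what makes the if unnecessary: it is returned whenever the key is missing, so passing the item itself leaves unmatched items unchanged. A tiny sketch with a made-up mapping:

```python
# Hypothetical mapping, for illustration only.
dictionary = {'cat': 'feline'}

hit = dictionary.get('cat', 'cat')   # key present: the mapped value is returned
miss = dictionary.get('dog', 'dog')  # key absent: the default (the item itself)
print(hit, miss)  # feline dog
```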
Or check for the key first and modify only the matched items (LBYL):
def replace_matched_items(word_list, dictionary):
    for lst in word_list:
        for ind, item in enumerate(lst):
            if item in dictionary:
                lst[ind] = dictionary[item]
Or, if you prefer EAFP, try the lookup and catch the KeyError:
def replace_matched_items(word_list, dictionary):
    for lst in word_list:
        for ind, item in enumerate(lst):
            try:
                lst[ind] = dictionary[item]
            except KeyError:
                pass