We have two lists:
l=["a","b","c"]
s=[["a","b","c"],
["a","d","c"],
["a-B1","b","c"],
["a","e","c"],
["a_2","c"],
["a","d-2"],
["a-3","b","c-1-1","d"]]
print l
print s
Now I am trying to check whether every item in each second-level list of s has a fuzzy match to any of the items in list l:
matches=list()
matchlist2=list()
for i in range(len(s)):
    matches.append([])
    for j in range(len(s[i])):
        for x in l:
            if s[i][j].find(x)>=0:
                print s[i][j]
                matches[i].append(True)
                break
        else:
            matches[i].append(False)
    matchlist2.append(all(x for x in matches[i]))
print matches
print matchlist2
This gives me what was intended, but I am not happy with how many loops it has. I am also working with pandas, so if there is a pandas solution, that would be great too. In pandas, the two lists are just two columns of two dataframes.
[[True, True, True], [True, False, True], [True, True, True], [True, False, True], [True, True], [True, False], [True, True, True, False]]
[True, False, True, False, True, False, False]
1 Answer
Extract this into functions
You are doing one well-defined thing (finding fuzzy matches), so this should be at least one function, but I suggest more than one because you are returning two results.
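As a rough sketch of what that split could look like, the two results can each get their own function. The names item_matches and rows_fully_matched are hypothetical (they do not appear in the question), and the substring test with `in` is assumed to be equivalent to the `find(x)>=0` check in the original loops:

```python
def item_matches(needles, rows):
    """For each row, return one flag per item: True if the item contains any needle."""
    return [[any(needle in item for needle in needles) for item in row]
            for row in rows]


def rows_fully_matched(flags):
    """Return one flag per row: True only if every item in that row matched."""
    return [all(row) for row in flags]


# Two rows taken from the question's data.
flags = item_matches(["a", "b", "c"], [["a-B1", "b", "c"], ["a", "d-2"]])
summary = rows_fully_matched(flags)
# flags   == [[True, True, True], [True, False]]
# summary == [True, False]
```

Keeping the per-item flags and the per-row summary separate means each function returns exactly one thing, which also makes each of them easier to test on its own.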
Write automatic tests
Refactoring is very hard without tests: re-running the code and checking the output manually is very time-consuming, whereas an automatic test runs in milliseconds.
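One possible way to get such a test is to wrap the existing loops in a function and pin the output posted in the question before changing anything. The sketch below assumes a hypothetical wrapper named find_matches that mirrors the original loops (minus the print statements) and returns both results as a tuple:

```python
def find_matches(needles, rows):
    """Loop-based matcher mirroring the code in the question."""
    matches = []
    all_matched = []
    for row in rows:
        flags = []
        for item in row:
            for needle in needles:
                if item.find(needle) >= 0:
                    flags.append(True)
                    break
            else:
                # The inner loop finished without a break: no needle was found.
                flags.append(False)
        matches.append(flags)
        all_matched.append(all(flags))
    return matches, all_matched


def test_find_matches():
    """Regression test using the input and output shown in the question."""
    needles = ["a", "b", "c"]
    rows = [["a", "b", "c"],
            ["a", "d", "c"],
            ["a-B1", "b", "c"],
            ["a", "e", "c"],
            ["a_2", "c"],
            ["a", "d-2"],
            ["a-3", "b", "c-1-1", "d"]]
    expected_matches = [[True, True, True], [True, False, True],
                        [True, True, True], [True, False, True],
                        [True, True], [True, False],
                        [True, True, True, False]]
    expected_all = [True, False, True, False, True, False, False]
    assert find_matches(needles, rows) == (expected_matches, expected_all)


test_find_matches()
```

With this in place, any rewrite of find_matches can be checked by re-running the test instead of comparing printed lists by eye.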
Use list comprehensions and zip
You are basically checking the single-dimensional list (xs) against each row of the two-dimensional list (matrix) and seeing which elements are contained and which are not. This can be expressed very easily in code.
def fuzzy_matches(xs, matrix):
    """
    >>> list(fuzzy_matches(["a","b","c"], [["a", "b43", "x"], ["w", "b", "5cfg"]]))
    [[True, True, False], [False, True, True]]
    """
    return ([i in j for i, j in zip(xs, line)] for line in matrix)
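The example in the docstring above is a doctest, so it can double as the automatic test. A minimal sketch of how it might be run, repeating the function so the snippet is self-contained (the file name used in the comment is only a placeholder):

```python
import doctest


def fuzzy_matches(xs, matrix):
    """
    >>> list(fuzzy_matches(["a","b","c"], [["a", "b43", "x"], ["w", "b", "5cfg"]]))
    [[True, True, False], [False, True, True]]
    """
    # zip pairs xs[k] with line[k], so each item is compared by position.
    return ([i in j for i, j in zip(xs, line)] for line in matrix)


if __name__ == "__main__":
    # Runs every docstring example in this module; prints nothing on success.
    # Equivalent from a shell: python -m doctest fuzzy.py  (placeholder file name)
    doctest.testmod()
```

Note that because zip pairs elements by position, this version checks xs[k] only against the k-th item of each row and stops at the shorter of the two lists. The loops in the question instead check every item against all elements of l, so the two approaches can give different results on the same data.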