1

I am looping over a list of strings, looking for keyword matches, and collecting the match indices in a third list. I can collect the indices as a flat list of lists, but I want to further group the sub-lists by the item they matched.

import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']
indices = []
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        for m in re.finditer(pat, i):
            a = list((m.start(),m.end()))
            indices.append(a)
print(indices)

This returns:

[[0, 2], [0, 2], [1, 3]] 

Trying to get:

[[0, 2], [[0, 2], [1, 3]]]

so that it is clear that:

[[0, 2], [1, 3]]

are the match indices for 'cde' in the example above.

asked Mar 27, 2013 at 9:22
  • list((m.start(),m.end())) is normally spelled [m.start(), m.end()]. Commented Mar 27, 2013 at 12:20

2 Answers

2

Make indices a dict:

import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']
indices = {}
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        # Make sure every string has an entry, even with no matches.
        indices.setdefault(i, [])
        for m in re.finditer(pat, i):
            a = list((m.start(),m.end()))
            indices[i].append(a)
print(indices)

Giving:

{'cde': [[0, 2], [1, 3]], 'ab': [[0, 2]]}

Is this what you're looking for?
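For what it's worth, the same grouping can probably also be written with collections.defaultdict, which removes the need for the setdefault call. This is only a sketch: the helper name group_spans is made up for illustration, the keywords are still treated as regular expressions, and unlike the code above it leaves out strings that have no match at all.

```python
import re
from collections import defaultdict


def group_spans(strings, keywords):
    # group_spans is a hypothetical helper name, not part of the original code.
    pats = [re.compile(k) for k in keywords]
    spans = defaultdict(list)  # missing keys start out as an empty list
    for pat in pats:
        for s in strings:
            for m in pat.finditer(s):
                spans[s].append([m.start(), m.end()])
    # Convert back to a plain dict so printing looks like the output above.
    return dict(spans)


indices = group_spans(['ab', 'cde'], ['ab', 'cd', 'de'])
print(indices)
```

If strings without any match should still show up with an empty list, the setdefault version above is the simpler choice.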

I played with this code for a while, and since you import itertools you might as well use it to get rid of those nested for loops, like this:

import re
from itertools import product
my_list = ['ab', 'cde']
keywords = ['ab', 'cd', 'de']
indices = {}
pats = [re.compile(i) for i in keywords]
# product() yields every (string, pattern) pair, replacing the two outer loops.
for i, pat in product(my_list, pats):
    indices.setdefault(i, [])
    for m in re.finditer(pat, i):
        indices[i].append((m.start(), m.end()))
print(indices)

Unfortunately, I couldn't get Bakuriu's idea of using a list comprehension to work properly, so for now this seems like the best solution to me.
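One way a comprehension might be made to work here is to build the dict with a dict comprehension keyed on the string, and flatten the patterns inside a nested list comprehension. This is a sketch rather than a tested drop-in replacement; spans_by_string is a name invented for the example, and it assumes the strings in the input are unique (duplicates would collapse into a single key).

```python
import re


def spans_by_string(strings, keywords):
    # spans_by_string is a hypothetical name used only for this sketch.
    pats = [re.compile(k) for k in keywords]
    # One entry per string; the inner comprehension walks every pattern
    # and collects each match span found in that string.
    return {
        s: [(m.start(), m.end()) for pat in pats for m in pat.finditer(s)]
        for s in strings
    }


indices = spans_by_string(['ab', 'cde'], ['ab', 'cd', 'de'])
print(indices)
```

Like the setdefault version, this keeps an empty list for any string that matches nothing.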

answered Mar 27, 2013 at 9:27
  • A great idea to resolve this with dicts: much better than my ugly nests. Thank you! Commented Mar 27, 2013 at 11:05
  • Since you are new to SO: welcome. A tip: when an answer solves your problem, you can tick it as accepted to show that the question is answered and to award the person who answered some reputation points ;) Commented Mar 27, 2013 at 11:10
  • Piotr thanks again. I need a score of 15 to vote up, but will as soon as I get there. Commented Mar 27, 2013 at 13:45
  • I know, but you don't need more rep to accept an answer (that's a big tick under votes for answer). Cheers! ;) Commented Mar 27, 2013 at 13:59
0

Create a list for each pattern/string pair, accumulate the matches in it, and append it to the result once the inner loop finishes:

import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']
indices = []
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        sublist = []
        for m in re.finditer(pat, i):
            a = list((m.start(),m.end()))
            sublist.append(a)
        indices.append(sublist)
print(indices)

Or you could use a list comprehension:

import re, itertools
my_list = ['ab','cde']
keywords = ['ab','cd','de']
indices = []
pats = [re.compile(i) for i in keywords]
for pat in pats:
    for i in my_list:
        sublist = [(m.start(), m.end()) for m in re.finditer(pat, i)]
        indices.append(sublist)
print(indices)
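
If the goal is a plain nested list with one sub-list per string, as in the question, one possible variation is to make the string the outer loop and flatten the patterns inside the comprehension. This is a sketch under that assumption, and spans_per_string is an illustrative name. Note that it gives [[(0, 2)], [(0, 2), (1, 3)]]: the single 'ab' match is still wrapped in its own sub-list, so it does not reproduce the question's [[0, 2], [[0, 2], [1, 3]]] exactly.

```python
import re


def spans_per_string(strings, keywords):
    # spans_per_string is a hypothetical name used only for this sketch.
    pats = [re.compile(k) for k in keywords]
    # Outer comprehension: one sub-list per string, in input order.
    # Inner comprehension: every match span of every pattern in that string.
    return [
        [(m.start(), m.end()) for pat in pats for m in pat.finditer(s)]
        for s in strings
    ]


indices = spans_per_string(['ab', 'cde'], ['ab', 'cd', 'de'])
print(indices)
```

Keeping every string's result as a list, even when it holds one match, means each position in the result lines up with the same position in my_list.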
answered Mar 27, 2013 at 9:33
  • So much more readable, especially the list comp solution. Thank you. Commented Mar 27, 2013 at 13:51
