I'm writing a simple Python parser that loops over each line in a file and processes it further if the right conditions are met. My short start:
def identify(hh_line):
    if(re.match(regex.new_round, hh_line)):
        m = re.match(regex.new_round, hh_line)
        # insert into psql
        ...
    if(re.match...
...and I was wondering what the best practice is for approaching this task, since this is the first time I've written Python.
Thanks! =)
asked Aug 24, 2010 at 2:48
Lasse A Karlsen
1 Answer
First of all, it's redundant to run the match twice - instead, run it, store the result, and branch off of that:
m = re.match(regex.new_round, hh_line)
if m:
    # ...
Next, if you have a bunch of regex -> processing combinations, you might instead make a dict of regex -> function mappings, and then just iterate over it:
import re

def process_a(data):
    # ...
    pass

def process_b(data):
    # ...
    pass

# map each regex to the function that handles lines matching it
regex_to_process = {
    'regex_a': process_a,
    'regex_b': process_b,
}

# hh_file stands for an already-open file object
for hh_line in hh_file:
    for regex, process in regex_to_process.items():
        m = re.match(regex, hh_line)
        if m:
            process(hh_line)
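For something closer to runnable, here is a minimal sketch of the same dispatch idea. The patterns, group names, and handler names are all hypothetical stand-ins (the real regexes aren't shown in the question). It also makes two small changes from the snippet above: the patterns are precompiled, and each handler receives the match object instead of the raw line, so it can read its own named groups.

```python
import re

# Hypothetical patterns, for illustration only; substitute your own.
NEW_ROUND = re.compile(r"Round #(?P<round_id>\d+): (?P<title>.+)")
PLAYER_ACTION = re.compile(r"(?P<player>\w+) (?P<action>folds|calls|raises)")


def handle_new_round(m):
    # Only reads groups that NEW_ROUND defines.
    return ("round", int(m.group("round_id")), m.group("title"))


def handle_player_action(m):
    # Only reads groups that PLAYER_ACTION defines.
    return ("action", m.group("player"), m.group("action"))


# A list of (pattern, handler) pairs keeps the matching order explicit.
HANDLERS = [
    (NEW_ROUND, handle_new_round),
    (PLAYER_ACTION, handle_player_action),
]


def identify(hh_line):
    """Return the result of the first handler whose pattern matches, else None."""
    for pattern, handler in HANDLERS:
        m = pattern.match(hh_line)
        if m:
            return handler(m)
    return None


def parse_lines(lines):
    """Run identify() over any iterable of lines, skipping unmatched ones."""
    results = []
    for line in lines:
        result = identify(line)
        if result is not None:
            results.append(result)
    return results
```

Since a file object is itself an iterable of lines, parse_lines can be given an open file directly. Whether to stop at the first matching pattern (as here) or try every pattern for each line (as in the loop above) depends on whether a line can legitimately match more than one regex.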
answered Aug 24, 2010 at 2:52
Amber
5 Comments
Lasse A Karlsen
Yes, I reckoned it was. =) Thanks!
Lasse A Karlsen
Thanks, that looks great - but just one follow-up: why can't I access, for example, m.group('title') in that loop? I have defined labels in the regex, and I can see them all using groupdict().
Amber
You're using (?P<name>expression) syntax, correct? Not sure - could you show more code?
Lasse A Karlsen
That's correct. There is really nothing more to show, but the grouping is freaky. The first regex contains 6-7 groups, all with labels. The second regex contains 3 groups, and when I try to print any group higher than 3, it fails. Why?
Amber
Well, do keep in mind that the loop contents are running for every regex - so if you try to look at a group that exists in one regex but not in another, it'll fail on the iteration that is for the second regex.
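To illustrate that last point, here is a small sketch using two made-up patterns (not the ones from the question), where only the first defines a 'title' group. Calling m.group('title') on a match from the second pattern raises IndexError, whereas groupdict() only ever contains the names that the matching pattern defines, so looking the name up there is safe.

```python
import re

# Hypothetical patterns: the first defines 'title', the second does not.
patterns = [
    re.compile(r"(?P<round_id>\d+) (?P<title>\w+)"),
    re.compile(r"(?P<player>[a-z]+) (?P<action>\w+)"),
]


def title_or_none(line):
    """Return the 'title' group of the first pattern that matches, if it has one."""
    for pattern in patterns:
        m = pattern.match(line)
        if m:
            # .get() returns None when this pattern has no 'title' group,
            # where m.group('title') would raise IndexError instead.
            return m.groupdict().get("title")
    return None


def group_is_defined(pattern, name):
    """Check up front whether a compiled pattern defines a named group."""
    return name in pattern.groupindex
```

The same applies to numbered groups: asking for group 4 on a match from a regex with only three groups fails in the same way.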