Python parser script layout

Question 1

I'm writing a simple Python parser that loops over each line in a file and processes it further if the right conditions are met. My short start:

    def identify(hh_line):
        if(re.match(regex.new_round, hh_line)):
            m = re.match(regex.new_round, hh_line)
            # insert into psql
            ...
        if(re.match...

...and I was wondering what the best practice is for approaching this task, since this is the first time I've written Python.

Thanks! =)

Amber · Accepted Answer · 2010-08-24 02:52:31Z

3

First of all, it's redundant to run the match twice - instead, run it, store the result, and branch off of that:

m = re.match(regex.new_round, hh_line)
if m:
    # ...

Next, if you have a bunch of regex -> processing combinations, you might instead make a dict of regex -> function mappings, and then just iterate over it:

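import re

def process_a(data):
    pass  # ...

def process_b(data):
    pass  # ...

regex_to_process = {
    'regex_a': process_a,
    'regex_b': process_b,
}

# file_object stands for any open file (or other iterable of lines)
for hh_line in file_object:
    # .items() on Python 3; the original used .iteritems() (Python 2)
    for regex, process in regex_to_process.items():
        m = re.match(regex, hh_line)
        if m:
            process(hh_line)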
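A slightly extended sketch of the same dispatch idea, under a few assumptions: the patterns and handler names below (`NEW_ROUND`, `PLAYER`, `handle_new_round`, `handle_player`) are hypothetical stand-ins, not the asker's real regexes. It precompiles the patterns and passes the match object to the handler instead of the raw line, so each handler can read the groups of its own pattern.

```python
import re

# Hypothetical patterns, for illustration only; the asker's real ones
# live in their own `regex` module and are not shown in the question.
NEW_ROUND = re.compile(r"Round #(?P<round_id>\d+)")
PLAYER = re.compile(r"Seat (?P<seat>\d+): (?P<name>\w+)")

def handle_new_round(m):
    # Only reads groups that NEW_ROUND defines.
    return ("round", int(m.group("round_id")))

def handle_player(m):
    # Only reads groups that PLAYER defines.
    return ("player", int(m.group("seat")), m.group("name"))

# A list of (pattern, handler) pairs makes the checking order explicit.
HANDLERS = [
    (NEW_ROUND, handle_new_round),
    (PLAYER, handle_player),
]

def identify(hh_line):
    """Return the result of the first handler whose pattern matches, else None."""
    for pattern, handler in HANDLERS:
        m = pattern.match(hh_line)
        if m:
            return handler(m)
    return None

lines = ["Round #12", "Seat 3: alice", "something else"]
results = [identify(line) for line in lines]
# results == [('round', 12), ('player', 3, 'alice'), None]
```

Returning after the first match assumes at most one pattern should handle a given line; drop the early `return` if several handlers may apply.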


5 Comments

Lasse A Karlsen · 2010-08-24T02:55:22.213Z

Yes, I reckoned it was. =) Thanks!

Lasse A Karlsen · 2010-08-24T03:31:16.17Z

Thanks, that looks great. Just one follow-up: why can't I access, e.g., m.group('title') in that loop? I have defined labels in the regex, and I can see them all using groupdict().

Amber · 2010-08-24T03:42:59.08Z

You're using (?P<name>expression) syntax, correct? Not sure - could you show more code?
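For reference, a minimal sketch of the `(?P<name>expression)` syntax mentioned here. The pattern and input line are made up for illustration, since the asker's actual regex is not shown in the thread.

```python
import re

# Hypothetical pattern with two named groups.
pattern = re.compile(r"(?P<title>\w+): (?P<value>\d+)")

m = pattern.match("pot: 150")
if m:
    title = m.group("title")    # look a group up by name
    same_title = m.group(1)     # or by its position
    groups = m.groupdict()      # all named groups as a dict
```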

Lasse A Karlsen · 2010-08-24T03:50:26.003Z

That's correct. There is really nothing more to show, but the grouping is freaky. The first regex contains something like 6-7 groups, all with labels. The second regex contains 3 groups, and when I try to print any group higher than 3, it fails. Why?

Amber · 2010-08-24T04:13:17.403Z

Well, do keep in mind that the loop contents are running for every regex - so if you try to look at a group that exists in one regex but not in another, it'll fail on the iteration that is for the second regex.
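The failure described in this comment can be reproduced with two made-up patterns (both hypothetical, chosen only so that one defines `title` and the other does not). Asking a match object for a group its pattern never defined raises `IndexError`, while `groupdict()` simply omits it.

```python
import re

# Hypothetical patterns: only `first` defines a group named "title".
first = re.compile(r"(?P<title>\w+) (?P<year>\d{4})")
second = re.compile(r"(?P<seat>\d+)-(?P<name>\w+)")

m = second.match("3-alice")

try:
    m.group("title")        # "title" does not exist in `second`
    lookup_failed = False
except IndexError:
    lookup_failed = True

# groupdict() holds only the groups this pattern defines, so .get() is safe.
title = m.groupdict().get("title")

# groupindex lists the group names a compiled pattern actually has.
defined = sorted(second.groupindex)
```

One way around this, in a shared loop, is to read groups through `groupdict().get(...)`; another is to give each regex its own handler so that no handler asks for a group belonging to a different pattern.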
