Streamlining series of try-except + if-statements for faster loop processing in Python

Question 1

I'm processing strings with regexes across a bunch of files in a directory. To each line in a file, I apply a series of try statements that attempt to match a pattern; if a pattern matches, I transform the input. After I have processed each line, I write it to a new file. I have a lot of these try-except blocks followed by if statements (I only included two here as an illustration). My issue is that after processing a few files, the script slows down so much that it almost stalls completely. I don't know what in my code is causing the slowdown, but I have a feeling it is the combination of try-except + if statements. How can I streamline the transformations so that the data is processed at a reasonable speed?

Or is it that I need a more efficient iterator that does not tax memory to the same extent?

Any feedback would be much appreciated!

import re
import glob
fileCounter = 0
for infile in glob.iglob(r'\input-files\*.txt'):
    fileCounter += 1
    outfile = r'\output-files\output_%s.txt' % fileCounter
    with open(infile, "rb") as inlist, open(outfile, "wb") as outlist:
        for inline in inlist:
            inword = inline.strip('\r\n')
            #apply some text transformations
            #Transformation #1
            try: result = re.match('^[AEIOUYaeiouy]([bcćdfghjklłmnńprsśtwzżź]|rz|sz|cz|dz|dż|dź|ch)[aąeęioóuy](.*\[=\].*)*', inword).group()
            except: result = None
            if result == inword:
                inword = re.sub('(?<=^[AEIOUYaeiouy])(?=([bcćdfghjklłmnńprsśtwzżź]|rz|sz|cz|dz|dż|dź|ch)[aąeęioóuy])', '[=]', inword)
            #Transformation #2 etc.
            try: result = re.match('(.*\[=\].*)*(\w?\w?)[AEIOUYaąeęioóuy]\[=\][ćsśz][ptkbdg][aąeęioóuyrfw](.*\[=\].*)*', inword).group()
            except: result = None
            if result == inword:
                inword = re.sub('(?<=[AEIOUYaąeęioóuy])\[=\](?=[ćsśz][ptkbdg][aąeęioóuyrfw])', '', inword)
                inword = re.sub('(?<=[AEIOUYaąeęioóuy][ćsśz])(?=[ptkbdg][aąeęioóuyrfw])', '[=]', inword)
            outline = inword + "\n"
            outlist.write(outline)
    print "Processed file number %s" % fileCounter
print "*** Processing completed ***"

Answer

It's not 100% clear which exceptions you expect, but I presume you are trying to handle the case where the regex does not match. I suggest handling it this way:
```
 #Transformation #1
 match = re.match(pattern, inword)
 result = match.group() if match else None
```
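As a rough sketch of how that check could replace every try/except pair in the loop, the rules can be precompiled once and applied from a table. This assumes Python 3.4+ and swaps in `re.fullmatch` for the question's `result == inword` comparison; `RULES` and `transform` are hypothetical names, and the single rule is a shortened, ASCII-only illustration rather than one of the question's full patterns.

```python
import re

# Hypothetical rule table: each entry pairs a pattern that must match the
# whole word with the substitutions to apply when it does. The rule below is
# a shortened illustration, not one of the question's real rules.
RULES = [
    (re.compile(r'[AEIOUYaeiouy][bcdfghjklmnprstwz][aeiouy].*'),
     [(re.compile(r'(?<=^[AEIOUYaeiouy])(?=[bcdfghjklmnprstwz][aeiouy])'), '[=]')]),
]

def transform(word):
    """Apply every rule whose pattern matches the entire word."""
    for pattern, substitutions in RULES:
        match = pattern.fullmatch(word)  # None when the word does not match
        if match is None:
            continue                     # no exception handling needed
        for sub_pattern, replacement in substitutions:
            word = sub_pattern.sub(replacement, word)
    return word
```

Compiling the patterns outside the loop also avoids looking them up in the `re` module's cache on every line, although that is unlikely to explain a script that nearly stalls.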
"after processing a few files, the script slows down": have you considered the possibility that a particular file, or even a particular line, is slow to process? One possible explanation is that regexes can suffer from catastrophic backtracking.
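To check whether backtracking is the culprit, it may help to time a suspect pattern against lines with a growing number of `[=]` markers. The question's patterns repeat a group that itself contains `.*`, as in `(.*\[=\].*)*`, which is a shape that tends to backtrack heavily when the rest of the pattern fails. The sketch below assumes Python 3; `NESTED`, `FLAT`, and `time_match` are hypothetical names, and the literal `X` is only a placeholder for the part of the pattern that fails to match.

```python
import re
import time

# Simplified stand-ins for the question's patterns. "X" is a placeholder for
# the middle part of the pattern, which never matches the test lines.
NESTED = re.compile(r'(.*\[=\].*)*X')  # repeated group containing .*
FLAT = re.compile(r'(.*\[=\].*)?X')    # same group, optional instead of repeated

def time_match(compiled, text):
    """Return (matched, seconds) for a single fullmatch attempt."""
    start = time.perf_counter()
    matched = compiled.fullmatch(text) is not None
    return matched, time.perf_counter() - start

if __name__ == "__main__":
    for markers in (2, 4, 6, 8):
        line = "a[=]" * markers + "!"  # cannot match: there is no X
        _, nested_seconds = time_match(NESTED, line)
        _, flat_seconds = time_match(FLAT, line)
        print("%d markers: nested %.5fs, flat %.5fs"
              % (markers, nested_seconds, flat_seconds))
```

For lines without embedded newlines, both patterns should accept the same strings, because a repetition of "anything containing `[=]`" is again "anything containing `[=]`". If the nested version's time grows much faster with the number of markers, replacing `*` with `?` on those groups is one option worth testing against your real data.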

Janne Karila · Accepted Answer · 2017-09-11 07:38:48Z

