Code Review

edited tags; edited title
BenC

Streamlining series of try-except + if-statements for faster loop processing in Python

I'm processing strings with regexes across a set of files in a directory. For each line in a file, I run a series of try-except statements that attempt to match a pattern, and if the pattern matches the whole line, I transform the input. After all transformations have been applied to a line, I write it to a new file. I have many of these try-except blocks followed by if-statements (I only included two here as an illustration). My problem is that after a few files have been processed, the script slows down so much that it almost stalls completely. I don't know what in my code causes the slowdown, but I suspect it is the combination of try-except and if-statements. How can I streamline the transformations so that the data is processed at a reasonable speed?

Or is it that I need a more efficient iterator that does not tax memory to the same extent?

Any feedback would be much appreciated!

import re
import glob

fileCounter = 0
for infile in glob.iglob(r'\input-files\*.txt'):

    fileCounter += 1
    outfile = r'\output-files\output_%s.txt' % fileCounter

    with open(infile, "rb") as inlist, open(outfile, "wb") as outlist:
        for inline in inlist:
            inword = inline.strip('\r\n')

            # apply some text transformations
            # Transformation #1: insert [=] after a word-initial vowel
            try: result = re.match(r'^[AEIOUYaeiouy]([bcćdfghjklłmnńprsśtwzżź]|rz|sz|cz|dz|dż|dź|ch)[aąeęioóuy](.*\[=\].*)*', inword).group()
            except: result = None

            if result == inword:
                inword = re.sub(r'(?<=^[AEIOUYaeiouy])(?=([bcćdfghjklłmnńprsśtwzżź]|rz|sz|cz|dz|dż|dź|ch)[aąeęioóuy])', '[=]', inword)

            # Transformation #2 etc.: move [=] one consonant to the right
            try: result = re.match(r'(.*\[=\].*)*(\w?\w?)[AEIOUYaąeęioóuy]\[=\][ćsśz][ptkbdg][aąeęioóuyrfw](.*\[=\].*)*', inword).group()
            except: result = None

            if result == inword:
                inword = re.sub(r'(?<=[AEIOUYaąeęioóuy])\[=\](?=[ćsśz][ptkbdg][aąeęioóuyrfw])', '', inword)
                inword = re.sub(r'(?<=[AEIOUYaąeęioóuy][ćsśz])(?=[ptkbdg][aąeęioóuyrfw])', '[=]', inword)

            outline = inword + "\n"
            outlist.write(outline)

    print "Processed file number %s" % fileCounter
print "*** Processing completed ***"
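For comparison, here is a minimal sketch of how Transformation #1 might look without the try-except, since `re.match` returns `None` when nothing matches and that can be tested directly. It assumes Python 3 with text-mode strings and UTF-8 files, which the code above does not use. The names `T1_MATCH`, `T1_SUB`, `transformation_1`, `transform_lines` and `process_file` are hypothetical and introduced only for illustration. The generator also shows that lines can be handled one at a time without holding a whole file in memory. Whether any of this removes the slowdown is not established here.

```python
import re

# Patterns copied from Transformation #1 in the question, compiled once
# up front so each per-line call reuses the same pattern object.
T1_MATCH = re.compile(
    r'^[AEIOUYaeiouy]([bcćdfghjklłmnńprsśtwzżź]|rz|sz|cz|dz|dż|dź|ch)'
    r'[aąeęioóuy](.*\[=\].*)*'
)
T1_SUB = re.compile(
    r'(?<=^[AEIOUYaeiouy])'
    r'(?=([bcćdfghjklłmnńprsśtwzżź]|rz|sz|cz|dz|dż|dź|ch)[aąeęioóuy])'
)


def transformation_1(word):
    """Apply Transformation #1 only if the match spans the whole word."""
    m = T1_MATCH.match(word)
    # Same condition as `result == inword`, with a None check
    # instead of catching the AttributeError from None.group().
    if m is not None and m.group() == word:
        return T1_SUB.sub('[=]', word)
    return word


def transform_lines(lines):
    """Lazily yield transformed words from any iterable of lines."""
    for line in lines:
        yield transformation_1(line.strip('\r\n'))


def process_file(in_path, out_path):
    """Stream one file to another; the UTF-8 encoding is an assumption."""
    with open(in_path, encoding='utf-8') as src, \
         open(out_path, 'w', encoding='utf-8') as dst:
        for word in transform_lines(src):
            dst.write(word + '\n')
```

Further transformations could be chained inside `transform_lines` in the same way. Separately, patterns with a nested quantifier such as `(.*\[=\].*)*` can backtrack heavily on some inputs, so timing each pattern on a slow file may be worth doing before restructuring the loop.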