memory leak with re.match

Mayling ge maylinge0903 at gmail.com
Tue Jul 4 05:01:08 EDT 2017


Hi,

My function below processes a file line by line. There are multiple error patterns defined, and each one needs to be applied to every line. I use multiprocessing.Pool to handle the file in blocks.

Memory usage grows to 2 GB for a 1 GB file, and it stays at 2 GB even after the file has been processed (the file is closed at the end). If I comment out the call to re_pat.match, memory usage stays normal, under 100 MB.

Am I using re in the wrong way? I cannot figure out how to fix the memory leak, and Googling has not helped.
import itertools
import multiprocessing
import re

def line_match(lines, errors):
    for error in errors:
        try:
            re_pat = re.compile(error['pattern'])
        except Exception:
            print_error  # error reporting elided in the original post
            continue
        for line in lines:
            m = re_pat.match(line)
            # other code to handle the matched object
def process_large_file(fo):
    p = multiprocessing.Pool()
    while True:
        lines = list(itertools.islice(fo, line_per_proc))
        if not lines:
            break
        # arguments passed as (lines, errors) to match line_match's
        # parameter order
        result = p.apply_async(line_match, args=(lines, errors))
Note: I have omitted some code, as I think the significant difference is with/without the re_pat.match(...) call.
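For reference, here is a minimal, self-contained sketch of the same chunk-and-match pattern run sequentially (no Pool), which can help isolate whether the growth comes from the regex matching itself. The helper names `chunked` and the per-pattern counting inside `line_match` are illustrative stand-ins, not the original poster's code:

```python
import itertools
import re

def chunked(iterable, n):
    # Yield successive lists of up to n items, like the islice loop above.
    it = iter(iterable)
    while True:
        block = list(itertools.islice(it, n))
        if not block:
            return
        yield block

def line_match(lines, errors):
    # Stand-in for the omitted handling code: count how many lines
    # match each pattern. Invalid patterns are skipped.
    counts = {}
    for error in errors:
        try:
            re_pat = re.compile(error['pattern'])
        except re.error:
            continue
        counts[error['pattern']] = sum(
            1 for line in lines if re_pat.match(line))
    return counts

# Hypothetical sample data standing in for a log file.
errors = [{'pattern': r'ERROR'}, {'pattern': r'WARN'}]
lines = ['ERROR disk full\n', 'INFO ok\n', 'WARN low memory\n', 'ERROR io\n']
for block in chunked(lines, 2):
    print(line_match(block, errors))
```

Compiling each pattern once per block (rather than once per line) keeps the work per line to a single `re_pat.match` call, so any remaining memory growth would point at how results and blocks are retained rather than at `re` itself.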
 Regards,
 -Meiling


More information about the Python-list mailing list
