I need to iterate through a file of words and perform tests on them.
import multiprocessing
import datetime

class line_Worker(object):
    def worker(self, chunks):
        for l in chunks:
            for num in range(100):
                print l, num
        return

if __name__ == '__main__':
    # read the file into a list, stripping trailing newlines
    lines = [line.rstrip('\n') for line in open('words.txt')]
    chunkNum = len(lines) / 26
    # split the list into chunks of chunkNum lines each
    wordLists = [lines[x:x + chunkNum] for x in range(0, len(lines), chunkNum)]
    jobs = []
    timeStart = str(datetime.datetime.now())
    for i in range(len(wordLists)):
        p = multiprocessing.Process(target=line_Worker().worker, args=(wordLists[i],))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()  # wait for each process to finish
    timeStop = str(datetime.datetime.now())
    print timeStart
    print timeStop
I have a problem set of 35,498,500 individual lines, and I need it to run in roughly 3 minutes. The current run time is 16 minutes.
Is there any way to speed this up? Thanks all!
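For the chunking itself, `multiprocessing.Pool` can replace the hand-rolled chunk/`Process`/`join` bookkeeping. A minimal sketch in Python 3 syntax (`check_word` is a hypothetical stand-in, since the actual per-word tests aren't shown):

```python
import multiprocessing

def check_word(word):
    # stand-in for the real per-word test
    return len(word)

if __name__ == '__main__':
    # in the real script these would come from words.txt
    words = ['alpha', 'beta', 'gamma']
    with multiprocessing.Pool() as pool:
        # chunksize batches many words per task, cutting inter-process overhead
        results = pool.map(check_word, words, chunksize=1000)
    print(results)
```

`Pool.map` splits the input, distributes it across worker processes, and tears them down on exit from the `with` block, so no manual `jobs` list is needed.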
asked Mar 22, 2017 at 0:19
- "If you are trying to be performant, why are you printing?" – Stephen Rauch, Mar 22, 2017 at 1:42
- "Beside the obvious performance issue when you print, you're printing in a multi-threaded setting. That means there is no way to determine the order in which output appears on screen. Is that fine?" – ChatterOne, Mar 22, 2017 at 14:00
- "Have you measured which part of those 16 minutes is spent reading in the words.txt file?" – Graipher, Mar 23, 2017 at 9:45
- "It's negligible, sub 1 second. I'm going to try taking the print statements out and see what happens. I will post back." – Kyle Sponable, Mar 23, 2017 at 15:48
- "The purpose of the script is to demonstrate what a dictionary attack would look like, to illustrate an internet safety demo." – Kyle Sponable, Mar 23, 2017 at 15:50
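The print overhead the comments point to is easy to demonstrate with a rough microbenchmark (a sketch only; absolute timings vary by machine, and printing to a real terminal is far slower than to the in-memory buffer used here):

```python
import io
import time
import contextlib

def timed(fn):
    # time a zero-argument callable in seconds
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

buf = io.StringIO()
# 100,000 print calls into an in-memory buffer vs. the same loop with no I/O
with contextlib.redirect_stdout(buf):
    with_print = timed(lambda: [print(i) for i in range(100000)])
no_print = timed(lambda: [i for i in range(100000)])
print('print loop: %.3fs, silent loop: %.3fs' % (with_print, no_print))
```

Even against a `StringIO` sink the printing loop loses; against an actual console the gap grows by orders of magnitude, which matches the 16-minutes-to-4-seconds drop reported below.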
1 Answer
Getting rid of the print statement fixed it. Thanks, all!
import multiprocessing
import datetime

class line_Worker(object):
    def worker(self, chunks):
        for l in chunks:
            for num in range(100):
                print l, num  # clearing this made the time 4 seconds, thanks all!!
        return

if __name__ == '__main__':
    # read the file into a list, stripping trailing newlines
    lines = [line.rstrip('\n') for line in open('words.txt')]
    chunkNum = len(lines) / 26
    # split the list into chunks of chunkNum lines each
    wordLists = [lines[x:x + chunkNum] for x in range(0, len(lines), chunkNum)]
    jobs = []
    timeStart = str(datetime.datetime.now())
    for i in range(len(wordLists)):
        p = multiprocessing.Process(target=line_Worker().worker, args=(wordLists[i],))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()  # wait for each process to finish
    timeStop = str(datetime.datetime.now())
    print timeStart
    print timeStop
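One further polish: instead of stringifying two timestamps and printing both, subtracting two `datetime` objects (or using the monotonic `time.perf_counter`) yields the elapsed duration directly. A small sketch in Python 3 syntax, with a summation as a stand-in workload:

```python
import datetime
import time

start_dt = datetime.datetime.now()
start_pc = time.perf_counter()

total = sum(range(1000000))  # stand-in for the real workload

elapsed_dt = datetime.datetime.now() - start_dt  # a timedelta object
elapsed_pc = time.perf_counter() - start_pc      # float seconds, monotonic clock
print('timedelta: %s, perf_counter: %.3fs' % (elapsed_dt, elapsed_pc))
```

`perf_counter` is preferable for benchmarking because it is monotonic, so it can't jump backwards if the system clock is adjusted mid-run.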
answered Mar 28, 2017 at 22:45