I have this code that creates unique passwords using the first letter of each word from the file. Before each password is created (written to a file) it is compared to all passwords that are currently in the file, so if the file has 50,000 passwords before a another is written it has to scan through all 50k. A user can input any amount of passwords needed but the bigger the number the longer it takes. 100k will take a long time, so how can I optimize this to make the program run faster? (the password generation is not included code)
for mainLoop in range(passnum):
user = 0
newpass = generatePassword() # New password generated each iteration of loop
storePass = open("MnemPass.txt","a")
print ("PASSWORD GENERATED ")
#Checks if file is empty if True write first password
fileEmpty = os.stat("MnemPass.txt").st_size == 0
if fileEmpty == True:
storePass.write(newpass)
storePass.write("\n")
storePass.close()
print ("FIRST PASSWORD WRITTEN")
#IF file is not empty Read line by line and compare each with new password generated returns boolean value
elif fileEmpty == False:
storePass = open("MnemPass.txt","r")
with open("MnemPass.txt") as f:
fileData = f.read().splitlines()
linelength = len(fileData).__int__()
filemax = linelength
num = linelength #Number used to cycle through array set to size of list
#print(linelength+10)
for iterate in range(linelength):
num = num - 1 #Number decreases each pass
#print num
if fileData[num] != newpass: # The last element in the array is checked first decrementing each pass
go = True
if fileData[num]==newpass: #changed
print ("match found: PASSWORD : "+fileData[num])
passMatch = open("Matchpassword.txt","a")
passMatch.write(newpass)
passMatch.write("\n")
passMatch.close()
go = False
break
#places new password once it does not match and all elements prev passwords are compared
if go == True and num == 0:
storePass = open("MnemPass.txt","a")
storePass.write(newpass)
storePass.write("\n")
storePass.close()
print ("NEW WRITTEN")
if linelength == filemax:
num = num -1
2 Answers 2
user
is redundant in your code, it's instantiated but never used at all so it should be removed. Another more wasteful redundancy is printing all the time, that actually slows down the code a surprising amount. You should strip them out. At most print once per loop and for exceptional circumstances (like errors).
Instead of checking if the file is empty it would be better test the lines in it and then afterwards write the password to the file. This way you write the password in the same way whether or not the file is empty, as it will just immediately find that the password isn't in the file anyway.
For the main body, you're reading in all the lines, then looping backwards over them by index to test them backwards. Is there a reason you're doing it backwards? I can't tell why you would do that. There's a lot of problems here, I would actually just start from scratch.
So, you need to test whether a password exists in the file, right? I'd make this a function first of all. So the function needs to take the password as an argument.
def exists(password):
Now we need to open the file and read each line in the file. Rather than read all the lines up front, it's much easier to read lines one by one. Python makes this incredibly easy, letting you iterate directly through the file, returning one line at a time.
with open("MnemPass.txt") as f:
for line in f:
Now, we can just test if the password is there and return True
if it does, as this indicates that the password already exists.
if password == line.strip():
return True
I added .strip()
to remove whitespace that might mess with the test. And then of course, if the whole file is read without this problem then you eventually reach the end of the function where you return False
, indicating that it does not exist.
def exists(password):
with open("MnemPass.txt") as f:
for line in f:
if password == line.strip():
return True
return False
Now. when we go back to the original code this makes things immensely tidier:
for mainLoop in range(passnum):
newpass = generatePassword() # New password generated each iteration of loop
if exists(newpass): #changed
with open("Matchpassword.txt", "a") as passMatch:
passMatch.write(newpass + "\n")
else:
with open("MnemPass.txt", "a") as storePass:
storePass.write(newpass + "\n")
Now, this can be done even faster. If you were to go back to the function, you could use the any
operator to lake the for
loop and turn it into a generator expression. Generator expressions are basically like for loops condensed into a single expression. And if you pass one into any
it can run a test on every element of a loop. It's faster than normal Python loops. So here's how it'd look:
def exists(password):
with open("MnemPass.txt") as f:
return any(password == line.strip() for line in f)
This means that if any
ever finds a line that equals password
then it will return True
, but if it finds no such line it will return False
.
After this, optimisation really ought to be done with storage. You could sort the passwords alphabetically and then intelligently detect where they should be. Maybe store in multiple files, corresponding to the starting character of the password. There's only so much that the exact code optimisation itself can do.
-
\$\begingroup\$ Ok thanks for the break down/advice i will go through the code again. I got a lot of index array out of bounds issues when searching the front of the list starting from the end of the list solved this \$\endgroup\$user97644– user976442015年10月20日 17:40:09 +00:00Commented Oct 20, 2015 at 17:40
-
\$\begingroup\$ Hm, hard to know why that was happening without seeing the code, but you definitely wouldn't get it using
for line in file
. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年10月20日 20:06:14 +00:00Commented Oct 20, 2015 at 20:06 -
\$\begingroup\$ okay so i used the set function to instead of doing all that above i found that using that function sped the program up a lot it would take 50 seconds to create 10k passwords even if i was using your optimized code, but using the set function it takes less that 5 seconds i will modify the original post above \$\endgroup\$user97644– user976442015年10月21日 19:03:09 +00:00Commented Oct 21, 2015 at 19:03
-
\$\begingroup\$ Ah, if you have a new version then the Code Review approach is to ask a new question with that code, and just link back to this one explaining that it's a follow up question. Please see what you may and may not do after receiving answers . I've already edited your question to return it to the pre-edit state. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年10月22日 11:45:48 +00:00Commented Oct 22, 2015 at 11:45
I have some issues with the logic in your code:
- Multiple new passords, means multiple reads – It seems like it would be wiser to build a list of the new passwords, and then do a single run through the previous password file
- Appending to the file whilst reading the file – Your code will open the
MnemPass.txt
for appending, whilst in the loop reading the file. That is kind of bad... Yet again, I would append new passwords to a list, and do a single append after completing reading the file General file handling – Instead of doing all of that file mumble jumble, use an iterator along side with an enumerator if you need the line number for other purposes:
with open("MnemPass.txt") as f: for i, line in enumerate(f):
This will simplify your loop quite a bit, and make it easier to understand
- ( Consider reading the entire file into a list – As long as we're only talking about a 100K entries, it might be viable to read the entire list into memory, and use various list methods/tricks to check if it's already present. Do this with care, as the list might extend even further and you could get memory issues )
A final important suggestion, I would strongly consider splitting up your into multiple function. Now you have code which does:
- Generate new password
- Create a new file, if empty
- Loop through file to check for existing passwords
- If found, write match to another file
- If not found, append new password
Each of these are worthy their own function.
Explore related questions
See similar questions with these tags.
passnum
? \$\endgroup\$