Password generators are extremely popular on CodeReview with beginners to both coding in general and Python in particular, for a number of reasons:
- They're (seemingly) easy to implement
- They offer an excuse to learn a little bit of the Python built-in API
- They are ostensibly useful in day-to-day computing
The code in this question is written as a straw man to capture (most of) the (significant) issues commonly exhibited by beginner password generator code.
The code
```python
import random
import time
import tkinter

LOWER_LETTERS = [
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
]
UPPER_LETTERS = [
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
]
NUMBERS = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
SYMBOLS = ['!', '@', '#', '$', '%', '^', '&', '*', '(', ')']


def generate_password(length: int = 8) -> str:
    """
    Generate a password string of length length, with characters pulled randomly
    from the four category lists above.
    """
    if not (4 <= length <= 16):
        raise ValueError(f'Password length of {length} not in [4, 16]')

    random.seed(time.time())
    password = ''
    i = 0
    while i < length:
        category_index = random.randint(a=0, b=3)
        category = [LOWER_LETTERS, UPPER_LETTERS, NUMBERS, SYMBOLS][category_index]
        char_index = random.randint(a=0, b=len(category)-1)
        char = category[char_index]
        password += char
        i += 1

    return password


def ask_user_terminal() -> None:
    """
    Generate a new password via the terminal.
    """
    length = int(input('Enter password length: '))
    password = generate_password(length)
    print('New password:', password)

    while True:
        confirm = input('Enter password to confirm: ')
        if confirm == password:
            break


def tkui() -> None:
    """
    Generate a new password via a GUI.
    """
    def replace_password() -> None:
        label.configure(text=generate_password(length=8))

    window = tkinter.Tk()
    window.title('Password generator')
    window.grid()
    label = tkinter.Label()
    label.grid(row=0)
    tkinter.Button(
        window, text='Generate', command=replace_password,
    ).grid(row=1)
    window.mainloop()


if __name__ == '__main__':
    ask_user_terminal()
    tkui()
```
Character Sets
The code assumes you only want passwords using the Latin alphabet, the Arabic numerals, and some ASCII special characters. Being locked into the character set has three main problems:
Expanded Characters
With the near-ubiquitous presence of Unicode, being restricted to 72 of Unicode's 159,801 characters is needlessly limiting. For instance, by expanding to just CP-1252 we can, humorously, 1337 "Password" into "Páßwörð".
A larger character set means a password cracker has to perform more work to crack the password, so you should allow the user to provide additional characters. The choice of script has security consequences too: some scripts, such as Chinese, have morphemes carrying much more entropy than the letters of the Latin alphabet.
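A minimal sketch of letting the user extend the character set, assuming each extra character is a single Unicode code point (combining sequences would need normalization first). The names `DEFAULT_CHARS`, `build_charset` and `bits_per_char` are hypothetical; the default set mirrors the question's 72 characters.

```python
import math
import string

# The question's 72 characters: a-z, A-Z, 0-9 and ten symbols.
DEFAULT_CHARS = string.ascii_lowercase + string.ascii_uppercase + string.digits + '!@#$%^&*()'


def build_charset(extra: str = '') -> str:
    """Append user-supplied characters to the default set.

    dict.fromkeys() drops duplicates while keeping first-seen order, so
    re-adding an existing character does not skew the distribution.
    """
    return ''.join(dict.fromkeys(DEFAULT_CHARS + extra))


def bits_per_char(charset: str) -> float:
    """Entropy contributed by one uniformly chosen character of charset."""
    return math.log2(len(charset))


if __name__ == '__main__':
    for extra in ('', 'áßöð'):
        charset = build_charset(extra)
        print(len(charset), round(bits_per_char(charset), 2))
```

Adding the four CP-1252 letters from "Páßwörð" grows the set from 72 to 76 characters, i.e. from about 6.17 to about 6.25 bits per character.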
Different Locale
With Unicode we can get a little silly and use, say, "🙈🙉🙊👻🤡👺" as a password. That should raise some eyebrows: "What, you can have an emoji password?!" The answer depends entirely on the authenticator the user will be interacting with; for instance, a PIN is limited to numbers.
Additionally, the user of your application may be more familiar with different scripts and character input methods. For instance, "password" vs "пароль" vs "パスワード": would you be able to remember and enter all three?
Defaulting to the user's expectation, as determined from the locale, is a good idea, with all of ASCII's printable characters as a good fallback or option.
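One way this could look, as a sketch only: `LANGUAGE_CHARSETS` is a hypothetical, deliberately tiny table (a real one would need far more care per script), and the fallback here assumes "printable" means the 94 non-whitespace ASCII characters. `locale.getlocale()` returns a `(language_code, encoding)` pair whose parts may be `None`.

```python
import locale
import string
from typing import Optional

# Fallback: all printable, non-whitespace ASCII characters (94 of them).
ASCII_PRINTABLE = string.ascii_letters + string.digits + string.punctuation

# Hypothetical per-language table, for illustration only. Russian here uses the
# basic lowercase Cyrillic range а..я (which omits ё) plus digits.
LANGUAGE_CHARSETS = {
    'ru': ''.join(chr(c) for c in range(ord('а'), ord('я') + 1)) + string.digits,
}


def default_charset(locale_name: Optional[str]) -> str:
    """Pick a default character set from a locale name such as 'ru_RU'."""
    if locale_name:
        language = locale_name.split('_')[0].lower()
        if language in LANGUAGE_CHARSETS:
            return LANGUAGE_CHARSETS[language]
    return ASCII_PRINTABLE


def current_default_charset() -> str:
    """Default for the running process; unknown locales fall back to ASCII."""
    return default_charset(locale.getlocale()[0])
```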
Disambiguation
Any time that a generated password needs to be read by human eyes, there is a risk of homoglyphs. These are visually similar characters that the user may confuse with one another, even if the computer knows better. Common examples are:
- Latin uppercase letter O `'\x4f'` and numeral 0 `'\x30'`
- Latin uppercase letter I `'\x49'`, Latin lowercase letter L `'\x6c'` and numeral 1 `'\x31'`
and so on. A good generator would have an opt-in mode to remove such homoglyphs from the character set, especially in generic password mode. Some nuance is called for here: in passphrase mode, the randomly generated passphrase
illegitimate 54321 INCREDIBLY LIMINAL
is visually unambiguous, and so sometimes, homoglyphs are OK.
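The opt-in filtering itself is simple. This sketch assumes the homoglyph list is exactly the two groups above (`HOMOGLYPHS` and `without_homoglyphs` are hypothetical names); you may want to extend it, e.g. with lowercase `o`, depending on the fonts your users see.

```python
import string

# The homoglyph groups listed above: O/0 and I/l/1.
HOMOGLYPHS = frozenset('O0Il1')


def without_homoglyphs(charset: str, homoglyphs: frozenset = HOMOGLYPHS) -> str:
    """Return charset with every listed homoglyph removed, order preserved."""
    return ''.join(char for char in charset if char not in homoglyphs)


ALNUM = string.ascii_letters + string.digits
SAFE_ALNUM = without_homoglyphs(ALNUM)
```

Dropping 5 of 62 alphanumerics costs only about 0.12 bits per character (\$\log_2 62 - \log_2 57\$), which is usually a fair trade when the password must be read off a screen.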
As a more complex solution, there are other methods to distinguish homoglyphs:
- use a fixed-width font. We always get this for free in a terminal, and so `ask_user_terminal()` is somewhat safe, but `tkui` is not and needs modification to use a fixed-width or monospace font for `label`.
- use other character styles like underlining or color to indicate case or numerals.
Secrets & Entropy
Types of Secrets
Users typically interact with various types of secrets; for a password generator we mostly care about only two.
Memorized Secrets: A secret the user is intended to memorize, such as: a passphrase, a password, or a PIN.
Look-Up Secrets: A secret the user is intended to store in a password manager.
The code is generating Look-Up Secrets.
Entropy
Let's say we use `random.SystemRandom` to generate 16 bits of entropy, getting the number 42401. We can then write the value in several forms:
Binary | Octal | Decimal | Base 64 |
---|---|---|---|
1010010110100001 | 122641 | 42401 | KWh |
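Let's look at how we got the base 64 form using the table "A-Za-z0-9+/": split the binary into 6-bit segments starting from the right, and look up each segment's value in the table.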
Segment | Binary | Decimal | Character |
---|---|---|---|
1 | 100001 | 33 | h |
2 | 010110 | 22 | W |
3 | 1010 | 10 | K |
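Splitting into 6-bit segments from the right is just repeated division by 64. A small sketch with a hypothetical `to_base64_digits` helper; note that this is positional base-64 notation, which differs from `base64.b64encode()`, since that function works on bytes and pads its output.

```python
import string

# The "A-Za-z0-9+/" table from the text.
B64_TABLE = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'


def to_base64_digits(value: int) -> str:
    """Write a non-negative integer in base 64, most significant digit first."""
    if value < 0:
        raise ValueError('value must be non-negative')
    digits = []
    while True:
        value, remainder = divmod(value, 64)  # peel off the lowest 6 bits
        digits.append(B64_TABLE[remainder])
        if value == 0:
            break
    return ''.join(reversed(digits))


# random.SystemRandom().getrandbits(16) would give a fresh 16-bit value;
# the fixed value below reproduces the tables above.
value = 42401
print(format(value, 'b'), format(value, 'o'), value, to_base64_digits(value))
# 1010010110100001 122641 42401 KWh
```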
Since we know the secret and the algorithm used to create it, we can verify the entropy in bits. We picked from a table of 64 values three times, then take the base-2 logarithm to determine the bit length:
$$ \log_2 64^3 = 3\log_2 64 = 3 \cdot 6 = 18 $$
Can you figure out where I misled you, and where the 18 has come from?
Segment 3 is 1010, which is only 4 bits long, not 6. 18 - 2 = 16.
Entropy can be hard to calculate because we don't always know the algorithm used to create the secret, and so can make wrong assumptions. However, from what we've seen already we can easily figure out two ways to maximise entropy:
- Character Set: Using more characters lets you store more entropy per character.
- Length: Even with only 2 characters ("01"), increasing the length increases the entropy.
As such Look-Up Secrets are the natural solution.
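Both levers appear directly in the entropy of a uniformly random string: \$L \log_2 N\$ bits for \$L\$ characters drawn independently from \$N\$ symbols. A sketch with a hypothetical `entropy_bits` helper:

```python
import math


def entropy_bits(length: int, charset_size: int) -> float:
    """Entropy of `length` characters drawn uniformly and independently
    from `charset_size` symbols: length * log2(charset_size)."""
    return length * math.log2(charset_size)


# Character-set lever: same length, bigger alphabet.
print(entropy_bits(8, 26), entropy_bits(8, 95))
# Length lever: even a two-symbol alphabet ("01") gains one bit per character.
print(entropy_bits(8, 2), entropy_bits(16, 2))
```

The formula only holds when every character really is chosen uniformly and independently, which is exactly the assumption that breaks down for human-chosen passwords below.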
Memorized Secrets
The biggest downside to the natural solution is that people are pretty bad at memorizing a string of purely random characters. So let's look at bad and good ways for humans to increase entropy.
Password
Which secret is easier to remember: the password "password" or the Look-Up Secret "xhwvoizl"? The password is much easier to remember, so let's compare the entropy. For simplicity, let's pretend the dictionary is only 100,000 words.
$$ \begin{align} \log_2 100000 & = 16.61 \\ 3\log_2 26 & = 14.1 \\ 4\log_2 26 & = 18.8 \\ \end{align} $$
In the old days, when you'd need a word to gain access to a back-alley establishment, 16.61 bits of entropy was pretty good. Nowadays you can check 100,000 passwords in a matter of seconds.
Let's look at how people typically satisfy a mandate of a minimum of 1 number, 1 symbol, and 1 uppercase letter.
Unfortunately, many users will add complexity to their password by simply capitalizing the first letter of their password or adding a "1" or "!" to the end. And while it technically does make a password more difficult to crack, most password-crackers worth their salt know users tend to follow these patterns and can use them to reduce the time needed to decrypt a stolen password.
— NIST Password Guidelines and Best Practices for 2020
OK, so "password" becomes "Password1!" or "Password!1". Let's see the increase in bit length when being generous.
If we uppercase one randomly chosen letter of the password, the number of choices is nCk. Let's pretend every word in the dictionary is 8 letters long.
$$\log_2 \binom{8}{1} = 3$$
We can pick from 10 digits.
$$\log_2 10 = 3.32$$
We can pick from 33 special characters.
$$\log_2 33 = 5.04$$
The number and special character can appear in either order ("1!" or "!1").
$$\log_2 \binom{2}{1} = 1$$
Let's see how the recommendation affects the entropy when being generous.
$$ \begin{align} 16.61 + 3 + 3.32 + 5.04 + 1 & = 28.97 \\ \log_2 \left(100000 \cdot \binom{8}{1} \cdot 10 \cdot 33 \cdot \binom{2}{1}\right) & = 28.98 \\ \end{align} $$
As we can see, reasoning about what does and doesn't increase entropy can be simplified to adding the individual entropies together. The difference is just a rounding error.
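A quick numeric check of that claim, assuming the same 100,000-word dictionary of 8-letter words. The exact figure is about 28.976 bits; the 28.97 comes from rounding each term to two decimals before adding them.

```python
import math

WORDS = 100_000  # assumed dictionary size, every word 8 letters long

# Entropy of each independent choice, in bits.
parts = {
    'word': math.log2(WORDS),
    'capitalised letter': math.log2(math.comb(8, 1)),
    'digit': math.log2(10),
    'symbol': math.log2(33),
    'order of digit and symbol': math.log2(math.comb(2, 1)),
}

summed = sum(parts.values())
# The same count done as a single product of possibilities.
combined = math.log2(WORDS * math.comb(8, 1) * 10 * 33 * math.comb(2, 1))
# Rounding each term first, as in the derivation above.
rounded_first = sum(round(bits, 2) for bits in parts.values())

print(f'{summed:.3f} {combined:.3f} {rounded_first:.2f}')
```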
Rather than being generous, let's look at the increase from the typical way people change the password:
$$ \begin{align} \log_2 1 & = 0 \text{ (for cap)}\\ \log_2 1 & = 0 \text{ (for 1)}\\ \log_2 1 & = 0 \text{ (for !)}\\ \log_2 \binom{2}{1} & = 1 \text{ (for 1! or !1)}\\ \end{align} $$
So in the common case the rule adds just 1 bit of entropy, which makes it a poor algorithm. More importantly, it shows that 28.98 bits is only the best case of the algorithm, as some outputs, such as "Password1!", have no tangible benefit.
As such, your password generator should estimate the entropy of its output to ensure the output meets a high enough bar. Even with a good algorithm you can produce a common password or pattern.
Passphrase
Passphrases in particular are lower-entropy per character than fully-random passwords, but are generally easier to remember, and so have several practical advantages when compared to passwords. They were made more popular by XKCD's illustration of the approach and its implementation in correcthorsebatterystaple.net, and are now supported in many free generators.
As we saw in the previous example, the biggest contribution to entropy was the 16.61 bits from the word itself. We can very easily quadruple our entropy by generating four words, and the result is much easier to remember than a random assortment of lowercase letters.
$$ \begin{align} 4 \log_2 100000 & = 66.44 \\ 14 \log_2 26 & = 65.81 \\ 15 \log_2 26 & = 70.51 \\ \end{align} $$
As such, the password generator should have a human mode, in which you generate an n-word passphrase. Adding a little flair through 1337 or other encodings is OK-ish depending on the user, but should not be the backbone of a Memorized Secret algorithm due to the low increase in entropy.
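A minimal human-mode sketch. `WORDS` is a tiny placeholder list for illustration only; a real generator should load a large, curated word list from a file, since the entropy is \$n \log_2 |W|\$ for \$n\$ words drawn from a list \$W\$. `generate_passphrase` and `passphrase_bits` are hypothetical names.

```python
import math
import secrets
from typing import Sequence

# Placeholder list for illustration only; entropy grows with log2 of its size.
WORDS = (
    'correct', 'horse', 'battery', 'staple',
    'illegitimate', 'incredibly', 'liminal', 'orbit',
)


def generate_passphrase(n_words: int, words: Sequence[str] = WORDS, sep: str = ' ') -> str:
    """Join n_words words, each chosen independently with the OS CSPRNG."""
    if n_words < 1:
        raise ValueError('n_words must be at least 1')
    return sep.join(secrets.choice(words) for _ in range(n_words))


def passphrase_bits(n_words: int, vocabulary_size: int) -> float:
    """Entropy of n_words uniform, independent picks from the vocabulary."""
    return n_words * math.log2(vocabulary_size)


if __name__ == '__main__':
    print(generate_passphrase(4))
    print(passphrase_bits(4, 100_000))  # the text's 100,000-word dictionary
```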
Look-Up Secrets
When building a password generator, generating a bunch of random characters from a character set is normally the default. So how well does the algorithm hold up? Since Unicode has 159,801 characters, let's focus on ASCII, with 2 × 26 Latin letters, 10 Arabic numerals and 33 ASCII symbols (counting the space) for 95 in total.
$$ \begin{align} 4 \log_2 100000 & = 66.44 \\ 10 \log_2 95 & = 65.7 \\ 11 \log_2 95 & = 72.27 \\ \end{align} $$
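The same comparison can be turned around to ask how long a fully random password must be to match a given target. A sketch with a hypothetical `min_length` helper, using the 95-character set above; it gives 11 characters to beat the four-word passphrase.

```python
import math
import string

# 52 letters + 10 digits + 32 punctuation + the space character = 95.
ASCII_PRINTABLE = string.ascii_letters + string.digits + string.punctuation + ' '


def min_length(target_bits: float, charset_size: int) -> int:
    """Shortest uniformly random string reaching at least target_bits."""
    return math.ceil(target_bits / math.log2(charset_size))


four_words = 4 * math.log2(100_000)  # the four-word passphrase from above
print(len(ASCII_PRINTABLE), min_length(four_words, len(ASCII_PRINTABLE)))
```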
How does the advice of forcing at least one lower, upper, number and special affect things?
- We have lower, \$26\$, being randomly inserted into \$\binom{1}{1}\$ slots.
- We have upper, \$26\$, being randomly inserted into \$\binom{2}{1}\$ slots.
- We have number, \$10\$, being randomly inserted into \$\binom{3}{1}\$ slots.
- We have special, \$33\$, being randomly inserted into \$\binom{4}{1}\$ slots.

We can simplify the binomials into a single product \$\prod_{i=1}^{n} i = n!\$, where \$n\$ is the number of 'buckets'. Let's contrast this with the secrets we are securing against.
$$ \begin{align} 4\log_2 (26 + 26 + 10 + 33) & = 26.28 \\ \log_2 \left(26 \cdot 26 \cdot 10 \cdot 33 \cdot \prod_{i=1}^{4} i\right) & = 22.35 \\ 26.28 - 22.35 & = 3.93 \\ \\ 4\log_2 10 & = 13.29 \\ 4\log_2 26 & = 18.80 \\ 4\log_2 33 & = 20.18 \\ \end{align} $$
The question now is: what maximum entropy is too low, 13, 19 or 20 bits? The reduction from 26 to 22 bits is only a result of pre-filtering outputs you think are too low in entropy (all digits, all one letter case, or all symbols).
As such, what is good for a Memorized Secret generator can be harmful to a Look-Up Secret generator. You should think about the two algorithms as wholly independent. However, just like with passwords, you should show an estimate of the entropy, as the user is the one who decides how little entropy is too little.
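The figures in that derivation can be reproduced directly, assuming a length-4 password and the four class sizes used above (26, 26, 10 and 33):

```python
import math

LOWER, UPPER, DIGITS, SYMBOLS = 26, 26, 10, 33
LENGTH = 4

# Any 4 characters from the 95-character set.
unrestricted = LENGTH * math.log2(LOWER + UPPER + DIGITS + SYMBOLS)
# Exactly one character from each class, in any of 4! arrangements.
one_of_each = math.log2(LOWER * UPPER * DIGITS * SYMBOLS * math.factorial(LENGTH))
# The single-class passwords that such a rule filters out.
single_class = {
    'digits only': LENGTH * math.log2(DIGITS),
    'one letter case only': LENGTH * math.log2(LOWER),
    'symbols only': LENGTH * math.log2(SYMBOLS),
}

print(f'{unrestricted:.2f} {one_of_each:.2f} {unrestricted - one_of_each:.2f}')
for name, bits in single_class.items():
    print(f'{name}: {bits:.2f}')
```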
Random generation
State and Seeds
As professionals we should strive for defense in depth, and that means that we must assume that an attacker may eventually gain some partial or full knowledge of the time at which the password was generated. From the documentation for `time()`:
Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second.
A password generator should not seed with a potentially coarse timestamp as the question code does. If it does, and an attacker determines or guesses the time of generation, no amount of password length or character complexity will save you: the attacker can regenerate even a 1,000-character password identically to the original program. This problem can be avoided by not seeding at all, and relying instead on the implementation-defined seeding behaviour wrapped by `SystemRandom`. It should not, and cannot, be seeded at the Python level.
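To make the risk concrete, here is a sketch of the attack, assuming the timestamp was truncated to whole seconds (the coarse case quoted above) and that the attacker knows a window around the generation time. `weak_password`, `recover_seed` and `strong_password` are hypothetical helpers; the first mimics the question's seeded approach with a private `random.Random` instance.

```python
import random
import secrets
import string
from typing import Optional

CHARS = string.ascii_letters + string.digits


def weak_password(seed: int, length: int = 16) -> str:
    """Like the question's code: the output depends only on the seed."""
    rng = random.Random(seed)
    return ''.join(rng.choice(CHARS) for _ in range(length))


def recover_seed(target: str, earliest: int, latest: int) -> Optional[int]:
    """Attacker: replay every candidate timestamp in the window."""
    for candidate in range(earliest, latest + 1):
        if weak_password(candidate, len(target)) == target:
            return candidate
    return None


def strong_password(length: int = 16) -> str:
    """SystemRandom draws from the OS and ignores seed(), so there is nothing to replay."""
    rng = secrets.SystemRandom()
    return ''.join(rng.choice(CHARS) for _ in range(length))
```

With a victim password from `weak_password(1_700_000_000)`, `recover_seed(victim, 1_699_999_000, 1_700_001_000)` finds the seed after about a thousand tries, regardless of the password's length.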
Probability
The question code assumes that the category selection probabilities should be evenly distributed, but that has a side effect: if we only allow one symbol, say an underscore, but have three categories, one third of the output characters will tend to be underscores. Is that sensible? A solution that's both simpler and (depending on your interpretation) produces a more expected probability distribution is to build a superset `ALL_CHARS` from the union of each character class set, and issue a single call to `choices(ALL_CHARS, k=n)`.
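The skew can be computed exactly with `fractions.Fraction`. In this sketch (both helper names are hypothetical) a digit is picked with probability 1/40 under the question's scheme but a letter only 1/104, while the union gives every one of the 72 characters 1/72.

```python
import string
from fractions import Fraction
from typing import Dict, Sequence


def per_category_probabilities(categories: Sequence[str]) -> Dict[str, Fraction]:
    """Question's scheme: pick a category uniformly, then a character within it."""
    probabilities: Dict[str, Fraction] = {}
    for category in categories:
        for char in category:
            share = Fraction(1, len(categories)) * Fraction(1, len(category))
            probabilities[char] = probabilities.get(char, Fraction(0)) + share
    return probabilities


def union_probabilities(categories: Sequence[str]) -> Dict[str, Fraction]:
    """Suggested scheme: one uniform choice from the union of all categories."""
    all_chars = set(''.join(categories))
    return {char: Fraction(1, len(all_chars)) for char in all_chars}


QUESTION_CATEGORIES = [
    string.ascii_lowercase, string.ascii_uppercase, string.digits, '!@#$%^&*()',
]
```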
Random Data in Python
As a progression,
- replace `random.randint(a=0, b=3)` with `random.randrange(4)` so that we can take advantage of half-open intervals; but really,
- get rid of the individual-character `randint()` calls and the loop, and replace them with a single, non-looped call to `my_system_random.choices(ALL_CHARS, k=n)`.
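A minimal sketch of where that progression ends up, keeping the question's 72 characters; `_system_random` stands in for the `my_system_random` instance named above.

```python
import secrets
import string

# Same 72 characters as the question, built from the string module.
ALL_CHARS = string.ascii_lowercase + string.ascii_uppercase + string.digits + '!@#$%^&*()'

# One shared instance; SystemRandom draws from os.urandom and cannot be seeded.
_system_random = secrets.SystemRandom()


def generate_password(length: int) -> str:
    """Return `length` characters drawn uniformly, with replacement, from ALL_CHARS."""
    return ''.join(_system_random.choices(ALL_CHARS, k=length))
```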
User interface
Usability
Don't use a Tk `Label`. Instead, use an `Entry` so that the user can copy the text. Again from the NIST summary, which encourages use of the clipboard:
Allow Password "Paste-In"
If passwords are easier to enter, your users are more likely to use a longer, more complex password in the first place (which is more secure). That’s where "paste-in" password functionality is now advantageous — if entering passwords is as simple as copying and pasting them into a password field; it encourages safer behavior.
This is especially important considering how many passwords the average person has to remember these days and the tools people are using to manage them all.
The `Entry` should also be editable: in the end, the user (and not the program) is in the best position to understand whether the password complies with the requirements of the target authentication system, and so should be able to adjust it accordingly.
Peek Safety
In `ask_user_terminal()`'s confirmation loop, the password is exposed to over-the-shoulder spying: in a crowded café or airport, someone may see it. We can limit this exposure by
- Initially showing the user the password,
- Prompting the user to continue once they have safely recorded or memorized the password,
- Clearing the screen, and then
- Using `getpass` instead of `input` to collect the confirmation password.

`getpass` configures the console to disable echo, so that the typed confirmation is never displayed. This approach is typical in the Linux/Unix world.
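A minimal terminal sketch of those four steps; `confirm_password` is a hypothetical helper. Clearing the screen via `os.system` with `cls`/`clear` is an assumption about the user's platform, and the terminal's scrollback may still contain the password.

```python
import getpass
import os


def confirm_password(password: str) -> None:
    """Show the password once, clear the screen, then confirm without echo."""
    print('New password:', password)
    input('Record or memorize the password, then press Enter to continue...')
    # 'cls' on Windows, 'clear' elsewhere.
    os.system('cls' if os.name == 'nt' else 'clear')
    while getpass.getpass('Enter password to confirm: ') != password:
        print('That does not match; please try again.')
```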
In the case of tkinter, create an `Entry` with `show='*'`. Make a Show button that, while pressed, calls `entry.configure(show='')`, and when released, calls `entry.configure(show='*')`. This is also peek-safe, and is an easy and typical way to allow users to choose the exact amount of exposure time.
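A sketch of that press-and-hold button. A `Button`'s `command` only fires on release, so this binds the mouse press and release events directly; keyboard activation of the button is not handled here, and `build_peek_safe_window` is a hypothetical name.

```python
import tkinter


def build_peek_safe_window(password: str) -> tkinter.Tk:
    """Entry that masks the password unless the Show button is held down."""
    window = tkinter.Tk()
    window.title('Password generator')

    entry = tkinter.Entry(window, show='*')
    entry.insert(0, password)
    entry.grid(row=0, column=0)

    show_button = tkinter.Button(window, text='Show')
    show_button.grid(row=0, column=1)
    # Reveal only while the left mouse button is held on "Show".
    show_button.bind('<ButtonPress-1>', lambda event: entry.configure(show=''))
    show_button.bind('<ButtonRelease-1>', lambda event: entry.configure(show='*'))
    return window
```

Usage: `build_peek_safe_window(generated_password).mainloop()`.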
Loose Coupling
Rather than calling `label.configure(text=...)`, associate `label` (the user interface) with a `StringVar` (the data path). In the `replace_password` handler, only interact with the `StringVar` and not the `label`.
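A sketch of that decoupling, which also swaps the `Label` for an editable, copyable `Entry` and uses Tk's standard monospace named font `TkFixedFont` to help with homoglyphs. `build_window` and the inline `generate_password` are hypothetical stand-ins.

```python
import secrets
import string
import tkinter

ALL_CHARS = string.ascii_letters + string.digits + '!@#$%^&*()'


def generate_password(length: int) -> str:
    return ''.join(secrets.choice(ALL_CHARS) for _ in range(length))


def build_window(length: int = 16) -> tkinter.Tk:
    window = tkinter.Tk()
    window.title('Password generator')

    # The data path: the handler only ever touches this variable.
    password_var = tkinter.StringVar(master=window)

    # The user interface: an editable, copyable Entry bound to the variable.
    tkinter.Entry(
        window, textvariable=password_var, width=length + 4, font='TkFixedFont',
    ).grid(row=0)

    def replace_password() -> None:
        password_var.set(generate_password(length))

    tkinter.Button(window, text='Generate', command=replace_password).grid(row=1)
    replace_password()
    return window
```

Usage: `build_window().mainloop()`. Because the handler only knows about `password_var`, the widget can later be swapped (for example for the masked `Entry` described above) without touching the handler.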
Performance
Performance is generally not a concern for applications of this kind. As a result, most attempts at making this application faster can be considered premature optimization. Most CPU-bound tasks in the application are so fast as to be imperceptible by the user. The only time they may become noticeable is if you intend to extend this application to generate millions of passwords (for a rainbow table, etc.).
`SystemRandom` is a high-level abstraction over `os.urandom`, which is itself an abstraction over the operating system's best source of secure randomness: often the Linux kernel's `getrandom()` and its fallback, the `urandom` device, or Windows' `BCryptGenRandom`. `urandom` is fast:
A read from the `/dev/urandom` device will not block waiting for more entropy.
The priorities for this program should be correctness, security, maintainability and usability, not performance.
Sets in Python
As a progression, and assuming the continued use of the existing program's character sets,
- replace the `LOWER_LETTERS` (etc.) lists `[]` with immutable tuples `()`, since we never want those character sets to change; but really,
- replace them with simple strings `'ABCDE...'`, which are themselves immutable sequences; but really,
- just import those from the `string` module; and then,
- cast to a `frozenset`, as in `ASCII_LETTERS = frozenset(string.ascii_letters)`.
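One practical wrinkle when combining sets with `choices()`: `choices()` indexes into its population, which a `frozenset` does not support, so the union has to be turned back into a sequence before sampling. A sketch, assuming the question's ten symbols:

```python
import secrets
import string

ASCII_LETTERS = frozenset(string.ascii_letters)
DIGITS = frozenset(string.digits)
SYMBOLS = frozenset('!@#$%^&*()')

# Union of the class sets: duplicates across classes collapse automatically.
ALL_CHARS = ASCII_LETTERS | DIGITS | SYMBOLS

# Convert to an indexable sequence for choices(); sorting makes the order reproducible.
ALL_CHARS_SEQ = ''.join(sorted(ALL_CHARS))

password = ''.join(secrets.SystemRandom().choices(ALL_CHARS_SEQ, k=16))
```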
Concatenation in Python
There is a lot of discourse in the Python community about the best way to concatenate strings. For various reasons, in-place `+=` is not always a favourite.
In the context of this question, due to the API offered by `secrets.SystemRandom().choices()`, `''.join(choices())` is the best option. The fact that this will scale linearly, O(n), in time with the length of the password is immaterial, because password length is negligible from a performance perspective. Instead we prefer it because it is the simplest and clearest method, and does not need an explicit loop.
Even if you were to keep an explicit loop, you would want to replace `while i < n` with `for _ in range(n)` to loop like a native.
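For comparison, a sketch of the explicit-loop version written that way; it is equivalent to the single `choices()` call, just longer.

```python
import secrets
import string

ALL_CHARS = string.ascii_letters + string.digits + '!@#$%^&*()'


def generate_password(length: int) -> str:
    chars = []
    for _ in range(length):  # the loop variable is unused, hence _
        chars.append(secrets.choice(ALL_CHARS))
    return ''.join(chars)  # one join instead of repeated +=
```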
@Reinderien OK, I rewrote the first half in a "here's how to improve the code" manner rather than "here's a bunch of links". I have likely removed some important information you put in, unintentionally; please just add back what you think is important. Since I started from scratch, I added your material verbatim, as otherwise following the changes would be too hard in the diff (I'd recommend using an external diff). Have I waffled on too much? Probably; feel free to cut the cruft. (Commented 2025-09-19 18:21:57 +00:00)
The goal
I think most password generators are created as a learning exercise. They should probably stay a learning exercise. The chance that someone can easily create something that exceeds, e.g., the open-source KeePass without extensive study of the subject is exceedingly small.
If the password generator is provided as part of a service, then the provider of the service should probably consider who is responsible for having a good password in the first place. It's the user's password for the system; the provider doesn't "own" it. It's of course a good idea to guide the user to using a good password, but generating one for the user might be a step too far.
The entropy
The random number generator
First of all, the random number generator should not be seeded with the time. It should be a cryptographically secure pseudo-random number generator (CSPRNG) which is seeded by (or is) the system random number generator. In the following remarks on the code I'll assume a CSPRNG.
The size of the password
Passwords 8 characters long are no longer recommended, as they contain too little entropy, so the default is off. I'd rather have the user explicitly select a size, or have the UI define the default size instead, i.e. leave the default value out of the function declaration.
At the moment you'd expect 10 characters on the very low end and 12 characters on the higher end. PINs etc. of course will contain fewer characters, but those are combined with a maximum number of tries and probably other counter-measures.
Maximizing entropy
The randomness is not maximized. First a character set is chosen, then a character is randomly chosen from it. However, some character sets have 26 characters and others only 10, so characters from the smaller sets are picked more often. It makes more sense to choose from the joined set of characters, assuming the intended scheme is kept, i.e. that the password should be fully random, which it currently isn't.
Per-character choice of character set
As the character set is chosen independently for each character, it is quite possible for the function to return a password with just digits. The chance of that happening is 1 in 4^4, i.e. 1 in 256, if the length is 4. This is definitely random, so in a sense it is the most secure. However, it will be easily caught by a cracker that assumes a PIN was entered instead of a password.
It makes more sense to indicate a minimal number of characters from each set, and then shuffling the password afterwards. This is more likely to comply with certain password-policies that require you to use e.g. 2 digits and a symbol in your password.
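A sketch of that approach, with hypothetical names: `minimums` maps a class name to its required count, the remainder is drawn from all classes, and `SystemRandom.shuffle()` hides where the guaranteed characters ended up. As the other answer's derivation shows, such a rule trades a little entropy for policy compliance.

```python
import secrets
import string
from typing import Mapping

CLASSES = {
    'lower': string.ascii_lowercase,
    'upper': string.ascii_uppercase,
    'digit': string.digits,
    'symbol': '!@#$%^&*()',
}
ALL_CHARS = ''.join(CLASSES.values())


def generate_password(length: int, minimums: Mapping[str, int]) -> str:
    """Guarantee minimums[name] characters from each named class, fill the rest
    from all classes, then shuffle so the guaranteed ones are not at the front."""
    rng = secrets.SystemRandom()
    required = sum(minimums.values())
    if required > length:
        raise ValueError(f'minimums need {required} characters but length is {length}')
    chars = [
        rng.choice(CLASSES[name])
        for name, count in minimums.items()
        for _ in range(count)
    ]
    chars += rng.choices(ALL_CHARS, k=length - required)
    rng.shuffle(chars)
    return ''.join(chars)
```

For example, `generate_password(12, {'digit': 2, 'symbol': 1})` satisfies a policy asking for 2 digits and a symbol.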
The type of password
If the intent is to create easy-to-remember passwords then it makes more sense to choose from a set of words rather than a set of characters, as random characters are much harder to remember. As mentioned before, random characters are fine for storing passwords in a password safe like KeePass or an integrated password vault.
System interaction / the UI
Having the system generate a password in the first place
Generally the user already has a password generator tool as part of a password manager, either separate (KeePass), part of the system (Windows Credential Manager), connected (1Password) or as part of a browser.
These are vetted password managers that have been created by experts. They can often rely on system-provided methods for keeping passwords secure. Services should rely on these password managers rather than building their own.
Displaying the password in output / dialogue window
Currently no password-specific methods are used within the UI. This means that the password will be easy to retrieve from either the output or from the label by capturing the output or the screen in some way or other.
Quite often UIs have widgets specifically for entering or displaying passwords. Such widgets can of course be built as well, e.g. by displaying passwords as pictures of characters only shown on hover or by pressing an "eye" button.
For the console version, one could consider storing the password in a given environment variable or even a file.
Connecting with a password vault
If at all possible it is better to store passwords directly in a password vault. Those generally already have password generator functions though. As such, this should be thought of more as an exercise than stand-alone code. That is true in general when it comes to password generation code or any cryptography-related code.
No feedback on the security of the password
Currently the user will never know how secure the generated password is. This level of security should be indicated.
If the developer cannot indicate the amount of entropy that is expected to be in the password then the developer should probably see that as an indication that they require more education before writing or releasing a password generation function.
No iterations
Currently the alphabet (the characters that can be put into the password) is set, but as indicated, it may not comply with a password policy or it may not be pleasant for the user to enter. Generally password tools allow the user to choose whether to accept the password or generate a new one.
Of course this comes with a caveat: the user may generate passwords until they get a simple one to remember or type, but this is also one that may be easier to crack. I don't think there is much consensus of what to do here, though most password managers err towards letting the user decide whether to keep or not keep the password.
Hard to read passwords
Quite often password managers avoid letter / digit combinations that look too much like each other. The main culprit is probably `0`, `o` and `O`, but other combinations are troublesome as well, e.g. `1`, `i`, `I` and `l`. Font choice can of course help with that.
This is mainly an issue if the user has to copy the password from screen themselves. This should be avoided in the first place.
The code
Undocumented contract
The code is using an undocumented scheme, which cannot be altered from the outside. As such, the scheme is likely not compliant with many password policies. See the last remark in the entropy section for a possible fix.
The contract of the function is not well documented. There is a guard statement that the `length` of the password should be between 4 and 16 characters, but this is not indicated in the documentation at all.
The output is not well documented either. The character classes are defined as constants, but they are not mentioned in the documentation. It doesn't document the entropy either, nor does it define the algorithm used.
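A sketch of what a documented contract could look like: the docstring states the accepted range, the character set, the algorithm and the resulting entropy. `MIN_LENGTH` and `MAX_LENGTH` are illustrative values only, and `entropy_bits` is a hypothetical helper.

```python
import math
import secrets
import string

ALL_CHARS = string.ascii_lowercase + string.ascii_uppercase + string.digits + '!@#$%^&*()'
# Illustrative bounds only; choose them from the policies you need to satisfy.
MIN_LENGTH, MAX_LENGTH = 12, 64


def generate_password(length: int) -> str:
    """
    Generate a random password.

    Algorithm: each of the `length` characters is drawn independently and
    uniformly from ALL_CHARS (a-z, A-Z, 0-9 and !@#$%^&*(), 72 characters)
    using secrets.choice, which is backed by the operating system's CSPRNG.

    Entropy: length * log2(72), about 6.17 bits per character.

    :param length: number of characters, from MIN_LENGTH to MAX_LENGTH inclusive.
    :raises ValueError: if length is outside that range.
    """
    if not MIN_LENGTH <= length <= MAX_LENGTH:
        raise ValueError(f'length must be in [{MIN_LENGTH}, {MAX_LENGTH}], got {length}')
    return ''.join(secrets.choice(ALL_CHARS) for _ in range(length))


def entropy_bits(length: int) -> float:
    """Entropy of a password returned by generate_password(length)."""
    return length * math.log2(len(ALL_CHARS))
```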
Using a `str` for a password
The disadvantage of using strings is that they cannot be easily erased. It makes more sense to store the passwords in e.g. a `bytearray` that can be easily erased, right after the password is no longer required. If the password is generated for internal verification then it should be used as input for a Password Based Key Derivation Function (aka password hash) and erased right after the hash has been generated.
It is questionable whether a password generator should be written in Python in the first place, as it doesn't offer fine-grained memory or string control. Some system languages do offer that and may use the system to protect memory fields, e.g. by disallowing swapping specific memory regions to disk or even by encrypting memory regions.
The while loop
The `while` loop is a `for` loop in disguise. As it will always go through exactly `length` iterations, it makes more sense to write it as a `for` loop.
"`if not (4 <= n <= 16)`": even knowing it's a strawman, that upper bound is painful to read.