Typical password generator in Python

Question 1

Password generators are extremely popular on CodeReview with beginners to both coding in general and Python in particular, for a number of reasons:

They're (seemingly) easy to implement
They offer an excuse to learn a little bit of the Python built-in API
They are ostensibly useful in day-to-day computing

The code in this question is written as a straw man to capture (most of) the (significant) issues commonly exhibited by beginner password generator code.

The code

import random
import time
import tkinter

LOWER_LETTERS = [
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
]
UPPER_LETTERS = [
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
]
NUMBERS = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
SYMBOLS = ['!', '@', '#', '$', '%', '^', '&', '*', '(', ')']


def generate_password(length: int = 8) -> str:
    """
    Generate a password string of length length, with characters pulled randomly
    from the four category lists above.
    """
    if not (4 <= length <= 16):
        raise ValueError(f'Password length of {length} not in [4, 16]')
    random.seed(time.time())
    password = ''
    i = 0
    while i < length:
        category_index = random.randint(a=0, b=3)
        category = [LOWER_LETTERS, UPPER_LETTERS, NUMBERS, SYMBOLS][category_index]
        char_index = random.randint(a=0, b=len(category)-1)
        char = category[char_index]
        password += char
        i += 1
    return password


def ask_user_terminal() -> None:
    """
    Generate a new password via the terminal.
    """
    length = int(input('Enter password length: '))
    password = generate_password(length)
    print('New password:', password)
    while True:
        confirm = input('Enter password to confirm: ')
        if confirm == password:
            break


def tkui() -> None:
    """
    Generate a new password via a GUI.
    """
    def replace_password() -> None:
        label.configure(text=generate_password(length=8))
    window = tkinter.Tk()
    window.title('Password generator')
    window.grid()
    label = tkinter.Label()
    label.grid(row=0)
    tkinter.Button(
        window, text='Generate', command=replace_password,
    ).grid(row=1)
    window.mainloop()


if __name__ == '__main__':
    ask_user_terminal()
    tkui()

Question 2

"if not (4 <= n <= 16)" Even knowing it's a strawman, that upper bound is painful to read.

Question 3

Character Sets

The code assumes you only want passwords using the Latin alphabet, the Arabic numerals, and some ASCII special characters. Being locked into the character set has three main problems:

Expanded Characters

With the near-ubiquitous presence of Unicode, being limited to 72 of Unicode's 159,801 characters is needlessly limiting. For instance, by expanding to just CP-1252 we can, humorously, 1337 "Password" to "Páßwörð".

The increase in the character set means a password cracker will have to perform more work to crack the password. As such you should allow the user to provide additional characters. There are also security consequences: some scripts such as Chinese have morphemes with much more entropy than the letters of the Latin alphabet.

Different Locale

With Unicode we can get a little silly with, say, "🙈🙉🙊👻🤡👺" as a password. That should raise some eyebrows: "What, you can have an emoji password?!" The answer depends entirely on the authenticator the user will be interacting with. For instance, a PIN is limited to numbers only.

Additionally the user of your application may be more familiar with different script(s) and character input methods. For instance "password" vs "пароль" vs "パスワード"; would you be able to remember and enter all three?

Defaulting to the user's expectation, as determined from the locale, is a good idea. All of ASCII's printable characters make a good fallback or option.

Disambiguation

Any time that a generated password needs to be read by human eyes, there is a risk of homoglyphs. These are visually similar characters that the user may confuse with one another, even if the computer knows better. Common examples are:

Latin uppercase letter O '\x4f' and numeral 0 '\x30'
Latin uppercase letter I '\x49', Latin lowercase letter L '\x6c' and numeral 1 '\x31'

and so on. A good generator would have an opt-in mode to remove such homoglyphs from the character set, especially in generic password mode. Some nuance is called for here: in passphrase mode, the randomly-generated passphrase

illegitimate 54321 INCREDIBLY LIMINAL

is visually unambiguous, and so sometimes, homoglyphs are OK.

As a more complex solution, there are other methods to distinguish homoglyphs:

use a fixed-width font. We always get this for free in a terminal, and so ask_user_terminal() is somewhat safe, but tkui is not and needs modification to use a fixed-width or monospace font for label.
use other character styles like underlining or color to indicate case or numerals.
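The opt-in homoglyph filter described above could be sketched roughly as follows. The names HOMOGLYPHS and readable_charset are hypothetical, and the set of confusable characters is an assumption for illustration rather than an exhaustive list:

```python
import string

# Assumed, non-exhaustive set of characters that are easy to confuse on screen.
HOMOGLYPHS = frozenset('O0oIl1')


def readable_charset(chars: str, exclude_homoglyphs: bool = True) -> str:
    """Return chars in their original order, optionally without confusable characters."""
    if not exclude_homoglyphs:
        return chars
    return ''.join(c for c in chars if c not in HOMOGLYPHS)


ALPHANUMERIC = string.ascii_letters + string.digits
SAFE_ALPHANUMERIC = readable_charset(ALPHANUMERIC)
```

Removing characters shrinks the character set, so each generated character carries slightly less entropy; that is the price of the opt-in mode.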

Secrets & Entropy

Types of Secrets

Users typically interact with various types of secrets, but for a password generator we mostly care about only two types.

Memorized Secrets: A secret the user is intended to memorize, such as: a passphrase, a password, or a PIN.
Look-Up Secrets: A secret the user is intended to store in a password manager.

The code is generating Look-Up Secrets.

Entropy

Let's say we use random.SystemRandom to generate 16 bits of entropy, getting the number 42401. We can then convert the value into multiple forms:

| Binary | Octal | Decimal | Base 64 |
| --- | --- | --- | --- |
| 1010010110100001 | 122641 | 42401 | KWh |

Let's look at how we got "base 64" with the table "A-Za-z0-9+/", splitting the binary value into 6-bit segments starting from the right.

| Segment | Binary | Decimal | Character |
| --- | --- | --- | --- |
| 1 | 100001 | 33 | h |
| 2 | 010110 | 22 | W |
| 3 | 1010 | 10 | K |

Since we know the secret and the algorithm to create the secret, we can verify the entropy in bits. We picked from a table of 64 values three times, so we convert to base 2 to determine the bit length:

$$ \log_2 64^3 = 3\log_2 64 = 3 \cdot 6 = 18 $$

Can you figure out where I misled you and where the 18 has come from?

Segment 3 is 1010, which is only 4 bits long, not 6, so 18 - 2 = 16.
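The conversions and the 18-versus-16 discrepancy can be checked with a short sketch. The helper to_base64_digits is a hypothetical name; it treats the value as a plain base-64 number using the "A-Za-z0-9+/" table, which is not the same as RFC 4648 byte-oriented Base64 encoding:

```python
import math

VALUE = 42401  # the 16-bit example value from the text
BASE64_TABLE = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789+/'
)


def to_base64_digits(n: int) -> str:
    """Write a non-negative integer as base-64 digits, most significant first."""
    digits = ''
    while True:
        n, remainder = divmod(n, 64)
        digits = BASE64_TABLE[remainder] + digits
        if n == 0:
            return digits


binary = format(VALUE, 'b')
octal = format(VALUE, 'o')
encoded = to_base64_digits(VALUE)

# Naive estimate: three picks from a table of 64 values.
naive_bits = len(encoded) * math.log2(64)
# Bits actually needed to hold the value.
actual_bits = VALUE.bit_length()
```

The naive estimate over-counts because the leading digit was never free to take all 64 values.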

Entropy can be hard to calculate because we don't always know the algorithm used to create the secret, and so can make wrong assumptions. However, from what we've seen already we can easily figure out two ways to maximise entropy:

Character Set: Using more characters lets you store more entropy per character.
Length: Even with 2 characters ("01") increasing the length increases the entropy.

As such Look-Up Secrets are the natural solution.

Memorized Secrets

The biggest downside to the natural solution is that people are pretty bad at memorizing a string of purely random characters. So let's look at bad and good ways for humans to increase entropy.

Password

Which secret is easier to remember, a password "password" or a Look-Up Secret "xhwvoizl"? The password is much easier to remember, so let's compare the entropy. For simplicity let's pretend the dictionary is only 100,000 words.

$$ \begin{align} \log_2 100000 & = 16.61 \\ 3\log_2 26 & = 14.1 \\ 4\log_2 26 & = 18.8 \\ \end{align} $$

In the old days, when you'd need a word to gain access to a back-alley establishment, 16.61 bits of entropy was pretty good. Nowadays you can check 100,000 passwords in a matter of seconds.

Let's look at how people would typically satisfy a mandate of a minimum of 1 number, 1 symbol, and 1 uppercase letter.

Unfortunately, many users will add complexity to their password by simply capitalizing the first letter of their password or adding a "1" or "!" to the end. And while it technically does make a password more difficult to crack, most password-crackers worth their salt know users tend to follow these patterns and can use them to reduce the time needed to decrypt a stolen password.
— NIST Password Guidelines and Best Practices for 2020

OK, so "password" becomes "Password1!" or "Password!1". Let's see the increase in bit-length when being generous.

If we randomly uppercase one of the letters in the password, we can count the possibilities with nCk. Let's pretend every word in the dictionary is 8 letters long.

$$\log_2 \binom{8}{1} = 3$$
We can pick from 10 numbers.

$$\log_2 10 = 3.32$$
We can pick from 33 special characters.

$$\log_2 33 = 5.04$$
The number and special character can be in two positions.

$$\log_2 \binom{2}{1} = 1$$

Let's see how the recommendation affects the entropy when being generous.

$$ \begin{align} 16.61 + 3 + 3.32 + 5.04 + 1 & = 28.97 \\ \log_2 \left( 100000 \cdot \binom{8}{1} \cdot 10 \cdot 33 \cdot \binom{2}{1} \right) & = 28.98 \\ \end{align} $$

As we can see, reasoning about what does and doesn't increase entropy can be simplified into adding the separate entropies together. The difference between the two results is just a rounding error.

Rather than being generous we can see the increase from the typical way people change the password:

$$ \begin{align} \log_2 1 & = 0 \text{ (for cap)}\\ \log_2 1 & = 0 \text{ (for 1)}\\ \log_2 1 & = 0 \text{ (for !)}\\ \log_2 \binom{2}{1} & = 1 \text{ (for 1! or !1)}\\ \end{align} $$

OK, so at 1 bit of extra entropy in the common case, this is really not a good algorithm. More importantly, it shows that 28.98 is just the best case of the algorithm, as some outputs, such as "Password1!", have no tangible benefit.
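The arithmetic above can be reproduced with a small sketch. The dictionary size and 8-letter word length are the simplifying assumptions already made in the text; the variable names are hypothetical:

```python
import math

# Number of equally likely options at each step of the "generous" scheme.
generous_choices = {
    'dictionary word': 100_000,
    'position of the capital': math.comb(8, 1),
    'digit': 10,
    'symbol': 33,
    'order of digit and symbol': math.comb(2, 1),
}

# Adding per-step entropies is equivalent to taking log2 of the product.
generous_bits = sum(math.log2(n) for n in generous_choices.values())
product_bits = math.log2(math.prod(generous_choices.values()))

# The typical user always capitalizes the first letter and always picks "1" and "!".
typical_extra_choices = [1, 1, 1, math.comb(2, 1)]
typical_extra_bits = sum(math.log2(n) for n in typical_extra_choices)
```

Summing unrounded terms gives the same value as the product form; the 28.97 in the text only appears because each term was rounded before adding.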

As such your password generator should estimate the entropy of the output to ensure the output is meeting a high enough bar. Even with a good algorithm you can produce a common password or pattern.

Passphrase

Passphrases in particular are lower-entropy per character than fully-random passwords, but are generally easier to remember, and so have several practical advantages when compared to passwords. They were made more popular by XKCD's illustration of the approach and its implementation in correcthorsebatterystaple.net, and are now supported in many free generators.

As we saw in the previous example, the biggest contribution to the entropy was the 16.61 bits from the word itself. We can easily quadruple that by generating four words, and the result is much easier to remember than a random assortment of lowercase letters.

$$ \begin{align} 4 \log_2 100000 & = 66.44 \\ 14 \log_2 26 & = 65.81 \\ 15 \log_2 26 & = 70.51 \\ \end{align} $$

As such, the password generator should have a human mode in which you generate an n-word passphrase. Adding a little flair through 1337, or other encodings, is acceptable depending on the user, but should not be the backbone of a Memorized Secret algorithm due to the low increase in entropy.
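A human mode along these lines might look like the following minimal sketch. The tiny WORDS tuple is a hypothetical stand-in; a real generator would load a large word list, and the entropy only holds if every word is chosen uniformly and independently:

```python
import math
import secrets

# Hypothetical stand-in word list, far too small for real use.
WORDS = (
    'correct', 'horse', 'battery', 'staple',
    'liminal', 'incredibly', 'illegitimate', 'orbit',
)


def generate_passphrase(n_words: int = 4, words=WORDS, separator: str = ' ') -> str:
    """Pick n_words uniformly at random (with replacement) using the OS CSPRNG."""
    return separator.join(secrets.choice(words) for _ in range(n_words))


def passphrase_entropy_bits(n_words: int, dictionary_size: int) -> float:
    """Entropy of n_words independent, uniform picks from dictionary_size words."""
    return n_words * math.log2(dictionary_size)
```

With the 8-word stand-in list, a four-word passphrase has only 12 bits of entropy, which is why the size of the word list matters as much as the number of words.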

Look-Up Secrets

When building a password generator, generating a string of random characters from a character set is normally the default. So how well does the algorithm hold up? Since Unicode has 159,801 characters, let's focus on ASCII with 2*26 Latin letters, 10 Arabic numerals and 33 ASCII symbols, for 95 in total.

$$ \begin{align} 4 \log_2 100000 & = 66.44 \\ 10 \log_2 95 & = 65.7 \\ 11 \log_2 95 & = 72.27 \\ \end{align} $$

How does the advice of forcing at least one lower, upper, number and special affect things?

We have lower, \$26\$, being randomly inserted into \$\binom{1}{1}\$ slots.
We have upper, \$26\$, being randomly inserted into \$\binom{2}{1}\$ slots.
We have number, \$10\$, being randomly inserted into \$\binom{3}{1}\$ slots.
We have special, \$33\$, being randomly inserted into \$\binom{4}{1}\$ slots.

We can simplify the binomials into a single product \$\prod_{i=1}^{n} i\$ where \$n\$ is the number of 'buckets'. Let's contrast this with the secrets we are securing against.

$$ \begin{align} 4\log_2 (26 + 26 + 10 + 33) & = 26.28 \\ \log_2 \left( 26 \cdot 26 \cdot 10 \cdot 33 \cdot \prod_{i=1}^{4} i \right) & = 22.35 \\ 26.28 - 22.35 & = 3.93 \\ \\ 4\log_2 10 & = 13.29 \\ 4\log_2 26 & = 18.80 \\ 4\log_2 33 & = 20.18 \\ \end{align} $$

The question now is "which maximum entropy is too low: 13, 19 or 20 bits?", because the reduction from 26 to 22 bits results only from pre-filtering outputs that you think have too little entropy.

As such, what is good for a Memorized Secret generator can be harmful to a Look-Up Secret generator. You should think about the two algorithms as wholly independent. However, just as with passwords, you should show an estimate of the entropy, since the user is the one who decides what entropy is too little.
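As a rough check of the four-character comparison, the two entropies can be computed directly. The function name uniform_bits is hypothetical, and the sketch assumes every character is drawn uniformly and independently:

```python
import math


def uniform_bits(charset_size: int, length: int) -> float:
    """Entropy of `length` independent, uniform picks from `charset_size` characters."""
    return length * math.log2(charset_size)


# Four characters drawn freely from all 95 printable ASCII characters.
unrestricted_bits = uniform_bits(26 + 26 + 10 + 33, 4)

# Exactly one lower, one upper, one digit and one symbol, in any of 4! orders.
forced_bits = math.log2(26 * 26 * 10 * 33 * math.factorial(4))

lost_bits = unrestricted_bits - forced_bits
```

The same function gives the estimate a generator could display to the user, for example uniform_bits(95, 10) for a ten-character Look-Up Secret.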

Random generation

State and Seeds

As professionals we should strive for defense in depth, and that means that we must assume that an attacker may eventually gain some partial or full knowledge of the time at which the password was generated. From the documentation for time(),

Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second.

A password generator should not seed with a potentially coarse timestamp as the question code does. If it did and an attacker determined or guessed the time of generation, no amount of password length or character complexity will save you: the attacker can re-generate a password of 1,000 characters identically to the original program. This problem can be avoided by not seeding at all, and relying instead on the implementation-defined seeding behaviour as wrapped by SystemRandom. It should not and cannot be seeded at the Python level.
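To illustrate why a timestamp seed is dangerous, here is a sketch of the attack under stated assumptions: the clock has 1-second precision, the attacker knows the algorithm, and the attacker can bound the generation time to a window. The function names and the timestamp are hypothetical:

```python
import random
from typing import Optional

ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789'


def weak_password(seed: float, length: int = 16) -> str:
    """Mimic the question's mistake: the only secret input is a timestamp."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    return ''.join(rng.choice(ALPHABET) for _ in range(length))


def brute_force(target: str, earliest: int, latest: int) -> Optional[int]:
    """Try every whole-second timestamp in [earliest, latest]."""
    for candidate_time in range(earliest, latest + 1):
        if weak_password(float(candidate_time), len(target)) == target:
            return candidate_time
    return None


generated_at = 1_720_000_000.0  # hypothetical generation time
victim = weak_password(generated_at)
recovered_time = brute_force(victim, 1_719_999_000, 1_720_001_000)
```

The search covers about two thousand candidates regardless of the password length, so a longer password offers no extra protection here.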

Probability

The question code assumes that the category selection probabilities should be evenly distributed, but that has a side-effect: if we only allow one symbol - say, an underscore - but have three categories, one third of the output characters will tend to be underscores. Is that sensible? A solution that's both simpler and (depending on your interpretation) produces a more expected probability distribution is to build a single sequence ALL_CHARS from the union of the character class sets, and issue a single call to choices(ALL_CHARS, k=n).

Random Data in Python

As a progression,

replace random.randint(a=0, b=3) with random.randrange(4) so that we can take advantage of half-open intervals; but really,
get rid of the individual-character randint() calls and the loop, and replace with a single, non-looped call to my_system_random.choices(ALL_CHARS, k=n).
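The end point of that progression could look like the sketch below. The function name generate_lookup_secret and the choice of string.punctuation for the symbols are assumptions for illustration, not the question's character sets:

```python
import random
import string

# One flat alphabet, so every character is equally likely.
ALL_CHARS = string.ascii_letters + string.digits + string.punctuation

# SystemRandom draws from the operating system's CSPRNG and ignores seeding.
_system_random = random.SystemRandom()


def generate_lookup_secret(length: int) -> str:
    """Return `length` characters drawn uniformly from ALL_CHARS."""
    return ''.join(_system_random.choices(ALL_CHARS, k=length))
```

There is no loop counter, no category lookup and no seed, which removes three places where the question code could go wrong.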

User interface

Usability

Don't use a Tk label. Instead, use an Entry so that the user can copy the text. The NIST summary likewise encourages use of the clipboard:

Allow Password "Paste-In"

If passwords are easier to enter, your users are more likely to use a longer, more complex password in the first place (which is more secure). That’s where "paste-in" password functionality is now advantageous — if entering passwords is as simple as copying and pasting them into a password field; it encourages safer behavior.

This is especially important considering how many passwords the average person has to remember these days and the tools people are using to manage them all.

The Entry should also be editable: in the end, the user (and not the program) is in the best position to understand whether the password complies with requirements of the target authentication entry system, so should be able to adjust accordingly.

Peek Safety

In ask_user_terminal()'s confirmation loop, the password is exposed to over-the-shoulder spying. In a crowded café or airport someone may see it. We can limit this exposure by

Initially showing the user the password,
Prompting the user to continue once they have safely recorded or memorized the password,
Clearing the screen, and then
Using getpass instead of input to collect the confirmation password.

getpass configures the console to disable echo to make such spying impossible. This approach is typical in the Linux/Unix world.
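A minimal sketch of the confirmation step using getpass follows. The read_secret parameter and the max_attempts limit are additions for illustration (the question code loops forever); injecting the reader also makes the function testable without a terminal:

```python
import getpass
from typing import Callable


def confirm_password(
    password: str,
    read_secret: Callable[[str], str] = getpass.getpass,
    max_attempts: int = 3,
) -> bool:
    """Ask for the password without echo; True as soon as one attempt matches."""
    for _ in range(max_attempts):
        if read_secret('Enter password to confirm: ') == password:
            return True
    return False
```

In the terminal flow this would be called only after the user has acknowledged the displayed password and the screen has been cleared.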

In the case of tkinter, create an Entry with show='*'. Make a Show button that, when depressed, calls entry.configure(show=''), and when released, calls entry.configure(show='*'). This is also peek-safe and is an easy and typical way to allow users to choose the exact amount of exposure time.

Loose Coupling

Rather than calling label.configure(text=), associate label (the user interface) with a StringVar (the data path). In the replace_password handler, only interact with the StringVar and not the label.

Performance

Performance is generally not a concern for applications of this kind. As a result, most attempts at making this application faster can be considered premature optimization. Most CPU-bound tasks in the application are so fast as to be imperceptible by the user. The only time they may become noticeable is if you intend to extend this application to generate millions of passwords (for a rainbow table, etc.).

SystemRandom is a powerful high-level abstraction on os.urandom which is itself an abstraction on the operating system's best source of secure random generation, often the Linux kernel's getrandom() and its fallback the urandom device, or Windows BCryptGenRandom. urandom is fast:

A read from the /dev/urandom device will not block waiting for more entropy.

The priorities for this program should be correctness, security, maintainability and usability, not performance.

Sets in Python

As a progression, and assuming the continued use of the existing program's character sets,

replace the LOWER_LETTERS (etc.) lists [] with immutable tuples () since we never want those character sets to change; but really,
replace them with simple strings 'ABCDE...' which are themselves immutable sequences; but really,
just import those from the string module; and then,
cast to a frozenset as in ASCII_LETTERS = frozenset(string.ascii_letters).
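The last step of that progression might be sketched like this, keeping the question's ten symbols. Converting the union to a sorted tuple is an assumption made here because choices() needs an indexable sequence rather than a set:

```python
import string

ASCII_LETTERS = frozenset(string.ascii_letters)
DIGITS = frozenset(string.digits)
SYMBOLS = frozenset('!@#$%^&*()')

# Set union removes any accidental duplicates between classes.
ALL_CHARS = tuple(sorted(ASCII_LETTERS | DIGITS | SYMBOLS))
```

Frozensets also make operations such as removing homoglyphs or checking membership a one-line set expression.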

Concatenation in Python

There is a lot of discourse in the Python community about the best way to concatenate strings. For various reasons, in-place += is not always a favourite. In the context of this question, due to the API offered by secrets.SystemRandom().choices(), ''.join(choices()) is the best option. The fact that this will scale linearly O(n) in time with the length of the password is immaterial because password length is negligible from a performance perspective. Instead we prefer it because it is the simplest and clearest method, and does not need an explicit loop.

Even if you were to keep an explicit loop, you would want to replace while i < n with for _ in range(n) to loop like a native.

Question 4

@Reinderien Ok I rewrote the first half in a here's how to improve the code manner rather than "here's a bunch of links". I likely have removed some important information you put unintentionally; please just add back what you think is important. Since I started from scratch I added your stuff verbatim as otherwise following would be too hard in the diff. (I'd recommend using an external diff) Have I waffled on too much? Probably feel free to cut the cruft.

Question 5

The goal

I think most password generators are created as a learning exercise. They should probably stay a learning exercise. The chance that a user can easily create something that exceeds e.g. the open source KeePass without extensive study of the subject is exceedingly small.

If the password generator is provided as part of a service, then the provider of the service should probably consider who is responsible for having a good password in the first place. It's the password of the user for the system, the provider doesn't "own" the password. It's of course a good idea to guide the user to using a good password, but generating one for the user might be a step too far.

The entropy

The random number generator

First of all, the random number generator should not be seeded from the time. It should be a cryptographically secure pseudo-random number generator which is seeded by (or is) the system random number generator. In the following remarks on the code I'll assume a CSPRNG.

The size of the password

Passwords of 8 characters long are not recommended any more as they contain too little entropy. The default is therefore off. I'd rather have the user explicitly select a size, or have the UI define the default size instead, i.e. leave the default value out of the function declaration.

At the moment you'd expect 10 characters on the very low end and 12 characters on the higher end. PINs etc. of course will contain fewer characters, but those are combined with a maximum number of tries and probably other counter-measures.

Maximizing entropy

The randomness is not maximized. The choice is first made from the character sets, then the character is randomly chosen. However, some character sets are 26 characters, others only 10. It makes more sense to choose from the joined set of characters assuming that the current scheme is adhered to - i.e. the password should be fully random, as it currently isn't.

Per-character choice of character set

As the character sets are chosen in advance, it is entirely possible for the function to return a password with just digits. The chance of that happening is e.g. 1 in 4^4, i.e. 1 in 256 if the length is 4. This is definitely random so in a sense it is the most secure. However, it will be easily caught by a cracker that assumes a PIN to have been entered instead of a password.

It makes more sense to indicate a minimal number of characters from each set, and then shuffle the password afterwards. This is more likely to comply with certain password policies that require you to use e.g. 2 digits and a symbol in your password.
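One way to sketch the "minimum per set, then shuffle" idea is shown below. The function name, the minimum_per_class parameter and the reuse of the question's four classes are assumptions for illustration:

```python
import random
import string

_rng = random.SystemRandom()  # backed by the operating system's CSPRNG

CLASSES = (
    string.ascii_lowercase,
    string.ascii_uppercase,
    string.digits,
    '!@#$%^&*()',
)


def generate_with_minimums(length: int, minimum_per_class: int = 1) -> str:
    """Guarantee a minimum count from every class, then fill and shuffle."""
    required = minimum_per_class * len(CLASSES)
    if length < required:
        raise ValueError(f'length must be at least {required}')
    # Mandatory characters first, one class at a time.
    chars = [_rng.choice(cls) for cls in CLASSES for _ in range(minimum_per_class)]
    # Fill the remainder from the joined alphabet.
    chars += _rng.choices(''.join(CLASSES), k=length - required)
    # Shuffle so the mandatory characters do not sit in predictable positions.
    _rng.shuffle(chars)
    return ''.join(chars)
```

Forcing a minimum from each class removes some otherwise valid outputs, so the result has slightly less entropy than drawing every character from the joined set.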

The type of password

If the intent is to create easy-to-remember passwords then it makes more sense to choose from a set of words rather than a set of characters, as random characters are much harder to remember. As mentioned before, random characters are fine for storing passwords in a password safe like KeePass or an integrated password vault.

System interaction / the UI

Having the system generate a password in the first place

Generally the user already has a password generator tool as part of a password manager, either separate (KeePass), part of the system (Windows Credential Manager), connected (1Password) or as part of a browser.

These are vetted password managers that have been created by experts. They can often rely on system-provided methods for keeping passwords secure. Services should rely on these password managers rather than building their own.

Displaying the password in output / dialogue window

Currently no password-specific methods are used within the UI. This means that the password will be easy to retrieve from either the output or from the label by capturing the output or the screen in some way or other.

Quite often UIs have widgets specific for entering or displaying passwords. Those can of course be generated as well, e.g. by displaying passwords as pictures of characters only shown on hover or by pressing an "eye" button.

For the console version, one could consider storing the password in a given environment variable or even a file.

Connecting with a password vault

If at all possible it is better to store passwords directly in a password vault. Those generally already have password generator functions though. As such, this should be thought of more as an exercise than stand-alone code. That is true in general when it comes to password generation code or any cryptography-related code.

No feedback on the security of the password

Currently the user will never know how secure the password is after it has been generated. This kind of security should be indicated.

If the developer cannot indicate the amount of entropy that is expected to be in the password then the developer should probably see that as an indication that they require more education before writing or releasing a password generation function.

No iterations

Currently the alphabet (the characters that can be put into the password) is set, but as indicated, it may not comply with a password policy or it may not be pleasant for the user to enter. Generally password tools allow the user to choose whether to accept the password or generate a new one.

Of course this comes with a caveat: the user may generate passwords until they get a simple one to remember or type, but this is also one that may be easier to crack. I don't think there is much consensus of what to do here, though most password managers err towards letting the user decide whether to keep or not keep the password.

Hard to read passwords

Quite often password managers avoid letter / digit combinations that look too much like each other. The main culprit is probably 0, o and O but other combinations are troublesome as well, e.g. 1, i, I and l. Font choice can of course help with that.

This is mainly an issue if the user has to copy the password from screen themselves. This should be avoided in the first place.

The code

Undocumented contract

The code is using an undocumented scheme, which cannot be altered from the outside. As such the scheme is likely not compliant with many password policies. See the last remark in the entropy section for a possible fix.

The contract of the function is not well documented. There is a guard statement that the length of the password should be between 4 and 16 characters, but this is not indicated in the documentation at all.

The output is not well documented either. The character classes are defined as constants, but they are not mentioned in the documentation. It doesn't document the entropy either, nor does it define the algorithm used.

Using a `str` for a password

The disadvantage of using strings is that they cannot be easily erased. It makes more sense to store the passwords in e.g. a bytearray that can be easily erased, right after the password is not required any more. If the password is generated for internal verification then it should be used as input for a Password Based Key Derivation Function (aka password hash) and erased right after the hash has been generated.
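A best-effort sketch of the bytearray idea follows. The function names are hypothetical, and this is an assumption-laden illustration: CPython gives no guarantee that intermediate copies do not exist elsewhere in memory, so overwriting the buffer reduces exposure rather than eliminating it:

```python
import secrets
import string

# Iterating over bytes yields integers, which is what bytearray stores.
ALPHABET = (string.ascii_letters + string.digits).encode('ascii')


def generate_password_bytes(length: int) -> bytearray:
    """Build the password directly in a mutable buffer, never as a str."""
    return bytearray(secrets.choice(ALPHABET) for _ in range(length))


def erase(buffer: bytearray) -> None:
    """Overwrite the buffer in place once the password is no longer needed."""
    for i in range(len(buffer)):
        buffer[i] = 0
```

A caller would pass the buffer to the password hash and then call erase(buffer) immediately afterwards.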

It is questionable whether a password generator should be written in Python in the first place, as it doesn't offer fine-grained memory or string control. Some system languages do offer that and may use the system to protect memory fields, e.g. by disallowing swapping specific memory regions to disk or even by encrypting memory regions.

The while loop

The while loop is a for loop in disguise. As it will always go through exactly length iterations, it doesn't make sense to implement it as a while loop with a manual counter.

Peilonrayz – Peilonrayz ♦ · Answer 1 · 2024-07-10 20:43:17Z

Character Sets

The code assumes you only want passwords using the Latin alphabet, the Arabic numerals, and some ASCII special characters. Being locked into the character set has three main problems:

Expanded Characters

With the near ubiquitous presence of Unicode, being limited to 72 characters of Unicode's 159,801 is needlessly limiting. For instance, by expanding to just CP-1252 we can, humorously, 1337 "Password" to Páßwörð".

The increase in the character set means a password cracker will have to perform more work to crack the password. As such you should allow the user to provide additional characters. There are also security consequences: some scripts such as Chinese have morphemes with much more entropy than the letters of the Latin alphabet.

Different Locale

With Unicode we can get a little silly with say "🙈🙉🙊👻🤡👺" as a password. Which should raise some eye brows, "what you can have an emoji password?!" Where the answer entirely depends on the authenticator the user will be interacting with. For instance a PIN is limited only to numbers.

Additionally the user of your application may be more familiar with different script(s) and character input methods. For instance "password" vs "пароль" vs "パスワード"; would you be able to remember and enter all three?

Defaulting to the user's expectation determined from the locale is a good idea. Where all of ASCII's printable characters is a good fallback/option.

Disambiguation

Any time that a generated password needs to be read by human eyes, there is a risk of homoglyphs. These are visually similar characters that the user may confuse with each another, even if the computer knows better. Common examples are:

Latin uppercase letter O '\x4f' and numeral 0 '\x30'
Latin uppercase letter I '\x49', latin lowercase letter L '\x6c' and numeral 1 '\x31'

and so on. A good generator would have an opt-in mode to remove such homoglyphs from the character set, especially in generic password mode. Some nuance is called-for here: in passphrase mode, the randomly-generated passphrase

illegitimate 54321 INCREDIBLY LIMINAL

is visually unambiguous, and so sometimes, homoglyphs are OK.

As a more complex solution, there are other methods to distinguish homoglyphs:

use a fixed-width font. We always get this for free in a terminal, and so ask_user_terminal() is somewhat safe, but tkui is not and needs modification to use a fixed-width or monospace font for label.
use other character styles like underlining or color to indicate case or numerals.

Secrets & Entropy

Types of Secrets

Typically users will interact with various types of secrets, for a password generator we mostly only care about two types.

Memorized Secrets: A secret the user is intended to memorize, such as: a passphrase, a password, or a PIN.
Look-Up Secrets: A secret the user is intended to store in a password manager.

The code is generating Look-Up Secrets.

Entropy

Lets say we use random.SystemRandom to generate 16 bits of entropy, getting the number 42401. We can then convert the value into multiple forms:

Binary	Octal	Decimal	Base 64
1010010110100001	122641	42401	KWh

Lets look at how we got "base 64" with the table "A-Za-z0-9+/".

Segment	Binary	Decimal	Character
1	100001	33	h
2	010110	22	W
3	1010	10	K

Since we know the secret and the algorithm to create the secret we can verify the entropy in bits. We picked from a table of 64 values three times, then convert to base 2 to determine the bit length:

$$ \log_2 64^3 = 3\log_2 64 = 3 \cdot 6 = 18 $$

Can you figure out where I mislead you and the 18 has come from?

Segment 3 is 1010 which is 4 bits long, not 6. 18 - 2 = 16.

Entropy can be hard to calculate because we don't always know the algorithm used to create the secret, and so can make wrong assumptions. However, from what we've seen already we can easily figure out two ways to maximise entropy:

Character Set: Using more characters lets you store more entropy per character.
Length: Even with 2 characters ("01") increasing the length increases the entropy.

As such Look-Up Secrets are the natural solution.

Memorized Secrets

The biggest downside to the natural solution is people are pretty bad at memorizing a string of purely random characters. So lets look at bad and good ways for humans to increase entropy.

Password

Which secret is easier to remember, a password "password" or a Look-Up Secret "xhwvoizl"? The password is much easier to remember, so lets compare the entropy. For simplicity lets pretend the dictionary is only 100,000 words.

$$ \begin{align} \log_2 100000 & = 16.61 \\ 3\log_2 26 & = 14.1 \\ 4\log_2 26 & = 18.8 \\ \end{align} $$

In the old days where you'd need a word to gain access to a back-ally establishment, 16.61 bits of entropy was pretty good. Now-a-days you can check 100,000 passwords in the matter of seconds.

Lets look at how people would typically implement the mandate a minimum of 1 number, 1 symbol, and 1 uppercase letter.

Unfortunately, many users will add complexity to their password by simply capitalizing the first letter of their password or adding a "1" or "!" to the end. And while it technically does make a password more difficult to crack, most password-crackers worth their salt know users tend to follow these patterns and can use them to reduce the time needed to decrypt a stolen password.
— NIST Password Guidelines and Best Practices for 2020

Ok so "password" becomes "Password1!" or "Password!1". Lets see the increase in bit-length when being generous.

If we randomly uppercase one of the letters in password we can nCk. Lets pretend every word in the dictionary is 8 large.

$$\log_2 \binom{8}{1} = 3$$
We can pick from 10 numbers.

$$\log_2 10 = 3.32$$
We can pick from 33 special characters.

$$\log_2 33 = 5.04$$
The number and special character can be in two positions.

$$\log_2 \binom{2}{1} = 1$$

Lets see how the recommendation affects the entropy when being generous.

$$ \begin{align} 16.61 + 3 + 3.32 + 5.04 + 1 & = 28.97 \\ \log_2 \left( 100000 \cdot \binom{8}{1} \cdot 10 \cdot 33 \cdot \binom{2}{1} \right) & = 28.98 \\ \end{align} $$

As we can see, reasoning about what does and doesn't increase entropy can be simplified to adding the separate entropies together. The difference is just a rounding error.
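The additivity can be checked numerically. This sketch uses the same generous counts as above; the dictionary labels are only descriptive names.

```python
import math

# Number of equally likely options contributed by each independent choice.
choices_per_step = {
    'dictionary word': 100_000,
    'capitalised position': math.comb(8, 1),
    'digit': 10,
    'symbol': 33,
    'digit/symbol order': math.comb(2, 1),
}

sum_of_logs = sum(math.log2(n) for n in choices_per_step.values())
log_of_product = math.log2(math.prod(choices_per_step.values()))

print(f'{sum_of_logs:.2f} {log_of_product:.2f}')  # 28.98 28.98
```

Summing the unrounded terms gives 28.98 both ways; the 28.97 above comes from rounding each term to two decimals before adding.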

Rather than being generous we can see the increase from the typical way people change the password:

$$ \begin{align} \log_2 1 & = 0 \text{ (for cap)}\\ \log_2 1 & = 0 \text{ (for 1)}\\ \log_2 1 & = 0 \text{ (for !)}\\ \log_2 \binom{2}{1} & = 1 \text{ (for 1! or !1)}\\ \end{align} $$

Ok, so 1 bit of extra entropy in the common case means this is really not a good algorithm. More importantly, it shows that 28.98 is just the best case of the algorithm, as some outputs, such as "Password1!", have no tangible benefit.

As such your password generator should estimate the entropy of the output to ensure the output is meeting a high enough bar. Even with a good algorithm you can produce a common password or pattern.

Passphrase

Passphrases in particular are lower-entropy per character than fully-random passwords, but are generally easier to remember, and so have several practical advantages when compared to passwords. They were made more popular by XKCD's illustration of the approach and its implementation in correcthorsebatterystaple.net, and are now supported in many free generators.

As we saw from the previous example, the biggest contribution to the entropy was the 16.61 bits from the word itself. We can very easily 4x our entropy by generating four words, and the result is much easier to remember than a random assortment of lowercase letters.

$$ \begin{align} 4 \log_2 100000 & = 66.44 \\ 14 \log_2 26 & = 65.81 \\ 15 \log_2 26 & = 70.51 \\ \end{align} $$

As such the password generator should have a human mode, in which you generate an n word passphrase. Adding a little flair through 1337, or other encodings, is okish depending on the user but should not be the backbone of a Memorable Secret algorithm due to the low increase in entropy.
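A rough sketch of such a human mode follows. The eight-word `WORDS` tuple is a hypothetical stand-in (a real generator would load a large curated word list), and both function names are invented for illustration.

```python
import math
import secrets

# Hypothetical stand-in; eight words give far too little entropy for real use.
WORDS = ('correct', 'horse', 'battery', 'staple',
         'orbit', 'maple', 'tundra', 'velvet')


def generate_passphrase(n_words: int = 4,
                        words: tuple[str, ...] = WORDS,
                        sep: str = ' ') -> str:
    """Pick n_words independently and uniformly using the OS-backed CSPRNG."""
    return sep.join(secrets.choice(words) for _ in range(n_words))


def passphrase_entropy_bits(n_words: int, n_dictionary: int) -> float:
    """Entropy of n_words uniform picks from a dictionary of n_dictionary words."""
    return n_words * math.log2(n_dictionary)


print(generate_passphrase())
print(round(passphrase_entropy_bits(4, 100_000), 2))  # 66.44
```

Words are drawn with replacement, which is what the n·log2 formula assumes.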

Look-Up Secrets

When building a password generator, generating a bunch of random characters from a character set is normally the default. So how well does the algorithm hold up? Since Unicode has 159,801 characters, let's focus on ASCII with 2*26 Latin letters, 10 Arabic numerals and 33 ASCII symbols for 95 in total.

$$ \begin{align} 4 \log_2 100000 & = 66.44 \\ 10 \log_2 95 & = 65.7 \\ 11 \log_2 95 & = 72.27 \\ \end{align} $$
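Put differently, one can ask how many random characters are needed to match the four-word passphrase. A small sketch, with `min_length` as an illustrative helper name:

```python
import math


def min_length(target_bits: float, alphabet_size: int) -> int:
    """Smallest number of uniform characters reaching at least target_bits."""
    return math.ceil(target_bits / math.log2(alphabet_size))


target = 4 * math.log2(100_000)  # the four-word passphrase from above

print(min_length(target, 95))  # 11 printable ASCII characters
print(min_length(target, 26))  # 15 lowercase letters
```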

How does the advice of forcing at least one lower, upper, number and special affect things?

We have lower, \$26\$, being randomly inserted into \$\binom{1}{1}\$ slots.
We have upper, \$26\$, being randomly inserted into \$\binom{2}{1}\$ slots.
We have number, \$10\$, being randomly inserted into \$\binom{3}{1}\$ slots.
We have special, \$33\$, being randomly inserted into \$\binom{4}{1}\$ slots.

We can simplify the binomials into a single product \$\prod_{i=1}^{n} i\$ where \$n\$ is the number of 'buckets'. Let's contrast this with the secrets we are securing against.

$$ \begin{align} 4\log_2 (26 + 26 + 10 + 33) & = 26.28 \\ \log_2 \left( 26 \cdot 26 \cdot 10 \cdot 33 \cdot \prod_{i=1}^{4} i \right) & = 22.35 \\ 26.28 - 22.35 & = 3.93 \\ \\ 4\log_2 10 & = 13.29 \\ 4\log_2 26 & = 18.80 \\ 4\log_2 33 & = 20.18 \\ \end{align} $$
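The first two figures can be reproduced directly. The sketch assumes a four-character secret, so that "at least one of each class" means exactly one of each.

```python
import math

# Four characters drawn freely from all 95 printable ASCII characters.
unconstrained = 4 * math.log2(26 + 26 + 10 + 33)

# One lower, one upper, one digit, one special, in any of the 4! orders.
constrained = math.log2(26 * 26 * 10 * 33 * math.factorial(4))

print(f'{unconstrained:.2f} {constrained:.2f} {unconstrained - constrained:.2f}')
# 26.28 22.35 3.93
```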

The question now is "which maximum entropy is too low: 13, 19 or 20 bits?", because the reduction from 26 to 22 bits is only the result of filtering out, in advance, the outputs you think have too little entropy.

As such, what is good for a Memorable Secret generator can be harmful to a Look-Up Secret generator. You should think about both algorithms as wholly independent. However, just like with Passwords, you should show an estimate of the entropy, as the user is the one who decides what entropy is too little.

Random generation

State and Seeds

As professionals we should strive for defense in depth, and that means that we must assume that an attacker may eventually gain some partial or full knowledge of the time at which the password was generated. From the documentation for time(),

Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second.

A password generator should not seed with a potentially coarse timestamp as the question code does. If it did and an attacker determined or guessed the time of generation, no amount of password length or character complexity will save you: the attacker can re-generate a password of 1,000 characters identically to the original program. This problem can be avoided by not seeding at all, and relying instead on the implementation-defined seeding behaviour as wrapped by SystemRandom. It should not and cannot be seeded at the Python level.
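To make the attack concrete, here is a small demonstration. The `weak_password` helper and the hard-coded timestamp are hypothetical; the helper only mimics the question's seeding approach.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def weak_password(seed: float, length: int = 1000) -> str:
    """Mimics the question: a Mersenne Twister seeded with a timestamp."""
    rng = random.Random(seed)
    return ''.join(rng.choices(ALPHABET, k=length))


timestamp = 1_700_000_000.0  # hypothetical generation time guessed by an attacker
victim = weak_password(timestamp)
attacker = weak_password(timestamp)
print(victim == attacker)  # True: all 1,000 characters are reproduced

# SystemRandom accepts seed() but ignores it, so the output cannot be replayed.
system_rng = random.SystemRandom()
system_rng.seed(timestamp)
a = ''.join(system_rng.choices(ALPHABET, k=32))
system_rng.seed(timestamp)
b = ''.join(system_rng.choices(ALPHABET, k=32))
print(a == b)  # False, barring an astronomically unlikely collision
```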

Probability

The question code assumes that the category selection probabilities should be evenly distributed, but that has a side-effect: if we only allow one symbol - say, an underscore - but have three categories, one third of the output characters will tend to be underscores. Is that sensible? A solution that's both simpler and (depending on your interpretation) produces a more expected probability distribution is to build a combined ALL_CHARS from the union of each character class set, and issue a single call to choices(ALL_CHARS, k=n).
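The underscore example can be worked out exactly. This sketch assumes the three categories are lowercase letters, uppercase letters and a single-symbol class.

```python
from fractions import Fraction
import string

categories = [string.ascii_lowercase, string.ascii_uppercase, '_']

# Question's scheme: pick a category uniformly, then a character within it.
per_category = Fraction(1, len(categories)) * Fraction(1, len(categories[-1]))

# Union scheme: pick uniformly from every allowed character.
all_chars = ''.join(categories)
union = Fraction(1, len(all_chars))

print(per_category, union)  # 1/3 1/53
```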

Random Data in Python

As a progression,

replace random.randint(a=0, b=3) with random.randrange(4) so that we can take advantage of half-open intervals; but really,
get rid of the individual-character randint() calls and the loop, and replace with a single, non-looped call to my_system_random.choices(ALL_CHARS, k=n).
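The end point of that progression might look like the sketch below. `ALL_CHARS` reuses the question's four character classes, `_rng` is an arbitrary name, and the length policy is deliberately left out.

```python
import random
import string

# Same characters as the question's four lists, as one flat string.
ALL_CHARS = string.ascii_letters + string.digits + '!@#$%^&*()'

# OS-backed generator; it is never seeded from Python.
_rng = random.SystemRandom()


def generate_password(length: int) -> str:
    """Return `length` characters drawn uniformly from ALL_CHARS."""
    if length < 1:
        raise ValueError('length must be positive')
    return ''.join(_rng.choices(ALL_CHARS, k=length))


print(generate_password(16))
```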

User interface

Usability

Don't use a Tk label. Instead, use an Entry so that the user can copy the text. Again from the NIST summary, it encourages use of the clipboard:

Allow Password "Paste-In"

If passwords are easier to enter, your users are more likely to use a longer, more complex password in the first place (which is more secure). That’s where "paste-in" password functionality is now advantageous — if entering passwords is as simple as copying and pasting them into a password field; it encourages safer behavior.

This is especially important considering how many passwords the average person has to remember these days and the tools people are using to manage them all.

The Entry should also be editable: in the end, the user (and not the program) is in the best position to understand whether the password complies with requirements of the target authentication entry system, so should be able to adjust accordingly.

Peek Safety

In ask_user_terminal()'s confirmation loop, the password is exposed to over-the-shoulder spying. In a crowded café or airport someone may see it. We can limit this exposure by

Initially showing the user the password,
Prompting the user to continue once they have safely recorded or memorized the password,
Clearing the screen, and then
Using getpass instead of input to collect the confirmation password.

getpass configures the console to disable echo to make such spying impossible. This approach is typical in the Linux/Unix world.
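The four steps could be sketched as follows. The function and parameter names are invented for illustration, the callables are injectable only so the flow can be exercised without a real terminal, and the clear command is an assumption about the platform.

```python
import getpass
import os
from typing import Callable


def clear_screen() -> None:
    # Assumption: 'cls' on Windows, 'clear' elsewhere. Terminal scrollback
    # may still retain the text, so this only limits casual exposure.
    os.system('cls' if os.name == 'nt' else 'clear')


def confirm_password(
    password: str,
    show: Callable[[str], None] = print,
    pause: Callable[[str], str] = input,
    clear: Callable[[], None] = clear_screen,
    read_hidden: Callable[[str], str] = getpass.getpass,
) -> None:
    """Show the password once, clear the screen, then confirm without echo."""
    show(f'New password: {password}')
    pause('Press Enter once you have recorded the password: ')
    clear()
    while read_hidden('Enter password to confirm: ') != password:
        show('Passwords do not match; try again.')
```

Called as confirm_password(generate_password(length)), it would replace the confirmation loop in ask_user_terminal().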

In the case of tkinter, create an Entry with show='*'. Make a Show button that, when depressed, calls entry.configure(show=''), and when released, calls entry.configure(show='*'). This is also peek-safe and is an easy and typical way to allow users to choose the exact amount of exposure time.

Loose Coupling

Rather than calling label.configure(text=), associate label (the user interface) with a StringVar (the data path). In the replace_password handler, only interact with the StringVar and not the label.
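A rough sketch combining the Entry, the press-and-hold Show button and the StringVar follows. The widget layout, `make_password` and `build_ui` are illustrative assumptions, not the question's code.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # illustrative alphabet only


def make_password(length: int = 16) -> str:
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))


def build_ui():
    # Imported here so that the module still imports on systems without Tk.
    import tkinter as tk

    root = tk.Tk()

    # Data path: handlers talk to the StringVar, never to the widget text.
    password_var = tk.StringVar(master=root, value=make_password())

    # Editable and copyable, but masked until the user asks to see it.
    entry = tk.Entry(root, textvariable=password_var, show='*', width=24)
    entry.pack()

    def replace_password() -> None:
        password_var.set(make_password())

    tk.Button(root, text='Generate', command=replace_password).pack()

    # Reveal only while the mouse button is held down.
    show_button = tk.Button(root, text='Show')
    show_button.bind('<ButtonPress-1>', lambda event: entry.configure(show=''))
    show_button.bind('<ButtonRelease-1>', lambda event: entry.configure(show='*'))
    show_button.pack()
    return root
```

Running build_ui().mainloop() starts the window; it is left out of the block so that importing the code has no side effects.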

Performance

Performance is generally not a concern for applications of this kind. As a result, most attempts at making this application faster can be considered premature optimization. Most CPU-bound tasks in the application are so fast as to be imperceptible by the user. The only time they may become noticeable is if you intend to extend this application to generate millions of passwords (for a rainbow table, etc.).

SystemRandom is a powerful high-level abstraction on os.urandom which is itself an abstraction on the operating system's best source of secure random generation, often the Linux kernel's getrandom() and its fallback the urandom device, or Windows BCryptGenRandom. urandom is fast:

A read from the /dev/urandom device will not block waiting for more entropy.

The priorities for this program should be correctness, security, maintainability and usability, not performance.

Sets in Python

As a progression, and assuming the continued use of the existing program's character sets,

replace the LOWER_LETTERS (etc.) lists [] with immutable tuples () since we never want those character sets to change; but really,
replace them with simple strings 'ABCDE...' which are themselves immutable sequences; but really,
just import those from the string module; and then,
cast to a frozenset as in ASCII_LETTERS = frozenset(string.ascii_letters).
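The last step of that progression might look like this sketch. One caveat it illustrates: choices() needs an indexable sequence, so the combined frozenset is converted to a tuple once before sampling. The names are illustrative.

```python
import string

LOWER = frozenset(string.ascii_lowercase)
UPPER = frozenset(string.ascii_uppercase)
DIGITS = frozenset(string.digits)
SYMBOLS = frozenset('!@#$%^&*()')  # the question's symbol list

# Set union drops any accidental duplicates between classes.
ALL_CHARS_SET = LOWER | UPPER | DIGITS | SYMBOLS

# Sets are not indexable; sorting merely makes the tuple's order reproducible.
ALL_CHARS = tuple(sorted(ALL_CHARS_SET))

print(len(ALL_CHARS))  # 72
```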

Concatenation in Python

There is a lot of discourse in the Python community about the best way to concatenate strings. For various reasons, in-place += is not always a favourite. In the context of this question, due to the API offered by secrets.SystemRandom().choices(), ''.join(choices()) is the best option. The fact that this will scale linearly O(n) in time with the length of the password is immaterial because password length is negligible from a performance perspective. Instead we prefer it because it is the simplest and clearest method, and does not need an explicit loop.

Even if you were to keep an explicit loop, you would want to replace while i < n with for _ in range(n) to loop like a native.

@Reinderien Ok, I rewrote the first half in a "here's how to improve the code" manner rather than "here's a bunch of links". I have likely removed some important information of yours unintentionally; please just add back what you think is important. Since I started from scratch I added your stuff verbatim, as otherwise the diff would be too hard to follow. (I'd recommend using an external diff.) Have I waffled on too much? Probably; feel free to cut the cruft.

score 3 · Answer 2 · 2025-09-20 11:18:36Z

The goal

I think most password generators are created as a learning exercise. They should probably stay a learning exercise. The chance that a user can easily create something that exceeds e.g. the open source KeePass without extensive study of the subject is exceedingly small.

If the password generator is provided as part of a service, then the provider of the service should probably consider who is responsible for having a good password in the first place. It's the password of the user for the system, the provider doesn't "own" the password. It's of course a good idea to guide the user to using a good password, but generating one for the user might be a step too far.

The entropy

The random number generator

First of all, the random number generator should not be seeded from the time. It should be a cryptographically secure pseudo-random number generator which is seeded by (or is) the system random number generator. In the following remarks on the code I'll assume a CSPRNG.

The size of the password

Passwords of 8 characters long are not recommended any more as they contain too little entropy. The default is therefore off. I'd rather have the user explicitly select a size, or have the UI define the default size instead, i.e. leave the default value out of the function declaration.

At the moment you'd expect 10 characters on the very low end and 12 characters on the higher end. PINs etc. of course will contain fewer characters, but those are combined with a maximum number of tries and probably other counter-measures.

Maximizing entropy

The randomness is not maximized. The choice is first made between the character sets, and then a character is randomly chosen from the selected set. However, some character sets hold 26 characters and others only 10, so individual characters are not equally likely. If the aim of the current scheme is a fully random password, which it currently does not deliver, it makes more sense to choose from the joined set of characters.

Per-character choice of character set

As the character set is chosen first, independently for each character, it is very possible for the function to return a password with just digits. The chance of that happening is e.g. 1 in 4^4, i.e. 1 in 256 if the length is 4. This is definitely random so in a sense it is the most secure. However, it will be easily caught by a cracker that assumes a PIN to have been entered instead of a password.

It makes more sense to specify a minimum number of characters from each set, and then shuffle the password afterwards. This is more likely to comply with certain password policies that require you to use e.g. 2 digits and a symbol in your password.
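One way this could look is sketched below. The class names, the `minimums` mapping and the function name are assumptions made for illustration. Note that a guaranteed minimum removes some possible outputs, so the entropy is slightly below that of a fully free choice.

```python
import random
import string

_rng = random.SystemRandom()

CLASSES = {
    'lower': string.ascii_lowercase,
    'upper': string.ascii_uppercase,
    'digit': string.digits,
    'symbol': '!@#$%^&*()',
}


def generate_with_minimums(length: int, minimums: dict[str, int]) -> str:
    """Guarantee `minimums[name]` characters per class, fill up, then shuffle."""
    required = sum(minimums.values())
    if required > length:
        raise ValueError('minimums exceed the requested length')

    # Mandatory characters, one class at a time.
    chars = [
        _rng.choice(CLASSES[name])
        for name, count in minimums.items()
        for _ in range(count)
    ]
    # Fill the remaining positions from the joined set of all classes.
    chars += _rng.choices(''.join(CLASSES.values()), k=length - required)

    # Shuffle so that the mandatory characters do not sit at the front.
    _rng.shuffle(chars)
    return ''.join(chars)


print(generate_with_minimums(12, {'digit': 2, 'symbol': 1}))
```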

The type of password

If the intent is to create easy-to-remember passwords then it makes more sense to choose from a set of words rather than a set of characters, as random characters are much harder to remember. As mentioned before, random characters are fine for storing passwords in a password safe like KeePass or an integrated password vault.

System interaction / the UI

Having the system generate a password in the first place

Generally the user already has a password generator tool as part of a password manager, either separate (KeePass), part of the system (Windows Credential Manager), connected (1Password) or as part of a browser.

These are vetted password managers that have been created by experts. They can often rely on system-provided methods for keeping passwords secure. Services should rely on these password managers rather than building their own.

Displaying the password in output / dialogue window

Currently no password-specific methods are used within the UI. This means that the password will be easy to retrieve from either the output or from the label by capturing the output or the screen in some way or other.

Quite often UIs have widgets specific for entering or displaying passwords. Those can of course be generated as well, e.g. by displaying passwords as pictures of characters only shown on hover or by pressing an "eye" button.

For the console version, one could consider storing the password in a given environment variable or even a file.

Connecting with a password vault

If at all possible it is better to store passwords directly in a password vault. Those generally already have password generator functions though. As such, this should be thought of more as an exercise than stand-alone code. That is true in general when it comes to password generation code or any cryptography-related code.

No feedback on the security of the password

Currently the user will never know how secure the password is after it has been generated. This kind of security should be indicated.

If the developer cannot indicate the amount of entropy that is expected to be in the password then the developer should probably see that as an indication that they require more education before writing or releasing a password generation function.

No iterations

Currently the alphabet (the characters that can be put into the password) is set, but as indicated, it may not comply with a password policy or it may not be pleasant for the user to enter. Generally password tools allow the user to choose whether to accept the password or generate a new one.

Of course this comes with a caveat: the user may generate passwords until they get a simple one to remember or type, but this is also one that may be easier to crack. I don't think there is much consensus of what to do here, though most password managers err towards letting the user decide whether to keep or not keep the password.

Hard to read passwords

Quite often password managers avoid letter / digit combinations that look too much like each other. The main culprit is probably 0, o and O but other combinations are troublesome as well, e.g. 1, i, I and l. Font choice can of course help with that.

This is mainly an issue if the user has to copy the password from screen themselves. This should be avoided in the first place.
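If look-alike characters are excluded anyway, the cost in entropy is small. A quick sketch, assuming an alphanumeric alphabet and the seven look-alikes named above:

```python
import math
import string

AMBIGUOUS = frozenset('0oO1iIl')  # the look-alikes named above
FULL = string.ascii_letters + string.digits
READABLE = ''.join(c for c in FULL if c not in AMBIGUOUS)

# Bits of entropy lost per character by dropping the ambiguous ones.
cost_per_char = math.log2(len(FULL)) - math.log2(len(READABLE))

print(len(FULL), len(READABLE), round(cost_per_char, 2))  # 62 55 0.17
```

One extra character of length more than makes up for the loss in a typical 12 to 16 character password.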

The code

Undocumented contract

The code is using an undocumented scheme, which cannot be altered from the outside. As such the scheme is likely not compliant with many password policies. See the last remark on the entropy for a possible fix.

The contract of the function is not well documented. There is a guard statement that the length of the password should be between 4 and 16 characters, but this is not indicated in the documentation at all.

The output is not well documented either. The character classes are defined as constants, but they are not mentioned in the documentation. It doesn't document the entropy either, nor does it define the algorithm used.

Using a `str` for a password

The disadvantage of using strings is that they cannot be easily erased. It makes more sense to store the passwords in e.g. a bytearray that can be easily erased, right after the password is not required any more. If the password is generated for internal verification then it should be used as input for a Password Based Key Derivation Function (aka password hash) and erased right after the hash has been generated.
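A best-effort sketch of that idea follows. The helper names and the PBKDF2 iteration count are assumptions, and CPython gives no guarantee that no other copy of the bytes exists in memory, so this only narrows the window of exposure.

```python
import hashlib
import secrets
import string

ALPHABET = (string.ascii_letters + string.digits).encode('ascii')
ITERATIONS = 600_000  # assumed work factor; tune for your own hardware


def generate_password_bytes(length: int) -> bytearray:
    """Generate the password into a mutable buffer instead of a str."""
    return bytearray(secrets.choice(ALPHABET) for _ in range(length))


def wipe(buffer: bytearray) -> None:
    """Overwrite the buffer in place with zero bytes."""
    for i in range(len(buffer)):
        buffer[i] = 0


password = generate_password_bytes(16)
salt = secrets.token_bytes(16)
try:
    # pbkdf2_hmac accepts any bytes-like object, including bytearray.
    digest = hashlib.pbkdf2_hmac('sha256', password, salt, ITERATIONS)
finally:
    wipe(password)  # erase right after the hash has been generated
```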

It is questionable whether a password generator should be written in Python in the first place, as it doesn't offer fine-grained memory or string control. Some system languages do offer that and may use the system to protect memory fields, e.g. by disallowing swapping specific memory regions to disk or even by encrypting memory regions.

The while loop

The while loop is a for loop in disguise. As it will always go through length iterations it doesn't make sense to implement it as such.
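For comparison, here are the two loop shapes side by side. Both functions are illustrative, and secrets.choice stands in for the question's randint-based selection.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def with_while(length: int) -> str:
    # The question's shape: a manually managed counter.
    password = ''
    i = 0
    while i < length:
        password += secrets.choice(ALPHABET)
        i += 1
    return password


def with_for(length: int) -> str:
    # Same behaviour; range() states the fixed iteration count up front.
    password = ''
    for _ in range(length):
        password += secrets.choice(ALPHABET)
    return password
```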
