Password generators are extremely popular on CodeReview with beginners to both coding in general and Python in particular, for a number of reasons:
- They're (seemingly) easy to implement
- They offer an excuse to learn a little bit of the Python built-in API
- They are ostensibly useful in day-to-day computing
The code in this question is written as a straw man to capture (most of) the (significant) issues commonly exhibited by beginner password generator code.
The code
import random
import time
import tkinter

LOWER_LETTERS = [
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
]
UPPER_LETTERS = [
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
]
NUMBERS = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
SYMBOLS = ['!', '@', '#', '$', '%', '^', '&', '*', '(', ')']


def generate_password(length: int = 8) -> str:
    """
    Generate a password string of length length, with characters pulled randomly
    from the four category lists above.
    """
    if not (4 <= length <= 16):
        raise ValueError(f'Password length of {length} not in [4, 16]')
    random.seed(time.time())
    password = ''
    i = 0
    while i < length:
        category_index = random.randint(a=0, b=3)
        category = [LOWER_LETTERS, UPPER_LETTERS, NUMBERS, SYMBOLS][category_index]
        char_index = random.randint(a=0, b=len(category)-1)
        char = category[char_index]
        password += char
        i += 1
    return password


def ask_user_terminal() -> None:
    """
    Generate a new password via the terminal.
    """
    length = int(input('Enter password length: '))
    password = generate_password(length)
    print('New password:', password)
    while True:
        confirm = input('Enter password to confirm: ')
        if confirm == password:
            break


def tkui() -> None:
    """
    Generate a new password via a GUI.
    """
    def replace_password() -> None:
        label.configure(text=generate_password(length=8))

    window = tkinter.Tk()
    window.title('Password generator')
    window.grid()
    label = tkinter.Label()
    label.grid(row=0)
    tkinter.Button(
        window, text='Generate', command=replace_password,
    ).grid(row=1)
    window.mainloop()


if __name__ == '__main__':
    ask_user_terminal()
    tkui()
1 Answer
Do we really want a password?
The question code assumes generation of a generic password, but the underlying intent is to generate an authentication factor. Authentication factors are commonly passwords, but do not always need to be passwords. A Python program could just as easily generate a PIN or a passphrase.
Passphrases in particular are lower-entropy per character than fully-random passwords, but are generally easier to remember, and so have several practical advantages when compared to passwords. They were made more popular by XKCD's illustration of the approach and its implementation in correcthorsebatterystaple.net, and are now supported in many free generators.
A rule of thumb is: if the factor is guaranteed to go directly into a password management program with no intent for manual entry, use a fully-random password; if the factor needs to be memorized and entered by a human, use a long, random passphrase.
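To make the passphrase option concrete, here is a minimal sketch. The `WORDS` tuple, the function names and the defaults are all hypothetical and chosen only for illustration; a real generator would load a list of several thousand words. It draws words independently with `secrets.choice` and estimates entropy as the number of words times log2 of the list size.

```python
import math
import secrets

# Hypothetical and far too short: a real word list would hold thousands of entries.
WORDS = ("correct", "horse", "battery", "staple",
         "liminal", "orbit", "velvet", "quartz")


def generate_passphrase(n_words: int = 4, separator: str = " ") -> str:
    """Join n_words words drawn independently and uniformly from WORDS."""
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    return separator.join(secrets.choice(WORDS) for _ in range(n_words))


def passphrase_entropy_bits(n_words: int) -> float:
    """Entropy of a passphrase of n_words uniform, independent draws."""
    return n_words * math.log2(len(WORDS))


print(generate_passphrase())
print(passphrase_entropy_bits(4))  # 12.0 bits with this toy 8-word list
```

With only eight words the result is trivially guessable; the entropy figure shows why the word list has to be large and the phrase long.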
Character sets
Similar to the choice of password-or-passphrase, the character pool needs to be carefully chosen with several things in mind:
Cultural reach and internationalization
The question code assumes a Latin alphabet. The Latin alphabet is popular but not ubiquitous. Unicode in particular is designed to describe human writing systems in a comprehensive and systematic manner and is well-supported in most modern computing systems.
An obvious consequence is that the Latin upper and lowercase alphabets are needlessly narrow. Whereas they capture all letters in ASCII, they don't even capture all letters in the commonly-available CP-1252 - for instance, the French accents are not covered. Never mind the 161 different human scripts (alphabets, abugidas and syllabaries) in Unicode collectively containing many thousands of code points. A user whose native language uses a non-Latin script may find it easier to memorize a password generated from the character set of their native script. There are also security consequences: some scripts such as Chinese have morphemes with much more entropy than the letters of the Latin alphabet.
Target system
The question code assumes a character set that fits within ASCII representation. Authentication entry systems are highly diverse, and while most of them should accept Unicode passwords, many do not. ASCII is a de facto lowest common denominator across computing systems, so restricting passwords to it is common practice. A well-designed password generator should allow the user to specify the accepted character set.
Many authentication entry systems impose upper bounds, rejecting non-ASCII characters, or even certain non-alphanumeric symbols within ASCII. The most aggressive constraints may only allow an underscore `_` and no other symbols. Rejecting symbols in a password field is poor practice and may betray a system improperly guarded against injection attacks; a good authentication entry system should be able to accept `<<<<!--` and `Robert'); DROP TABLE Students;--` as passwords with no risk to its function. However, users usually have no control over the design of such systems and have to generate passwords to suit.
A good password generator should by default include all ASCII alphanumeric and non-alphanumeric printable characters, with the option to broaden or narrow this as needed.
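One way this default-plus-options behaviour might look is sketched below. The function name `build_pool` and its parameters are assumptions made for illustration, not part of the question code.

```python
import string


def build_pool(symbols: str = string.punctuation, extra: str = "") -> str:
    """
    Return the sorted, de-duplicated pool of allowed characters.

    symbols: non-alphanumeric characters the target system accepts;
             pass '_' or '' to narrow the pool.
    extra:   additional (possibly non-ASCII) characters to broaden the pool.
    """
    pool = set(string.ascii_letters + string.digits + symbols + extra)
    return "".join(sorted(pool))


default_pool = build_pool()                # all printable, non-whitespace ASCII
underscore_only = build_pool(symbols="_")  # for a restrictive target system
print(len(default_pool), len(underscore_only))  # 94 63
```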
S3cur1ty through character complexity
Many authentication entry systems also apply lower bounds to the accepted character set, such as minimum counts for selected character classes like numbers and "symbols". "Symbols" vary in definition from system to system, but are often considered any non-alphanumeric, non-whitespace printable ASCII character.
This is done in an attempt to increase password entropy, and in a perfect world where the entire character encoding is used, that may be justified. However, authorities including NIST now consider this practice to be poorly-informed and counterproductive:
However, additional research shows that requiring new passwords to include a certain amount of complexity can actually make them less secure. And that’s why NIST has also removed all password-complexity requirements from their guidelines.
For example, many companies require that users include special characters, like a number, symbol, or uppercase letter, in their passwords to make them harder to decrypt.
Unfortunately, many users will add complexity to their password by simply capitalizing the first letter of their password or adding a "1" or "!" to the end. And while it technically does make a password more difficult to crack, most password-crackers worth their salt know users tend to follow these patterns and can use them to reduce the time needed to decrypt a stolen password.
Disambiguation
Any time that a generated password needs to be read by human eyes, there is a risk of homoglyphs. These are visually similar characters that the user may confuse with one another, even if the computer knows better. Common examples are:
- Latin uppercase letter O `'\x4f'` and numeral 0 `'\x30'`
- Latin uppercase letter I `'\x49'`, Latin lowercase letter L `'\x6c'` and numeral 1 `'\x31'`
and so on. A good generator would have an opt-in mode to remove such homoglyphs from the character set, especially in generic password mode. Some nuance is called for here: in passphrase mode, the randomly-generated passphrase
illegitimate 54321 INCREDIBLY LIMINAL
is visually unambiguous, and so sometimes, homoglyphs are OK.
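A sketch of the opt-in homoglyph filter follows. The `AMBIGUOUS` set contains only the five characters listed above and is an assumption; which characters actually look alike depends on the font in use.

```python
import string

# Assumed look-alike set, taken from the examples above; extend as needed.
AMBIGUOUS = frozenset("O0Il1")


def remove_homoglyphs(pool: str, ambiguous: frozenset = AMBIGUOUS) -> str:
    """Return pool with every potentially confusable character removed."""
    return "".join(c for c in pool if c not in ambiguous)


pool = string.ascii_letters + string.digits
unambiguous = remove_homoglyphs(pool)
print(len(pool), len(unambiguous))  # 62 57
```

Shrinking the pool costs a little entropy per character, which can be recovered by generating a slightly longer password.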
As a more complex solution, there are other methods to distinguish homoglyphs:
- use a fixed-width font. We always get this for free in a terminal, and so `ask_user_terminal()` is somewhat safe, but `tkui` is not and needs modification to use a fixed-width or monospace font for `label`.
- use other character styles like underlining or color to indicate case or numerals.
Sets in Python
As a progression, and assuming the continued use of the existing program's character sets,
- replace the `LOWER_LETTERS` (etc.) lists `[]` with immutable tuples `()` since we never want those character sets to change; but really,
- replace them with simple strings `'ABCDE...'` which are themselves immutable sequences; but really,
- just `import` those from the `string` module; and then,
- cast to a `frozenset` as in `ASCII_LETTERS = frozenset(string.ascii_letters)`.
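The end of that progression might look like the sketch below, which reuses the question code's names. One caveat: `choices()` indexes into its population, so an unordered set has to be converted to a sequence before sampling. The sorted tuple here is one assumed way of doing that.

```python
import string

LOWER_LETTERS = frozenset(string.ascii_lowercase)
UPPER_LETTERS = frozenset(string.ascii_uppercase)
NUMBERS = frozenset(string.digits)
SYMBOLS = frozenset("!@#$%^&*()")  # the same ten symbols as the question code

# Set union removes any duplicates; sorting gives a stable sequence for sampling.
ALL_CHARS = tuple(sorted(LOWER_LETTERS | UPPER_LETTERS | NUMBERS | SYMBOLS))

print(len(ALL_CHARS))  # 72
```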
Length
Most authentication entry systems impose a minimum overall character count. Some even impose a maximum character count, but (other than something within reason, say 128 characters), this is very bad practice.
Again from the summary of the NIST rules,
password length benefits you more than complexity on a technical level [...] The NIST guidelines call for a strict eight-character minimum length.
As such, both the minimum and maximum limits in the question code are inappropriate. A good password generator should enforce an absolute minimum of eight characters, allow the user to increase that minimum to suit the target system, have a default length somewhat higher than eight characters, and have a much higher upper limit.
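These length rules could be expressed roughly as follows. Only the eight-character floor comes from the quoted summary; the default of 16, the ceiling of 128 and the name `check_length` are assumptions for illustration.

```python
ABSOLUTE_MIN_LENGTH = 8   # floor from the NIST summary quoted above
DEFAULT_LENGTH = 16       # assumed default, "somewhat higher than eight"
MAX_LENGTH = 128          # assumed generous ceiling


def check_length(length: int = DEFAULT_LENGTH,
                 target_min: int = ABSOLUTE_MIN_LENGTH) -> int:
    """
    Validate length against the absolute floor, the target system's own
    minimum (which may only raise the floor), and the ceiling.
    """
    minimum = max(ABSOLUTE_MIN_LENGTH, target_min)
    if not (minimum <= length <= MAX_LENGTH):
        raise ValueError(
            f"Password length of {length} not in [{minimum}, {MAX_LENGTH}]"
        )
    return length
```

For example, `check_length(10, target_min=12)` raises `ValueError` because the target system's minimum overrides the lower absolute floor.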
Concatenation in Python
There is a lot of discourse in the Python community about the best way to concatenate strings. For various reasons, in-place `+=` is not always a favourite.
In the context of this question, due to the API offered by `secrets.SystemRandom().choices()`, `''.join(choices())` is the best option. The fact that this will scale linearly O(n) in time with the length of the password is immaterial because password length is negligible from a performance perspective. Instead we prefer it because it is the simplest and clearest method, and does not need an explicit loop.
Even if you were to keep an explicit loop, you would want to replace `while i < length` with `for _ in range(length)` to loop like a native.
Random generation
PRNG source
The notion that password generation is a non-cryptographic activity is a non-starter. Most passwords used in computing feed into cryptographic systems like key derivation functions or hashes and should expect some effort in promoting full-stack security. This does not have to be complicated.
The first and most important step is to follow the so-called "admonition" in the `random` module documentation:
Warning: The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the `secrets` module.
The cost of making this change is essentially nothing, and the risk of continuing to use `random` is low but non-zero. We should instead rely on `secrets`, which
is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets.
In particular, `secrets` should be used in preference to the default pseudo-random number generator in the `random` module, which is designed for modelling and simulation, not security or cryptography.
Even the planning document PEP-0506 Adding A Secrets Module To The Standard Library explains that
Python’s standard library makes it too easy for developers to inadvertently make serious security errors. Theo de Raadt, the founder of OpenBSD, contacted Guido van Rossum and expressed some concern [1] about the use of MT for generating sensitive information such as passwords, secure tokens, session keys and similar.
Make a single instance of `secrets.SystemRandom()` and share it across calls to your generator function.
State and seeds
As professionals we should strive for defense in depth, and that means that we must assume that an attacker may eventually gain some partial or full knowledge of the time at which the password was generated. From the documentation for `time()`,
Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second.
A password generator should not seed with a potentially coarse timestamp as the question code does. If an attacker determines or guesses the time of generation, no amount of password length or character complexity will save you: the attacker can regenerate even a 1,000-character password identically to the original program. This problem can be avoided by not seeding at all, and relying instead on the implementation-defined seeding behaviour as wrapped by `SystemRandom`. It should not and cannot be seeded at the Python level.
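The attack can be demonstrated with a short sketch. It mimics the question code's approach (seed a Mersenne Twister with a timestamp, then draw characters) rather than reproducing it line for line. The timestamp, the one-minute search window and the function names are all hypothetical.

```python
import random
import secrets
import string
from typing import Optional

POOL = string.ascii_letters + string.digits


def seeded_password(seed: int, length: int) -> str:
    """Seed a non-cryptographic PRNG, then draw length characters from POOL."""
    rng = random.Random(seed)
    return "".join(rng.choices(POOL, k=length))


def guess_seed(password: str, earliest: int, latest: int) -> Optional[int]:
    """Try every whole-second timestamp in [earliest, latest]."""
    for candidate in range(earliest, latest + 1):
        if seeded_password(candidate, len(password)) == password:
            return candidate
    return None


generation_time = 1_722_520_000  # hypothetical timestamp with 1-second precision
victim = seeded_password(generation_time, length=1000)

# The attacker only knows the generation time to within a minute.
recovered = guess_seed(victim, generation_time - 30, generation_time + 30)
print(recovered == generation_time)  # True

# SystemRandom accepts seed() for API compatibility but ignores the value.
rng = secrets.SystemRandom()
rng.seed(generation_time)
unseeded = "".join(rng.choices(POOL, k=1000))
```

At most 61 guesses recover a 1,000-character password here, which is why length offers no protection once the seed is predictable.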
Probability
The question code assumes that the category selection probabilities should be evenly distributed, but that has a side-effect: if we only allow one symbol - say, an underscore - but keep the four categories, one quarter of the output characters will tend to be underscores. Is that sensible? A solution that's both simpler and (depending on your interpretation) produces a more expected probability distribution is to build a single pool `ALL_CHARS` from the union of the character class sets, and issue a single call to `choices(ALL_CHARS, k=n)`.
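The difference between the two schemes can be worked out exactly. This sketch assumes the question code's four categories, with the symbol category narrowed to a lone underscore.

```python
import string
from fractions import Fraction

categories = [string.ascii_lowercase, string.ascii_uppercase, string.digits, "_"]

# Question-code scheme: pick a category uniformly, then a character within it.
p_category_first = Fraction(1, len(categories)) * Fraction(1, len("_"))

# Single-pool scheme: pick uniformly from the union of all categories.
all_chars = "".join(categories)
p_single_pool = Fraction(1, len(all_chars))

print(p_category_first)  # 1/4
print(p_single_pool)     # 1/63
```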
Random data in Python
As a progression,
- replace `random.randint(a=0, b=3)` with `random.randrange(4)` so that we can take advantage of half-open intervals; but really,
- get rid of the individual-character `randint()` calls and the loop, and replace with a single, non-looped call to `my_system_random.choices(ALL_CHARS, k=n)`.
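Taken together, the generator itself can shrink to a few lines. This is a sketch under stated assumptions: the default length of 16 and the default pool are illustrative choices, and length validation is left out for brevity.

```python
import secrets
import string

ALL_CHARS = string.ascii_letters + string.digits + string.punctuation

# One shared instance, backed by the operating system's secure random source.
my_system_random = secrets.SystemRandom()


def generate_password(n: int = 16, pool: str = ALL_CHARS) -> str:
    """Draw n characters uniformly and independently from pool."""
    if not pool:
        raise ValueError("Character pool must not be empty")
    return "".join(my_system_random.choices(pool, k=n))


print(generate_password())
```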
User interface
Usability
Don't use a Tk label. Instead, use an `Entry` so that the user can copy the text. The NIST summary also encourages use of the clipboard:
Allow Password "Paste-In"
If passwords are easier to enter, your users are more likely to use a longer, more complex password in the first place (which is more secure). That’s where "paste-in" password functionality is now advantageous — if entering passwords is as simple as copying and pasting them into a password field; it encourages safer behavior.
This is especially important considering how many passwords the average person has to remember these days and the tools people are using to manage them all.
The `Entry` should also be editable: in the end, the user (and not the program) is in the best position to understand whether the password complies with requirements of the target authentication entry system, so should be able to adjust accordingly.
Peek safety
In `ask_user_terminal()`'s confirmation loop, the password is exposed to over-the-shoulder spying. In a crowded café or airport someone may see it. We can limit this exposure by
- Initially showing the user the password,
- Prompting the user to continue once they have safely recorded or memorized the password,
- Clearing the screen, and then
- Using `getpass` instead of `input` to collect the confirmation password.
`getpass` configures the console to disable echo, so the confirmation is never displayed as it is typed. This approach is typical in the Linux/Unix world.
In the case of tkinter, create an `Entry` with `show='*'`. Make a Show button that, when depressed, calls `entry.configure(show='')`, and when released, calls `entry.configure(show='*')`. This is also peek-safe and is an easy and typical way to allow users to choose the exact amount of exposure time.
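The terminal steps above can be sketched as follows. The function names are hypothetical, and the screen clearing assumes a terminal that honours ANSI escape codes; it may leave the password in the scrollback buffer. The two reader parameters exist only so that the flow can be exercised without a real console.

```python
import getpass
from typing import Callable


def clear_screen() -> None:
    # Assumption: the terminal understands ANSI "erase display" and "cursor home".
    print("\033[2J\033[H", end="")


def confirm_password(
    password: str,
    read_hidden: Callable[[str], str] = getpass.getpass,
    pause: Callable[[str], str] = input,
) -> None:
    """Show the password once, clear the screen, then confirm without echo."""
    print("New password:", password)
    pause("Press Enter once you have recorded the password: ")
    clear_screen()
    while read_hidden("Enter password to confirm: ") != password:
        print("That did not match; please try again.")
```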
Loose coupling
Rather than calling `label.configure(text=)`, associate `label` (the user interface) with a `StringVar` (the data path). In the `replace_password` handler, only interact with the `StringVar` and not the label.
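A sketch combining the user-interface suggestions follows: an editable, copyable `Entry` in a fixed-width font, masking with a press-and-hold Show button, and a `StringVar` as the data path. The layout, widget sizes and the helper `new_password` are assumptions, and `tkinter` is imported inside `tkui` only so that the password logic stays importable on a machine without a display.

```python
import secrets
import string

ALL_CHARS = string.ascii_letters + string.digits + string.punctuation
rng = secrets.SystemRandom()


def new_password(length: int = 16) -> str:
    return "".join(rng.choices(ALL_CHARS, k=length))


def tkui() -> None:
    import tkinter  # deferred so new_password() works on headless machines

    window = tkinter.Tk()
    window.title("Password generator")

    # Data path: handlers touch only this variable, never the widget's text.
    password_var = tkinter.StringVar(master=window)

    entry = tkinter.Entry(
        window, textvariable=password_var, show="*",
        font="TkFixedFont", width=40,
    )
    entry.grid(row=0, column=0, columnspan=2)

    def replace_password() -> None:
        password_var.set(new_password())

    tkinter.Button(
        window, text="Generate", command=replace_password,
    ).grid(row=1, column=0)

    # Reveal the password only while the Show button is held down.
    show_button = tkinter.Button(window, text="Show")
    show_button.grid(row=1, column=1)
    show_button.bind("<ButtonPress-1>", lambda event: entry.configure(show=""))
    show_button.bind("<ButtonRelease-1>", lambda event: entry.configure(show="*"))

    window.mainloop()
```

Calling `tkui()` opens the window; nothing runs at import time.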
Performance
Performance is generally not a concern for applications of this kind. As a result, most attempts at making this application faster can be considered premature optimization. Most CPU-bound tasks in the application are so fast as to be imperceptible by the user. The only time they may become noticeable is if you intend to extend this application to generate millions of passwords (for a rainbow table, etc.).
`SystemRandom` is a powerful high-level abstraction on `os.urandom`, which is itself an abstraction on the operating system's best source of secure random generation, often the Linux kernel's `getrandom()` and its fallback the `urandom` device, or Windows `BCryptGenRandom`. `urandom` is fast:
A read from the `/dev/urandom` device will not block waiting for more entropy.
The priorities for this program should be correctness, security, maintainability and usability, not performance.
- I know it's not what you meant, but if a system were to "only allow an underscore" as password, I'd no longer call that a password! ;-) – Toby Speight, commented Aug 1, 2024 at 14:03
- @TobySpeight sure; how about that? – Reinderien, commented Aug 3, 2024 at 13:08
- -1 The answer misrepresents the argument put forth by NIST in "S3cur1ty through character complexity". A password generator, unlike a human, will not "add complexity to their password by simply capitalizing the first letter of their password or adding a "1" or "!" to the end". – commented Nov 12, 2024 at 18:24
"`if not (4 <= n <= 16)`" Even knowing it's a strawman, that upper bound is painful to read.