In multiple parts of my program, user input needs to be validated and categorized. As these "validation trees" can get pretty unwieldy, I've started splitting them out in different functions, for example like this:
from enum import Enum, auto
class InputCat(Enum):
INVALID = auto()
HELP = auto()
SELECT = auto()
GOTO_PAGE = auto()
GOTO_PLAYER = auto()
def _goto_player(user_input: str) -> InputCat:
"""Checks if string is of type GOTO_PLAYER."""
for char in user_input[1:]:
if not (char.isalpha() or char in {' ', '-'}):
print("Invalid input: if character after '>' is alpha, all other characters must be alpha, ' ' or '-'.")
return InputCat.INVALID
return InputCat.GOTO_PLAYER
def _goto_page(user_input: str) -> InputCat:
"""Checks if string is of type GOTO_PAGE."""
for char in user_input[1:]:
if not char.isnumeric():
print("Invalid input: if character after '>' is numeric, all other characters must be numeric too.")
return InputCat.INVALID
return InputCat.GOTO_PAGE
def _goto(user_input: str) -> InputCat:
"""Checks if string is of type GOTO_PAGE or GOTO_PLAYER."""
if len(user_input) == 1:
print("Invalid input: need more input after '>'.")
return InputCat.INVALID
if user_input[1].isnumeric():
return _goto_page(user_input)
elif user_input[1].isalpha():
return _goto_player(user_input)
else:
print("Invalid input: character after '>' must be alphanumeric.")
return InputCat.INVALID
def _select(user_input: str) -> InputCat:
"""Checks if string is of type SELECT."""
for char in user_input:
if not char.isnumeric():
print("Invalid input: if first character is numeric, all other characters must be numeric too.")
return InputCat.INVALID
return InputCat.SELECT
def _help(user_input: str) -> InputCat:
"""Checks if string is of type HELP."""
if len(user_input) > 1:
print("Invalid input: when using '?', no other characters are allowed.")
return InputCat.INVALID
return InputCat.HELP
def get_category(user_input: str) -> InputCat:
"""Checks if string is of type HELP, SELECT, GOTO_PAGE, GOTO_PLAYER or INVALID."""
if not user_input:
print('Invalid input: no input.')
return InputCat.INVALID
if user_input[0] == '?':
return _help(user_input)
elif user_input[0].isnumeric():
return _select(user_input)
elif user_input[0] == '>':
return _goto(user_input)
else:
print("Invalid input: first char must be alphanumeric, '>' or '?'.")
return InputCat.INVALID
I was wondering if this is a readable and 'pythonic' way of doing this? If not, what would be better alternatives?
3 Answers 3
Don't print promiscuously. You've broken your problem down into several
small functions, which is a good general instinct. But you've undermined those
functions by polluting them with side effects -- printing in this case.
Whenever feasible, prefer program designs that rely mostly on data-oriented
functions (take data as input and return data as output, without causing side
effects). Try to consolidate your program's necessary side effects in just a
few places -- for example, in a top-level main()
function (or some
equivalent) that coordinates the interaction with the user.
Don't over-engineer. You have described a simple usage pattern: ?
for
help; N
for select; >N
for goto-page, and >S
for goto-player (where N
is an integer and S
is a string of letters). Validating that kind of input
could be done reasonably in various ways, but none of them require so much
scaffolding. The small demo below uses (INPUT_CAT, REGEX)
pairs. For more
complex input scenarios, you could use
callables (functions or classes) instead of regexes.
Resist the temptation for fine-grained dialogue with users. My impression
is that you want to give the user specific feedback when their input is
invalid. For example, if the user enters ?foo
, instead of providing a
general error message (eg, "Invalid input") you say something specific
("Invalid input: when using '?', no other characters are allowed."). That
specificity requires more coding on your part and reading/understanding on the
part of users as you pepper them with different flavors of invalid-reply
messages. But is it really worth it? I would suggest that the answer is no.
Instead of fussing with all of those details, just provide clear documentation
in your usage/help text. If a user provides bad input, tell them so in a
general way and, optionally, remind them how to view the usage text.
When feasible, let a data structure drive the algorithm rather than
logic. Your current implementation is based heavily on conditional
logic. With the right data structure (INPUT_RGXS
in the demo below),
the need for most of that logic disappears.
import sys
import re
from enum import Enum, auto
class InputCat(Enum):
HELP = auto()
SELECT = auto()
GOTO_PAGE = auto()
GOTO_PLAYER = auto()
INVALID = auto()
INPUT_RGXS = (
(InputCat.HELP, re.compile(r'^\?$')),
(InputCat.SELECT, re.compile(r'^\d+$')),
(InputCat.GOTO_PAGE, re.compile(r'^>\d+$')),
(InputCat.GOTO_PLAYER, re.compile(r'^>\w+$')), # Adjust as desired.
)
def main(args):
# Printing occurs only at the program's outer edges.
for user_input in args:
ic = get_input_category(user_input)
if ic == InputCat.INVALID:
print(f'Invalid reply: {user_input!r}')
else:
print(ic)
def get_input_category(user_input: str) -> InputCat:
# A data-oriented function.
for ic, rgx in INPUT_RGXS:
if rgx.search(user_input):
return ic
return InputCat.INVALID
if __name__ == '__main__':
main(sys.argv[1:])
-
5\$\begingroup\$ Bonus point: check all
INPUT_RGXS
rather than returning early, to detect overlapping regexes. \$\endgroup\$Matthieu M.– Matthieu M.2023年05月31日 07:38:06 +00:00Commented May 31, 2023 at 7:38 -
8\$\begingroup\$ "Resist the temptation for fine-grained dialogue with users" Gotta disagree on this one. It can be overdone of course, but in the OP's cases they look good. When convenient, an extra sentence to tell the user why the input is invalid is helpful to them and to you. If a user is entering an input they think should be valid and only getting "Invalid Input" then they're going to submit a support ticket. In OP's original code, they know exactly why it failed validation and there's no reason to hide that reason. \$\endgroup\$Carl Kevinson– Carl Kevinson2023年05月31日 16:24:05 +00:00Commented May 31, 2023 at 16:24
Your code is quite good, but there are two ways that it can be improved.
Single satisfication conditional checking
You use a lot of for
loops in which you check if at least one element meets a certain condition, like this:
for char in user_input[1:]:
if not (char.isalpha() or char in {' ', '-'}):
pass
While this is in no ways bad, it is a little bit inefficient and can cause unnecessary iterations when not short circuited, if the condition is met, it would be waste of execution time to check if the rest of the list meets the condition if you only want to know if any in the list meets the condition, so you can just stop the loop when the condition is met.
Here you use return
when the condition is met so that is not an issue. But if you want the code to continue and also short-circuit the loop, break
the loop.
However your loop does introduce unnecessary extra nesting levels, and it is generally a good idea to reduce indentation nestings.
Consider using any
for this purpose, it short-circuits when a match is found and only loops through the entire list if no match is found. It uses compiled C code and is more efficient than Python for loop. And it also reduces nestings thus more concise.
The syntax is:
if any(not char.isnumeric() for char in user_input):
do_something()
Error handling
You use a lot of structures like this:
print("Invalid input: when using '?', no other characters are allowed.")
return InputCat.INVALID
It is a weird design choice, a pattern I have never seen before. Not that it is bad, but it is very unPythonic.
So you want to print some text to inform the user that something went wrong, and stop code execution to prevent bugs.
Using exceptions is the most sensible thing to do, you need to raise Exception
, it informs the user and stops the code execution in one statement.
Not only that, it provides tracebacks to show its origin to help debugging your code, and it stops code execution of not only one function but also the entire script (technically it stops the execution of one thread but most Python programs are single-threaded anyway), thus preventing bugs that would happen if the code executes further.
And it has the nice red color to convey that something really bad happened.
Syntax:
raise ValueError("Invalid input: when using '?', no other characters are allowed.")
Note in this setup you won't be able to return InputCat.INVALID
, but this renders that obsolete anyways, so you don't really need that.
-
1\$\begingroup\$ Minor quibble:
any(not char.isnumeric() for char in user_input)
can just benot user_input.isnumeric()
\$\endgroup\$Jasmijn– Jasmijn2023年05月31日 12:23:18 +00:00Commented May 31, 2023 at 12:23 -
\$\begingroup\$ @Jasmijn Of course I knew that! I was trying to teach the OP thd syntax of
any
. \$\endgroup\$Ξένη Γήινος– Ξένη Γήινος2023年05月31日 12:40:25 +00:00Commented May 31, 2023 at 12:40
I’d suggest the following changes:
- Create a Validator class (the name is "Validator" is just an example).
- Move all the validation code into the Validator class.
- Separate out validation from executing user commands, and use the Validator for validation—it does that and only that, returning a list of all validation errors for display to the user as appropriate (printing, including as the message of a raised exception, etc.).
This will make your code much easier to follow.
raise ValueError(msg)
syntax, in this way you can inform the user what has gone wrong and immediately stop the code execution. It will replace your print + return combination. But by doing this you won't be able toreturn InputCat.INVALID
, though it might not be needed. This is a design choice but using exceptions is the more Pythonic method here. \$\endgroup\$