I made an implementation of atoi (ascii to integer) in Python a while ago for fun, and I'd like to know what I could do to improve it.
class CannotConvertToInteger(Exception):
"""A non-numeric character was present in the string passed to atoi"""
pass
def atoi(string : str) -> int:
sign = multiplier = 1
val = 0
if string[0] == '-':
string = string[1:]
sign = -1
elif string[0] == '+':
string = string[1:]
for i in string[::-1]:
code = ord(i)
try:
if ((code > 57) or (code < 48)):
raise CannotConvertToInteger
else:
val += (code - 48) * multiplier
multiplier *= 10
except CannotConvertToInteger:
return print('Cannot convert string to an integer!')
return (val * sign)
test_string = input('Enter an optionally signed integer: ')
result = atoi(test_string)
if result:
print('It was a valid int! atoi() returned:', result)
else:
print('It was an invalid int! atoi() returned:', result)
input()
One specific question I have is if it's bad practice to return a print call, as a way to print an error and simultaneously return from the function? I did that so that I could print the error and return None
on the same line.
2 Answers 2
Python already has an exception that denotes that the value you passed is inappropriate somehow. It is
ValueError
, which is what the built-inint
also raises if a wrong string is passed.In addition, defining a nice readable error which you can raise, only to catch it directly within the function and to return
None
(the output ofprint
) and print to the terminal is not ideal. Just let the exception rise to the caller of the function, it should be their problem if the function is used in a bad way (which you are telling them all about with that exception).You should avoid magic constants. What are
57
and48
? Either give them names and use compound comparisons:zero = ord("0") nine = ord("9") if not zero <= code <= nine: ...
Or, maybe even better, write a
isdigit
function:def isdigit(s): return s in set("0123456789")
Which can be slightly sped up by using the standard library
string
module:from string import digits DIGITS = set(digits) def isdigit(s): return s in DIGITS
Incidentally, don't shadow the standard library
string
module, just call the inputs
orx
, asint
does.Note that there are also
str.isdigit
, but this unfortunately also returns true for unicode digits, such as all of1234567890
. Only with a whitelist can you fully control what counts as a digit in your case.Instead of iterating over the string and directly calling
ord
, you can usemap
(andreversed
):for code in map(ord, reversed(string)): ...
You could also iterate over
multiplier
(or rather its exponent) at the same time usingenumerate
:for exponent, code in enumerate(map(ord, reversed(string))): ... value += (code - zero) * 10 ** exponent
Actually directly manipulating ASCII values is not the most robust (although it works). Instead you could just make a dictionary that maps strings to integer values:
VALUES = {c: d for d, c in enumerate(DIGITS)}
Using
string[0]
to check for a sign character can fail if the empty string is passed. Instead you can usestr.startswith("+")
or evenstr.startswith(("+", "-"))
. This will just returnFalse
for an empty string.With all this done, your function can actually easily be extended to arbitrary bases (but let's stick to maximum base 36, like
int
, i.e. all digits and lowercase letters):DIGITS = string.digits + string.ascii_lowercase VALUES = {c: d for d, c in enumerate(DIGITS)} def isdigit(s, base=10): return s in DIGITS[:base] def atoi(x : str, base : int = 10): if not 2 <= base <= 36: raise ValueError("Only 2 <= base <= 36 currently supported") sign = 1 if x.startswith(("+", "-")): if x[0] == "-": sign = -1 x = x[1:] value = 0 for exp, c in enumerate(reversed(x)): if c not in VALUES or VALUES[c] >= base: raise ValueError(f"{c} is not a valid digit in base {base}") value += VALUES[c] * base ** exp return sign * value
This works, as demonstrated below:
atoi("12345") # 12345 atoi("+12345") # 12345 atoi("-12345") # -12345 atoi("12345", base=6) # 1865 atoi("12345", base=5) # ValueError: 5 is not a valid digit in base 5 atoi("101010", base=2) # 42 atoi("1234567890abcdef", base=16) # 1311768467294899695 atoi("1234567890abcdefghijklmnopqrstuvwxyz", base=36) # 3126485650002806059265235559620383787531710118313327355 atoi("") # 0 atoi("111", base=1) # ValueError: Only 2 <= base <= 36 currently supported atoi("Az", base=62) # ValueError: Only 2 <= base <= 36 currently supported
You should surround your calling code with a
if __name__ == "__main__":
guard to allow importing from this module from another script without the user input/output being run:if __name__ == "__main__": x = input('Enter an optionally signed integer: ') try: print('It was a valid int! atoi() returned:', atoi(x)) except ValueError: print('It was an invalid int!)
-
3\$\begingroup\$ Damn! I thought about your first two bullet points literally the first second I looked at this, but then forgot to include it in my answer ;-) \$\endgroup\$AlexV– AlexV2019年09月07日 08:06:21 +00:00Commented Sep 7, 2019 at 8:06
Error handling
One specific question I have is if it's bad practice to return a print call, as a way to print an error and simultaneously return from the function? I did that so that I could print the error and return None on the same line.
That is unconventional to say the least. By catching the exception internally and print it to the console you take away the ability to handle exceptions in calling code. You should raise an exception if an error occured that cannot be handled by the function itself to give the caller the possibility to decide how to handle this. Also, ask yourself: What is the advantage of your chosen approach? Is
result = atoi(test_string) if result: print('It was a valid int! atoi() returned:', result) else: print('It was an invalid int! atoi() returned:', result)
really any better than, for example:
try:
result = atoi(test_string)
print('It was a valid int! atoi() returned:', result)
except CannotConvertToInteger:
print('It was an invalid int!)
The code itself
- The function would profit from some blank lines to seperate logical blocks.
string[::-1]
actually creates a copy, since strings are immutable. You can avoid that by usingreversed(string)
, which is perfectly fine for your use-case since you only want the single digits, not the whole thing reversed.This convoluted structure
try: if ((code > 57) or (code < 48)): raise CannotConvertToInteger else: val += (code - 48) * multiplier multiplier *= 10 except CannotConvertToInteger: return print('Cannot convert string to an integer!')
is the price you pay for the way you chose to handle your error cases. As said above, removing the
try: ... catch ...:
is the favorable approach here.@Graipher's answer has more good points about using built-in exceptions and avoiding magical numbers, that I almost immediately thought about when writing this up, but then forgot in the process.
Almost as a side note: you don't need parens around conditions in Python. Most people only ever use them if the conditions get very long and need to spread over multiple lines. The same goes for the return value. Here the parens are even more unnecessary.
- You should have a look at the official Style Guide for Python Code (often just called PEP8). The recommendations most relevant to your code would be to use 4 spaces per indentation level and avoid multiple initializations per source line. The meta-site for Code Review also has a nice list of tools that can help you check this automatically.
-
2\$\begingroup\$ Even though I didn’t choose your answer, it’s still a really good answer and I very much appreciate the time and thought you put into it. :) \$\endgroup\$Confettimaker– Confettimaker2019年09月07日 16:48:56 +00:00Commented Sep 7, 2019 at 16:48
Explore related questions
See similar questions with these tags.
0
. (Note: this is not a problem in youratoi
function, but rather in the testing code below the function) \$\endgroup\$