Problem
Validate if a given string can be interpreted as a decimal or scientific number.
Some examples:
"0" => true
" 0.1 " => true
"abc" => false
"1 a" => false
"2e10" => true
" -90e3 " => true
" 1e" => false
"e3" => false
" 6e-1" => true
" 99e2.5 " => false
"53.5e93" => true
" --6 " => false
"-+3" => false
"95a54e53" => false
Code
I've solved the LeetCode "Valid Number" problem using Python's re module. If you'd like to review the code and recommend any changes or improvements, please do; I'd really appreciate it.
import re
from typing import Optional


def is_numeric(input_string: Optional[str]) -> bool:
    """
    Returns True for valid numbers; the input can be a string or None
    """
    if input_string is None:
        return False
    expression_d_construct = r"^[+-]?(?:\d*\.\d+|\d+\.\d*|\d+)[Ee][+-]?\d+$|^[+-]?(?:\d*\.\d+|\d+\.\d*|\d+)$|^[+-]?\d+$"
    expression_char_class = r"^[+-]?(?:[0-9]*\.[0-9]+|[0-9]+\.[0-9]*|[0-9]+)[Ee][+-]?[0-9]+$|^[+-]?(?:[0-9]*\.[0-9]+|[0-9]+\.[0-9]*|[0-9]+)$|^[+-]?[0-9]+$"
    if re.match(expression_d_construct, input_string.strip()) is not None and re.match(expression_char_class, input_string.strip()) is not None:
        return True
    return False
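The two expressions in is_numeric differ only in \d versus [0-9], and the last alternative in each (^[+-]?\d+$) is already covered by the middle one. In Python 3, \d in a str pattern matches any Unicode decimal digit, while [0-9] matches only ASCII digits, so every string accepted by expression_char_class is also accepted by expression_d_construct, and the combined check behaves like expression_char_class alone. A minimal sketch of that difference, using the Arabic-Indic digit three (U+0663) as an illustrative input:

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a Unicode decimal digit (category Nd), not ASCII

# In a str pattern, \d matches any Unicode decimal digit...
unicode_match = re.fullmatch(r"[+-]?\d+", ARABIC_INDIC_THREE) is not None
# ...while [0-9] matches only the ASCII digits 0-9.
ascii_match = re.fullmatch(r"[+-]?[0-9]+", ARABIC_INDIC_THREE) is not None
# The re.ASCII flag restricts \d to [0-9], an alternative to spelling out the class.
ascii_flag_match = re.fullmatch(r"[+-]?\d+", ARABIC_INDIC_THREE, flags=re.ASCII) is not None

print(unicode_match, ascii_match, ascii_flag_match)  # True False False
```

So keeping only expression_char_class, or compiling a single \d pattern with re.ASCII, should give the same results as running both matches.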
if __name__ == "__main__":
    # ---------------------------- TEST ---------------------------
    DIVIDER_DASH = '-' * 50
    GREEN_APPLE = '\U0001F34F'
    RED_APPLE = '\U0001F34E'
    test_input_strings = [None, "0 ", "0.1", "abc", "1 a", "2e10", "-90e3",
                          "1e", "e3", "6e-1", "99e2.5", "53.5e93", "--6", "-+3", "95a54e53"]
    count = 0
    for string in test_input_strings:
        print(DIVIDER_DASH)
        if is_numeric(string):
            print(f'{GREEN_APPLE} Test {int(count + 1)}: {string} is a valid number.')
        else:
            print(f'{RED_APPLE} Test {int(count + 1)}: {string} is an invalid number.')
        count += 1
Output
--------------------------------------------------
🍎 Test 1: None is an invalid number.
--------------------------------------------------
🍏 Test 2: 0 is a valid number.
--------------------------------------------------
🍏 Test 3: 0.1 is a valid number.
--------------------------------------------------
🍎 Test 4: abc is an invalid number.
--------------------------------------------------
🍎 Test 5: 1 a is an invalid number.
--------------------------------------------------
🍏 Test 6: 2e10 is a valid number.
--------------------------------------------------
🍏 Test 7: -90e3 is a valid number.
--------------------------------------------------
🍎 Test 8: 1e is an invalid number.
--------------------------------------------------
🍎 Test 9: e3 is an invalid number.
--------------------------------------------------
🍏 Test 10: 6e-1 is a valid number.
--------------------------------------------------
🍎 Test 11: 99e2.5 is an invalid number.
--------------------------------------------------
🍏 Test 12: 53.5e93 is a valid number.
--------------------------------------------------
🍎 Test 13: --6 is an invalid number.
--------------------------------------------------
🍎 Test 14: -+3 is an invalid number.
--------------------------------------------------
🍎 Test 15: 95a54e53 is an invalid number.
RegEx Circuit
jex.im visualizes regular expressions:
(Diagram: RegEx Demo 1)
(Diagram: RegEx Demo 2)
If you wish to explore the expression, it is explained in the top-right panel of regex101.com. You can also follow the linked demo to watch how it would match against some sample inputs.
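For readers without the regex101 panel at hand, the same structure can be spelled out inline with re.VERBOSE. This is a sketch of the [0-9] expression from the question with its overlapping alternatives folded into a single optional exponent; the helper name matches_number is only illustrative:

```python
import re

# re.VERBOSE ignores whitespace outside character classes and allows # comments.
NUMBER = re.compile(
    r"""
    [+-]?                  # optional sign
    (?:
        [0-9]*\.[0-9]+     # digits after the point are required: .5, 0.1
      | [0-9]+\.[0-9]*     # digits before the point are required: 1., 53.5
      | [0-9]+             # plain integer: 0, 2
    )
    (?:[Ee][+-]?[0-9]+)?   # optional exponent: e or E, optional sign, integer
    """,
    re.VERBOSE,
)


def matches_number(text: str) -> bool:
    """Strip surrounding whitespace, then require the whole string to match."""
    return NUMBER.fullmatch(text.strip()) is not None


for sample in ["0", " 0.1 ", "2e10", " -90e3 ", " 1e", "e3", " 99e2.5 ", "-+3"]:
    print(repr(sample), matches_number(sample))
```

Using fullmatch makes the ^ and $ anchors unnecessary, and the optional exponent group replaces the separate "with exponent" and "without exponent" alternatives.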
Comment (score 4): nice apple icons – RomanPerekhrest, Oct 16, 2019 at 15:25
3 Answers
Instead of diving into cumbersome and lengthy regular expressions, consider the following improvement/correction.
The main point is:
Numeric literals containing a decimal point or an exponent sign yield floating point numbers.
https://docs.python.org/3.4/library/stdtypes.html#numeric-types-int-float-complex
Therefore Python treats values like 53.5e93 and -90e3 as floats, and float() accepts strings written in the same notation.
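The approach relies on float() raising ValueError for any string it cannot parse. A quick check against some of the question's examples (parses_as_float is just an illustrative name for this sketch):

```python
def parses_as_float(text: str) -> bool:
    """Return True if float() can convert text, False if it raises ValueError."""
    try:
        float(text)
    except ValueError:
        return False
    return True


for sample in ["53.5e93", "-90e3", "6e-1", "1e", "e3", "99e2.5", "--6", "-+3"]:
    print(f"{sample!r:>10} -> {parses_as_float(sample)}")
```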
I would proceed with the following approach (retaining those cute icons), including a few small optimizations:
from typing import Optional


def is_numeric(input_string: Optional[str]) -> bool:
    """
    Returns True for valid numbers. Acceptable types of items: str or None
    """
    if input_string is None:
        return False
    try:
        input_string = input_string.strip()
        float(input_string)
    except ValueError:
        return False
    return True
if __name__ == "__main__":
    # ---------------------------- TEST ---------------------------
    DIVIDER_DASH = '-' * 50
    GREEN_APPLE = '\U0001F34F'
    RED_APPLE = '\U0001F34E'
    test_input_strings = [None, "0 ", "0.1", "abc", "1 a", "2e10", "-90e3",
                          "1e", "e3", "6e-1", "99e2.5", "53.5e93", "--6", "-+3", "95a54e53"]
    count = 0
    for string in test_input_strings:
        print(DIVIDER_DASH)
        count += 1
        if is_numeric(string):
            print(f'{GREEN_APPLE} Test {count}: `{string}` is a valid number.')
        else:
            print(f'{RED_APPLE} Test {count}: `{string}` is not a valid number.')
The output:
--------------------------------------------------
🍎 Test 1: `None` is not a valid number.
--------------------------------------------------
🍏 Test 2: `0 ` is a valid number.
--------------------------------------------------
🍏 Test 3: `0.1` is a valid number.
--------------------------------------------------
🍎 Test 4: `abc` is not a valid number.
--------------------------------------------------
🍎 Test 5: `1 a` is not a valid number.
--------------------------------------------------
🍏 Test 6: `2e10` is a valid number.
--------------------------------------------------
🍏 Test 7: `-90e3` is a valid number.
--------------------------------------------------
🍎 Test 8: `1e` is not a valid number.
--------------------------------------------------
🍎 Test 9: `e3` is not a valid number.
--------------------------------------------------
🍏 Test 10: `6e-1` is a valid number.
--------------------------------------------------
🍎 Test 11: `99e2.5` is not a valid number.
--------------------------------------------------
🍏 Test 12: `53.5e93` is a valid number.
--------------------------------------------------
🍎 Test 13: `--6` is not a valid number.
--------------------------------------------------
🍎 Test 14: `-+3` is not a valid number.
--------------------------------------------------
🍎 Test 15: `95a54e53` is not a valid number.
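One caveat when delegating to float(): it accepts some strings the challenge's examples never mention, such as "nan", "inf", "Infinity", "1_000" (underscores are allowed since Python 3.6) and non-ASCII decimal digits. Assuming the challenge only allows ASCII digits, a sign, a decimal point and an exponent marker, a sketch that keeps float() but pre-filters the character set could look like this (is_numeric_strict and ALLOWED_CHARS are hypothetical names):

```python
# Assumption: a valid number may contain only ASCII digits, a sign, a decimal
# point and an exponent marker, as in the challenge's examples.
ALLOWED_CHARS = frozenset("0123456789+-.eE")


def is_numeric_strict(input_string: str) -> bool:
    """float()-based check that also rejects 'nan', 'inf', '1_000' and non-ASCII digits."""
    stripped = input_string.strip()
    # float() alone would accept all of the inputs rejected by this filter.
    if not set(stripped) <= ALLOWED_CHARS:
        return False
    try:
        float(stripped)
    except ValueError:
        return False
    return True


for sample in ["nan", "inf", "Infinity", "1_000", "\u0663", "53.5e93", " 0.1 "]:
    print(repr(sample), is_numeric_strict(sample))
```

An empty or all-whitespace string passes the character filter but is still rejected, because float("") raises ValueError.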
Comment: The .strip() part is not necessary, because Python allows optional leading and trailing whitespace. Also, you can omit the None check and catch both ValueError and TypeError. – Wombatz, Oct 17, 2019 at 11:03
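Following that comment's suggestion, a sketch without strip() or the explicit None check: float() itself skips surrounding whitespace and raises TypeError when given None.

```python
from typing import Optional


def is_numeric(input_string: Optional[str]) -> bool:
    """Return True if float() accepts input_string; None and non-numeric text give False."""
    try:
        float(input_string)  # float() itself ignores leading/trailing whitespace
    except (ValueError, TypeError):  # float(None) raises TypeError
        return False
    return True


print(is_numeric(None), is_numeric(" 0.1 "), is_numeric("1 a"))
```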
I'd just go with @Roman's suggestion. You should leave it up to the language to decide what is and isn't valid.
I'd make two further suggestions, though.
I don't think the parameter to is_numeric should be Optional, either conceptually or to comply with the challenge. None will never be a valid number, so why even check for it? I don't think dealing with invalid data should be that function's responsibility. Make it take just a str, then deal with Nones externally. I also don't think trimming is is_numeric's responsibility, and it isn't even required:
print(float(" 0.1 ")) # prints 0.1
I'd also return True from within the try. The behavior is the same, but I find it makes the intent of the try clearer.
After these minor changes, I'd go with:
def is_numeric(input_string: str) -> bool:
    """
    Returns True for valid numbers. Acceptable type of input: str
    """
    try:
        float(input_string)
        return True
    except ValueError:
        return False
and, inside the test loop, handle None before calling it:
if string is not None and is_numeric(string):
    print(f'{GREEN_APPLE} Test {count}: `{string}` is a valid number.')
else:
    print(f'{RED_APPLE} Test {count}: `{string}` is not a valid number.')
That regex visualisation you provided is really neat. It shows that there is a lot of overlap between the conditions.
You should be able to reduce it to something similar to this:
^[+-]?\d+(\.\d+)?([Ee][+-]?\d+)?$
Comment (score 1): Your regex doesn't match 1. or .5, both of which are matched by the regex in the OP. Your regex assumes that there will always be a leading digit before a period, and that a period will always be followed by decimals. Neither is true. – JAD, Oct 17, 2019 at 8:39
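One way to address that while keeping the pattern short is to make the digits before the point optional only when at least one digit follows it. A sketch, using ASCII-only [0-9] and stripping surrounding whitespace first as the question does (is_numeric_regex is a hypothetical name):

```python
import re

# Mantissa: digits with an optional point and optional fraction ("1", "1.", "1.5"),
# or a point followed by at least one digit (".5"). The exponent is optional.
NUMBER = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?")


def is_numeric_regex(input_string: str) -> bool:
    """Return True if the stripped string matches NUMBER in full."""
    return NUMBER.fullmatch(input_string.strip()) is not None


for sample in ["1.", ".5", ".", "0", " -90e3 ", "1e", "99e2.5", "95a54e53"]:
    print(repr(sample), is_numeric_regex(sample))
```

A bare "." is still rejected, because the first branch needs a leading digit and the second needs a digit after the point.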