1
\$\begingroup\$

The question is pretty self-explanatory. I wrote a regex pattern which should (in theory) match all possible integers and decimal numbers. The pattern is as follows:

import re
pattern = '^[+-]?((\d+(\.\d*)?)|(\.\d+))$'
re.compile(pattern)

How foolproof is this pattern? I tested out quite a few scenarios, and they all worked fine. Am I missing some edge case here? Thanks for any help.

asked Jul 11, 2019 at 20:14
\$\endgroup\$
3
  • 1
    \$\begingroup\$ Your expression matches numbers with digits before the decimal point but no digits after the decimal point, e.g. "33.". Is this intentional? \$\endgroup\$ Commented Jul 11, 2019 at 23:25
  • 2
    \$\begingroup\$ Your regex fails on numbers in exponential notation, like 1e15 and 1e-15 \$\endgroup\$ Commented Jul 12, 2019 at 4:59
  • 2
    \$\begingroup\$ A regex pattern without a clear spec or sufficient unit tests to see what should get matched and what not, is impossible to review. \$\endgroup\$ Commented Jul 12, 2019 at 5:27
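
The cases raised in these comments can be pinned down with a small table-driven check. This is only a sketch: the `accepts` helper and the `cases` table are illustrative names that do not appear in the post, and the expected values simply record what the pattern as posted does (written here as a raw string).

```python
import re

# The pattern from the question, written as a raw string.
pattern = re.compile(r'^[+-]?((\d+(\.\d*)?)|(\.\d+))$')

def accepts(text):
    """Return True if the question's pattern matches text."""
    return pattern.match(text) is not None

# Hypothetical sample table: input -> what the pattern as posted does.
cases = {
    "33.": True,     # digits before the point, none after
    "1e15": False,   # exponent notation is not covered
    "1e-15": False,
    ".": False,      # a lone point is rejected
    "12\n": True,    # `$` also matches just before a trailing newline
}

for text, expected in cases.items():
    assert accepts(text) == expected, text
    print(f"{text!r:>9} -> {accepts(text)}")
```

Whether each of these outcomes is desirable depends on the spec. If a trailing newline should be rejected, `re.fullmatch` is stricter than `$`.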

2 Answers

5
\$\begingroup\$

Your expression looks fine. If inputs such as 3. and 4. (digits followed by a bare decimal point) are undesired, you could modify it slightly to:

^[+-]?((\d+(\.\d+)?)|(\.\d+))$

This version rejects 3. and 4. because it requires at least one digit after the decimal point. Other than that, your pattern contains several capturing groups, which I'm guessing you'd like to keep.
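
To see exactly where the two variants part ways, the following sketch lists the inputs on which they disagree. The names `original`, `modified`, `samples`, and `differs` are made up for this illustration.

```python
import re

original = re.compile(r'^[+-]?((\d+(\.\d*)?)|(\.\d+))$')  # allows "3."
modified = re.compile(r'^[+-]?((\d+(\.\d+)?)|(\.\d+))$')  # rejects "3."

samples = ["3", "3.", "4.", "3.0", ".5", "-2.", "."]

# Inputs on which the two patterns disagree.
differs = [s for s in samples
           if bool(original.match(s)) != bool(modified.match(s))]
print(differs)  # ['3.', '4.', '-2.']
```

Only inputs ending in a bare decimal point are affected. Both variants reject a lone ".".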


Test the capturing groups with re.finditer

import re
regex = r"^[+-]?((\d+(\.\d+)?)|(\.\d+))$"
test_str = ("0.00000\n"
 "0.00\n"
 "-200\n"
 "+200\n"
 "200\n"
 "200.2\n"
 "-200.2\n"
 "+200.2\n"
 ".000\n"
 ".1\n"
 ".2\n"
 "3.\n"
 ".")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Groups are numbered from 1; group 0 is the whole match.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

Test with re.findall

import re
regex = r"^[+-]?((\d+(\.\d+)?)|(\.\d+))$"
test_str = ("0.00000\n"
 "0.00\n"
 "-200\n"
 "+200\n"
 "200\n"
 "200.2\n"
 "-200.2\n"
 "+200.2\n"
 ".000\n"
 ".1\n"
 ".2\n"
 "3.\n"
 ".")
print(re.findall(regex, test_str, re.MULTILINE))
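
One thing to be aware of when reading the re.findall output: if a pattern contains capturing groups, re.findall returns a tuple of the groups for each match rather than the matched text. A minimal sketch of the difference, using a non-capturing rewrite of the same expression (the `non_capturing` variant and the short `text` sample are additions for illustration, not part of the answer):

```python
import re

capturing = r"^[+-]?((\d+(\.\d+)?)|(\.\d+))$"
non_capturing = r"^[+-]?(?:\d+(?:\.\d+)?|\.\d+)$"

text = "200.2\n.1\n3.\n-7"

# With groups: one tuple per match, unmatched groups appear as ''.
with_groups = re.findall(capturing, text, re.MULTILINE)
print(with_groups)

# Without groups: the full matched strings.
whole_matches = re.findall(non_capturing, text, re.MULTILINE)
print(whole_matches)  # ['200.2', '.1', '-7']
```

Note that the sign is outside every group, so it does not appear in the tuples of the capturing version.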

The expression is explained on the top right panel of this demo, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.


RegEx Circuit

jex.im visualizes regular expressions as diagrams.

[Image: diagram of the expression]

answered Jul 12, 2019 at 4:49
\$\endgroup\$
1
  • 2
    \$\begingroup\$ 3 is an integer, 3. is a float value. I would expect the latter string to match as well, so I would keep the original \.\d* rather than the modified version. \$\endgroup\$ Commented Jul 12, 2019 at 5:05
5
\$\begingroup\$

Here's one:

import re

number_regex = re.compile(
 r'^[-+]?(?:(?:(?:[1-9](?:_?\d)*|0+(_?0)*)|(?:0[bB](?:_?[01])+)'
 r'|(?:0[oO](?:_?[0-7])+)|(?:0[xX](?:_?[0-9a-fA-F])+))'
 r'|(?:(?:(?:\d(?:_?\d)*)?(?:\.(?:\d(?:_?\d)*))|(?:\d(?:_?\d)*)\.)'
 r'|(?:(?:(?:\d(?:_?\d)*)|(?:(?:\d(?:_?\d)*)?(?:\.(?:\d(?:_?\d)*))'
 r'|(?:\d(?:_?\d)*)\.))(?:[eE][-+]?(?:\d(?:_?\d)*)))))$',
 re.UNICODE)

But seriously, Python numbers are complicated

If you really want a regex that will match ALL valid forms of Python numbers, it will be a complex regex. Integers include decimal, binary, octal, and hexadecimal forms. Floating point numbers can be in exponent form. As of version 3.6, all kinds of numbers can have '_' in them, but it can't be first or last. And decimal integers greater than 0 can't start with '0'; a leading '0' is only allowed as part of a 0b, 0o, or 0x prefix (or in a literal made up entirely of zeros).
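
These integer rules can be observed without any regex, because `int(text, 0)` interprets its argument the way an integer literal is read in source code. The helper below is a rough sketch (`is_int_literal` is a made-up name). One caveat: `int` also tolerates a leading sign and surrounding whitespace, so it is slightly more permissive than the grammar.

```python
def is_int_literal(text):
    """Return True if int(text, 0) accepts text (base 0 = literal rules)."""
    try:
        int(text, 0)
    except ValueError:
        return False
    return True

print(int("0b1111_0000", 0))    # 240
print(is_int_literal("0xFF"))   # True
print(is_int_literal("1_000"))  # True
print(is_int_literal("_1"))     # False: underscore can't come first
print(is_int_literal("1_"))     # False: underscore can't come last
print(is_int_literal("012"))    # False: nonzero decimal with leading 0
```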

From the Python documentation, here is the BNF for integer:

integer ::= decinteger | bininteger | octinteger | hexinteger
decinteger ::= nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger ::= "0" ("b" | "B") (["_"] bindigit)+
octinteger ::= "0" ("o" | "O") (["_"] octdigit)+
hexinteger ::= "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::= "1"..."9"
digit ::= "0"..."9"
bindigit ::= "0" | "1"
octdigit ::= "0"..."7"
hexdigit ::= digit | "a"..."f" | "A"..."F"

and here is the BNF for floatnumber:

floatnumber ::= pointfloat | exponentfloat
pointfloat ::= [digitpart] fraction | digitpart "."
exponentfloat ::= (digitpart | pointfloat) exponent
digitpart ::= digit (["_"] digit)*
fraction ::= "." digitpart
exponent ::= ("e" | "E") ["+" | "-"] digitpart

Note that the '+' or '-' isn't technically part of the number; it is a unary operator. But it is easy enough to include an optional sign in the regex.
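
The standard library tokenizer shows this split directly. In the sketch below, `token_pairs` is an illustrative helper, not part of the answer; it returns the token type names and texts for a piece of source.

```python
import io
import tokenize

def token_pairs(source):
    """Return (token type name, text) pairs, skipping end-of-input bookkeeping tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[t.type], t.string)
            for t in tokens if t.type not in skip]

print(token_pairs("-1.5e3"))  # [('OP', '-'), ('NUMBER', '1.5e3')]
```

The sign arrives as a separate operator token, and only 1.5e3 is the number.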

To create the regex, simply translate the BNF into the corresponding regex patterns. Using non-capturing parentheses (?: ) and f-strings helps a lot (rf"..." is a raw format string).

Integer:

decint = r"(?:[1-9](?:_?\d)*|0+(_?0)*)"
binint = r"(?:0[bB](?:_?[01])+)"
octint = r"(?:0[oO](?:_?[0-7])+)"
hexint = r"(?:0[xX](?:_?[0-9a-fA-F])+)"
integer = rf"(?:{decint}|{binint}|{octint}|{hexint})"

floatnumber:

digitpart = r"(?:\d(?:_?\d)*)"
exponent = rf"(?:[eE][-+]?{digitpart})"
fraction = rf"(?:\.{digitpart})"
pointfloat = rf"(?:{digitpart}?{fraction}|{digitpart}\.)"
exponentfloat = rf"(?:(?:{digitpart}|{pointfloat}){exponent})"
floatnumber = rf"(?:{pointfloat}|{exponentfloat})"

and put it all together, with an optional sign, to get:

number = re.compile(rf"^[-+]?(?:{integer}|{floatnumber})$")

Which is how I got the regex at the top of this answer. This has not been thoroughly tested, just spot checked:

tests = """
 0
 1
 123
 100_000
 1_2_3
 1000000
 1.0
 1.
 .2
 0.2
 3.4
 1_234.567_89
 0o123
 0b1111_0000
 0X12_34_ab_cd
 1e-10
 1E001
 .2e-2
"""
tests = tests.split()
for s in tests:
    m = number.match(s)
    print(f"'{s}' => {m[0] if m else 'NOT a number'}")
answered Jul 12, 2019 at 7:53
\$\endgroup\$
