A regex pattern that matches all forms of integers and decimal numbers in Python

Question 1

The question is pretty self-explanatory. I wrote a regex pattern which should (in theory) match all possible integers and decimal numbers. The pattern is as follows:

import re
pattern = r'^[+-]?((\d+(\.\d*)?)|(\.\d+))$'
re.compile(pattern)

How foolproof is this pattern? I tested out quite a few scenarios, and they all worked fine. Am I missing some edge case here? Thanks for any help.

Question 2

Your expression matches numbers with digits before the decimal point but no digits after it, such as `33.`. Is this intentional?

Question 3

Your regex fails on numbers in exponential notation, like 1e15 and 1e-15.
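
If exponent notation is meant to be in scope (an assumption; the question does not say so), one possible extension is an optional exponent suffix after the mantissa. The name `is_number` is only illustrative:

```python
import re

# Hypothetical extension of the question's pattern: an optional exponent
# such as e15, E-15 or e+3 after the mantissa.
EXP_NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

def is_number(text):
    """Return True if the whole string is a number under this sketch."""
    return EXP_NUMBER.fullmatch(text) is not None

for sample in ["1e15", "1e-15", "3.", ".5E+2", "1e", "e5"]:
    print(f"{sample!r}: {is_number(sample)}")
```

`re.fullmatch` is used here instead of `^...$` so the whole string has to be consumed.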

Question 4

A regex pattern is impossible to review without a clear spec, or enough unit tests, showing what should and should not match.
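
One way to pin the spec down is a small table of strings the pattern should accept and reject. The lists below are a hypothetical spec (the question does not state one), exercised against the question's pattern. The `surprises` entries are inputs the pattern accepts that a reviewer may not expect:

```python
import re

# The pattern from the question, written as a raw string.
PATTERN = re.compile(r"^[+-]?((\d+(\.\d*)?)|(\.\d+))$")

# Hypothetical spec: what the pattern is expected to accept and reject.
should_match = ["0", "42", "+42", "-42", "3.14", "-0.5", ".5", "3."]
should_reject = ["", ".", "+", "1e15", "1,000", "1.2.3", " 1", "abc"]

for text in should_match:
    assert PATTERN.match(text), f"expected a match for {text!r}"
for text in should_reject:
    assert not PATTERN.match(text), f"expected no match for {text!r}"

# Two behaviours worth deciding on explicitly:
# `$` also matches just before a trailing newline, and `\d` matches any
# Unicode decimal digit for str patterns, not only 0-9.
surprises = ["42\n", "\u0664\u0662"]  # "42" plus newline; Arabic-Indic "42"
for text in surprises:
    print(f"{text!r}: {bool(PATTERN.match(text))}")
```

If those two inputs should be rejected, `re.fullmatch` handles the trailing newline and the `re.ASCII` flag restricts `\d` to 0-9.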

Question 5

Your expression looks just fine. We might modify it slightly to:

^[+-]?((\d+(\.\d+)?)|(\.\d+))$

so that samples such as 3. and 4. fail, in case such inputs are undesired. Other than that, the expression has some capturing groups, which I'm guessing you'd like to keep.

Test the capturing groups with `re.finditer`

import re
regex = r"^[+-]?((\d+(\.\d+)?)|(\.\d+))$"
test_str = ("0.00000\n"
 "0.00\n"
 "-200\n"
 "+200\n"
 "200\n"
 "200.2\n"
 "-200.2\n"
 "+200.2\n"
 ".000\n"
 ".1\n"
 ".2\n"
 "3.\n"
 ".")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    # Report the span and text of the whole match.
    print("Match {matchNum} was found at {start}-{end}: {match}".format(
        matchNum=matchNum, start=match.start(), end=match.end(),
        match=match.group()))

    # Groups are numbered from 1; group 0 is the whole match.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(
            groupNum=groupNum, start=match.start(groupNum),
            end=match.end(groupNum), group=match.group(groupNum)))

Test with `re.findall`

import re
regex = r"^[+-]?((\d+(\.\d+)?)|(\.\d+))$"
test_str = ("0.00000\n"
 "0.00\n"
 "-200\n"
 "+200\n"
 "200\n"
 "200.2\n"
 "-200.2\n"
 "+200.2\n"
 ".000\n"
 ".1\n"
 ".2\n"
 "3.\n"
 ".")
print(re.findall(regex, test_str, re.MULTILINE))
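
Because the pattern contains capturing groups, `re.findall` returns one tuple of groups per match rather than the matched text. If only the matched numbers are wanted, a variant with non-capturing groups (a sketch, not part of the original answer, with a shortened sample string) returns plain strings:

```python
import re

# Same pattern as above, with every group made non-capturing.
regex = r"^[+-]?(?:\d+(?:\.\d+)?|\.\d+)$"
test_str = "0.00\n-200\n+200.2\n.1\n3.\n."

# With no capturing groups, findall returns the full matches.
numbers = re.findall(regex, test_str, re.MULTILINE)
print(numbers)
```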

The expression is explained on the top right panel of this demo, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.

RegEx Circuit

jex.im visualizes regular expressions (diagram of the expression not shown).

Question 6

3 is an integer; 3. is a float value. I would expect the latter string to match as well, so I would leave the \.\d* unmodified.
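
Python itself reads a trailing dot this way, which can be checked with the standard library:

```python
import ast

# Python's own literal parser treats "3" and "3." differently.
for text in ["3", "3.", ".3"]:
    value = ast.literal_eval(text)
    print(f"{text!r} -> {value!r} ({type(value).__name__})")

# float() accepts the same trailing-dot form.
print(float("3."))
```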

Question 7

Here's one:

number_regex = re.compile(
 r'^[-+]?(?:(?:(?:[1-9](?:_?\d)*|0+(_?0)*)|(?:0[bB](?:_?[01])+)'
 r'|(?:0[oO](?:_?[0-7])+)|(?:0[xX](?:_?[0-9a-fA-F])+))'
 r'|(?:(?:(?:\d(?:_?\d)*)?(?:\.(?:\d(?:_?\d)*))|(?:\d(?:_?\d)*)\.)'
 r'|(?:(?:(?:\d(?:_?\d)*)|(?:(?:\d(?:_?\d)*)?(?:\.(?:\d(?:_?\d)*))'
 r'|(?:\d(?:_?\d)*)\.))(?:[eE][-+]?(?:\d(?:_?\d)*)))))$',
 re.UNICODE)

But seriously, Python numbers are complicated

If you really want a regex that will match ALL valid forms of Python numbers, it will be a complex regex. Integers include decimal, binary, octal, and hexadecimal forms. Floating point numbers can be in exponent form. As of version 3.6, all kinds of numbers can contain '_', but not as the first or last character. And integers > 0 can't start with '0' unless the prefix is 0b, 0o, or 0x.
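
These rules can be checked against Python's own parser. The helper `parses_as_number` below is a hypothetical name introduced for this sketch; it relies on `ast.literal_eval` and needs Python 3.6 or later for the underscore forms:

```python
import ast

def parses_as_number(text):
    """Return True if Python itself reads `text` as an int or float literal."""
    try:
        value = ast.literal_eval(text)
    except (SyntaxError, ValueError):
        return False
    # bool is a subclass of int, so compare exact types.
    return type(value) in (int, float)

valid = ["0b1010", "0o17", "0xFF", "1_000_000", "1e-3", "000"]
invalid = ["_1", "1_", "1__0", "0123", "0b", "0x_"]

for text in valid + invalid:
    print(f"{text!r}: {parses_as_number(text)}")
```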

From the Python documentation, here is the BNF for integer:

integer ::= decinteger | bininteger | octinteger | hexinteger
decinteger ::= nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger ::= "0" ("b" | "B") (["_"] bindigit)+
octinteger ::= "0" ("o" | "O") (["_"] octdigit)+
hexinteger ::= "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::= "1"..."9"
digit ::= "0"..."9"
bindigit ::= "0" | "1"
octdigit ::= "0"..."7"
hexdigit ::= digit | "a"..."f" | "A"..."F"

and here is the BNF for floatnumber:

floatnumber ::= pointfloat | exponentfloat
pointfloat ::= [digitpart] fraction | digitpart "."
exponentfloat ::= (digitpart | pointfloat) exponent
digitpart ::= digit (["_"] digit)*
fraction ::= "." digitpart
exponent ::= ("e" | "E") ["+" | "-"] digitpart

Note that the '+' or '-' isn't technically part of the number; it is a unary operator. But it is easy enough to include an optional sign in the regex.
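
The unary-operator point can be seen in the syntax tree. This sketch assumes Python 3.8 or later, where numeric literals are represented as `ast.Constant` nodes:

```python
import ast

# Parse "-1" as an expression and inspect the resulting syntax tree.
node = ast.parse("-1", mode="eval").body

print(type(node).__name__)          # the outer node is a unary operation
print(type(node.op).__name__)       # the operator node
print(type(node.operand).__name__)  # the literal it applies to
```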

To create the regex, simply translate the BNF into the corresponding regex patterns. Using non-capturing parentheses (?: ) and f-strings helps a lot (rf"..." is a raw f-string).

Integer:

decint = r"(?:[1-9](?:_?\d)*|0+(_?0)*)"
binint = r"(?:0[bB](?:_?[01])+)"
octint = r"(?:0[oO](?:_?[0-7])+)"
hexint = r"(?:0[xX](?:_?[0-9a-fA-F])+)"
integer = rf"(?:{decint}|{binint}|{octint}|{hexint})"

floatnumber:

digitpart = r"(?:\d(?:_?\d)*)"
exponent = rf"(?:[eE][-+]?{digitpart})"
fraction = rf"(?:\.{digitpart})"
pointfloat = rf"(?:{digitpart}?{fraction}|{digitpart}\.)"
exponentfloat = rf"(?:(?:{digitpart}|{pointfloat}){exponent})"
floatnumber = rf"(?:{pointfloat}|{exponentfloat})"

and put it all together, with an optional sign, to get:

number = re.compile(rf"^[-+]?(?:{integer}|{floatnumber})$")

Which is how I got the regex at the top of this answer. This has not been thoroughly tested, just spot checked:

tests = """
 0
 1
 123
 100_000
 1_2_3
 1000000
 1.0
 1.
 .2
 0.2
 3.4
 1_234.567_89
 0o123
 0b1111_0000
 0X12_34_ab_cd
 1e-10
 1E001
 .2e-2
"""
tests = tests.split()
for s in tests:
    m = number.match(s)
    print(f"'{s}' => {m[0] if m else 'NOT a number'}")
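
The spot check can be extended by comparing the regex with Python's own parser. The sketch below rebuilds the pattern from the pieces above and compares it with `ast.literal_eval` on a few extra strings. The helper `python_accepts` and the sample list are illustrative, and `re.ASCII` is an addition not present in the original, so that `\d` matches only 0-9:

```python
import ast
import re

# Regex pieces as in the answer above.
decint = r"(?:[1-9](?:_?\d)*|0+(_?0)*)"
binint = r"(?:0[bB](?:_?[01])+)"
octint = r"(?:0[oO](?:_?[0-7])+)"
hexint = r"(?:0[xX](?:_?[0-9a-fA-F])+)"
integer = rf"(?:{decint}|{binint}|{octint}|{hexint})"
digitpart = r"(?:\d(?:_?\d)*)"
exponent = rf"(?:[eE][-+]?{digitpart})"
fraction = rf"(?:\.{digitpart})"
pointfloat = rf"(?:{digitpart}?{fraction}|{digitpart}\.)"
exponentfloat = rf"(?:(?:{digitpart}|{pointfloat}){exponent})"
floatnumber = rf"(?:{pointfloat}|{exponentfloat})"

# Assumption: re.ASCII is added so that \d means 0-9 only.
number = re.compile(rf"^[-+]?(?:{integer}|{floatnumber})$", re.ASCII)

def python_accepts(text):
    """Return True if Python parses `text` as an int or float literal."""
    try:
        value = ast.literal_eval(text)
    except (SyntaxError, ValueError):
        return False
    return type(value) in (int, float)

samples = ["0", "00", "007", "1_000", "1__0", "_1", "1_", "0x_ff",
           "0b102", "1.", ".5", "1.e3", "01.5", "1e", "1j", "+1",
           "-1.5e3", "\u0663"]

for s in samples:
    by_regex = number.match(s) is not None
    by_python = python_accepts(s)
    flag = "" if by_regex == by_python else "  <-- disagreement"
    print(f"{s!r}: regex={by_regex} python={by_python}{flag}")
```

Strings with trailing whitespace are left out of the samples because `ast.literal_eval` tolerates it, which would show up as a disagreement that says nothing about the number syntax itself.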

Accepted Answer (the answer shown under "Question 5" above) · Emma Marcier · 2019-07-12 04:49:57Z
