I've been working on my own phone number extraction snippet of Regex code and I'd appreciate your feedback on it. I've tried to make it match as many different types of numbers as I knew existed in the US. (I'm sure there's some I missed.) Is there anything I can do to improve this?
(?:\+?(\d{1})?-?\(?(\d{3})\)?[\s-\.]?)?(\d{3})[\s-\.]?(\d{4})[\s-\.]?
Here's a picture of it in action: [image]
You can test the code out here.
Also, for those interested, here is the flow of the "logic": [image]
(Made from pasting the code into here)
1 Answer
Interesting proposal! I could hardly read the expression that you've been working with, so I started from scratch and have built the following:
\D?(\d{0,3}?)\D{0,2}(\d{3})?\D{0,2}(\d{3})\D?(\d{4})$
I tried to take out some of the conditional and capturing groups to clear it up. We can split this into four parts:
\D?(\d{0,3}?)
Is there anything that isn't a number? It's probably a + or something, so we'll make sure not to capture that. Then we look for a country code. You only checked for a single-digit country code, and if that's what you want, fine. However, know that some nations have codes more than one digit long.
\D{0,2}(\d{3})?
Is there a dash or a parenthesis? Or both? If so, count those out. Is there an area code? If so, capture it.
\D{0,2}(\d{3})
Same as the previous section, except we're expecting to find those 3 digits, as they're necessary.
\D?(\d{4})$
We expect at most one non-digit character between the 3-digit and 4-digit sections. We need to pull the matching to the right with the $. This way we make sure to get the very last digit in the phone number.
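To see how the four parts behave together, here is a minimal Python sketch. Python is only an assumption here, since the thread never names a regex engine, and the names PHONE_RE and split_number are made up for illustration. It runs the expression above against a few sample strings and returns the four captured groups.

```python
import re

# The expression from this answer, split into the four parts described above.
PHONE_RE = re.compile(
    r"\D?(\d{0,3}?)"    # optional leading symbol, then a lazy 0-3 digit country code
    r"\D{0,2}(\d{3})?"  # up to two separators, then an optional area code
    r"\D{0,2}(\d{3})"   # up to two separators, then the required 3 digits
    r"\D?(\d{4})$"      # at most one separator, then the final 4 digits at the end
)


def split_number(text):
    """Hypothetical helper: return the four captured groups, or None if no match."""
    match = PHONE_RE.search(text)
    return match.groups() if match else None


if __name__ == "__main__":
    for sample in ["+1 (800) 123-4567", "800-123-4567", "123-4567", "call me"]:
        print(repr(sample), "->", split_number(sample))
```

In this sketch a missing country code comes back as an empty string (that group always takes part in the match), while a missing area code comes back as None (that group is optional).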
I'm sure there are many ways to do this, and this is only my interpretation. Have you tried Googling this to see if there's an established expression?
'1-800)-123-4567', which may be undesirable
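Whichever expression that remark was aimed at, the answer's pattern does accept this string, because \D{0,2} treats a stray ")" like any other separator. The sketch below (Python assumed; parens_balanced is a hypothetical helper, not something from the thread) shows the match and one possible way to reject unbalanced parentheses with a separate check.

```python
import re

# The answer's expression, unchanged.
PHONE_RE = re.compile(r"\D?(\d{0,3}?)\D{0,2}(\d{3})?\D{0,2}(\d{3})\D?(\d{4})$")


def parens_balanced(text):
    """Hypothetical helper: True when every ')' closes an earlier '('."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before any matching '('
                return False
    return depth == 0


malformed = "1-800)-123-4567"

# The regex alone accepts the string with the unmatched ')'.
accepted_by_regex = PHONE_RE.search(malformed) is not None

# Combining the regex with the balance check rejects it.
accepted_with_check = accepted_by_regex and parens_balanced(malformed)

if __name__ == "__main__":
    print("regex only:", accepted_by_regex)
    print("regex + paren check:", accepted_with_check)
```

Keeping the balance check outside the regex leaves the expression itself as short as the answer intended. Folding it into the pattern would need an alternation for the area code.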