
I would like to know if there is a better approach to writing telephone/mobile number regular expressions for a specific country. This includes how to name the variables so that it is clear whether a pattern is in international format (with a + sign), national format (with the area code enclosed in parentheses), or local format (no + sign and no area code). Currently they are named country_name_telephone / country_name_mobile or country_name_telephone_N_digit, etc.

Below are the regular expressions for some countries.

/** matches the following pattern:
 * - 01-111-11-11
 * - 01 111 11 11
 * - 011-11-11-11
 * - 011 11 11 11
 * - 011111111
 */
public final static String CTRY_BELGIUM_TELEPHONE = "^0(\\d{8}|\\d\\s\\d{3}\\s\\d\\d\\s\\d\\d|\\d-\\d{3}-\\d\\d-\\d\\d|\\d\\d\\s\\d\\d\\s\\d\\d\\s\\d\\d|\\d\\d-\\d\\d-\\d\\d-\\d\\d)$";
/** matches the following pattern:
 * - 0412-34-56-78
 * - 0412 34 56 78
 * - 0412-345-678
 * - 0412 345 678
 * - 0412345678
 */
public final static String CTRY_BELGIUM_MOBILE = "^04\\d\\d(\\d{6}|\\s\\d\\d\\s\\d\\d\\s\\d\\d|\\s\\d{3}\\s\\d{3}|-\\d\\d-\\d\\d-\\d\\d|-\\d{3}-\\d{3})$";
/** matches the following pattern:
 * - 0000-0000
 * - 0000 0000
 * - 00000000
 */
public final static String CTRY_HONG_KONG_TELEPHONE = "^(\\d{4}[-\\s]?\\d{4})$";
/** matches the following pattern:
 * - +852-0000-0000
 * - +852 0000 0000
 * - +85200000000
 */
public final static String CTRY_HONG_KONG_MOBILE = "^\\+852(\\d{8}|-\\d{4}-\\d{4}|\\s\\d{4}\\s\\d{4})$";
/** matches the following pattern:
 * - 212-0000
 * - 212 0000
 * - 2120000
 */
public final static String CTRY_UNITED_STATES_TELEPHONE_7_DIGIT = "^[2-9]((?!11)\\d{2})[-\\s]*\\d{4}$";
/** matches the following pattern:
 * - 200 212 0000
 * - 200-212-0000
 * - 2002120000
 */
public final static String CTRY_UNITED_STATES_TELEPHONE_10_DIGIT = "^[2-9]\\d{2}((-[2-9]((?!11)\\d{2})-)|(\\s[2-9]((?!11)\\d{2})\\s)|([2-9]((?!11)\\d{2})))\\d{4}$";
/** matches the following pattern:
 * - all matches in CTRY_UNITED_STATES_TELEPHONE_7_DIGIT and CTRY_UNITED_STATES_TELEPHONE_10_DIGIT
 */
public final static String CTRY_UNITED_STATES_TELEPHONE = "("+CTRY_UNITED_STATES_TELEPHONE_7_DIGIT+")|("+CTRY_UNITED_STATES_TELEPHONE_10_DIGIT+")";
/** matches the following pattern:
 * - +1 200 212 0000
 * - +1-200-212-0000
 * - +12002120000
 */
public final static String CTRY_UNITED_STATES_MOBILE = "^\\+1((-[2-9]\\d{2}-[2-9]((?!11)\\d{2})-)|(\\s[2-9]\\d{2}\\s[2-9]((?!11)\\d{2})\\s)|([2-9]\\d{2}[2-9]((?!11)\\d{2})))\\d{4}$";
asked Mar 31, 2017 at 9:36

1 Answer


For CTRY_BELGIUM_TELEPHONE:

First, to understand what the regex is doing, I'll add its railroad diagram.

RegEx RailRoad Diagram

Demo on RegEx101

What it matches:

A zero followed by one of:

  1. Eight Digits
  2. Digit-Space-Three_Digits-Space-Two_Digits-Space-Two_Digits
  3. Digit-Hyphen-Three_Digits-Hyphen-Two_Digits-Hyphen-Two_Digits
  4. Two_Digits-Space-Two_Digits-Space-Two_Digits-Space-Two_Digits
  5. Two_Digits-Hyphen-Two_Digits-Hyphen-Two_Digits-Hyphen-Two_Digits

Looking at the list above, we can easily spot that many elements are used multiple times within the same sub-pattern.

Can we combine/reuse them?

For simplicity, I'll use a single backslash from here onward. In a Java string literal, each backslash must itself be escaped by prepending another backslash.

When I say nth road, it'll refer to the nth path in the above diagram.

Review:

The first road is straightforward: match eight digits. Simple!

The second road is \d\s\d{3}\s\d\d\s\d\d. Since you've already used \d{3}, we can similarly write \d\d as \d{2}. Cool!

It'll now become \d\s\d{3}\s\d{2}\s\d{2}.

The third road similarly can be written as \d-\d{3}-\d{2}-\d{2}.

The same can be done for the fourth and fifth roads.

The second and third roads are identical except for the separator used between the digits. Is there any way to combine them?

Back-reference.

How can back-reference be used here?

I'll show you: capture the separator character in a capturing group, then refer to it with a back-reference wherever it is needed again.

\d([\s-])\d{3}\1\d{2}\1\d{2}

This regex matches both the second and the third road.

  1. \d: Match a digit.
  2. ([\s-]): Match either a whitespace character or a hyphen and store it in the first capturing group.
  3. \d{n}: Match a digit n times.
  4. \1: Match whatever was captured by the first capturing group (see #2 above).
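The back-reference behaviour described above can be tried out in Java. The following is only a minimal sketch: the class name `SeparatorBackReferenceDemo` and the helper `matches` are hypothetical names chosen for illustration, and the pattern covers just the second and third roads (without the leading zero). Note the doubled backslashes required inside the Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original code under review.
class SeparatorBackReferenceDemo {
    // Group 1 captures the first separator (whitespace or hyphen);
    // every \1 must then repeat exactly the same character.
    static final Pattern ROADS_2_AND_3 =
            Pattern.compile("\\d([\\s-])\\d{3}\\1\\d{2}\\1\\d{2}");

    static boolean matches(String s) {
        // matches() requires the whole input to match the pattern.
        return ROADS_2_AND_3.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("1 111 11 11")); // true: spaces throughout
        System.out.println(matches("1-111-11-11")); // true: hyphens throughout
        System.out.println(matches("1-111 11-11")); // false: mixed separators
    }
}
```

Because the separator is captured once and then reused, an input that mixes spaces and hyphens is rejected, which is the same behaviour the two separate alternatives had.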

There is still repetition in \1\d{2}\1\d{2}. It can be placed in a group and repeated twice: (?:\1\d{2}){2}. Since back-references are used, it is better to use a non-capturing group here so that the numbering of the capturing groups is unaffected. (You'll see why in the fourth and fifth roads.)

Similarly, for the fourth and fifth roads, the regex will be

\d{2}([\s-])\d{2}(?:\1\d{2}){2}

At this stage, the regex becomes

^0(?:\d{8}|\d([\s-])\d{3}(?:\1\d{2}){2}|\d{2}([\s-])\d{2}(?:\2\d{2}){2})$
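To illustrate why the repeated parts use non-capturing groups, here is a small Java sketch of the intermediate pattern (with a mandatory separator in each separated alternative and \d{8} still present). The class name `GroupNumberingDemo` and the method `separatorOf` are hypothetical and exist only for this illustration. Because `(?:...)` does not consume a group number, the two separators remain groups 1 and 2, which is why the last alternative refers to \2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are assumptions for illustration.
class GroupNumberingDemo {
    static final Pattern INTERMEDIATE = Pattern.compile(
            "^0(?:\\d{8}"
            + "|\\d([\\s-])\\d{3}(?:\\1\\d{2}){2}"       // separator is group 1
            + "|\\d{2}([\\s-])\\d{2}(?:\\2\\d{2}){2})$"); // separator is group 2

    /** Returns the separator used, "" for the digits-only form, or null if there is no match. */
    static String separatorOf(String input) {
        Matcher m = INTERMEDIATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        if (m.group(1) != null) {
            return m.group(1);
        }
        if (m.group(2) != null) {
            return m.group(2);
        }
        return ""; // matched via \d{8}, so neither separator group took part
    }

    public static void main(String[] args) {
        // Only the two separator groups are counted as capturing groups.
        System.out.println(INTERMEDIATE.matcher("").groupCount()); // 2
        System.out.println("[" + separatorOf("01-111-11-11") + "]"); // [-]
        System.out.println("[" + separatorOf("011 11 11 11") + "]"); // [ ]
        System.out.println("[" + separatorOf("011111111") + "]");    // []
    }
}
```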

The first part, \d{8}, can also be removed by putting the ? quantifier inside the first capturing group. This makes [\s-] optional, so \1 is empty when no separator is present, and that alternative then also matches the eight digits with nothing between them.

So, the final RegEx is

/^0(?:\d([\s-]?)\d{3}(?:\1\d{2}){2}|\d{2}([\s-])\d{2}(?:\2\d{2}){2})$/

Here's railroad diagram for this

New RegEx Railroad

Note that 1 and 2 in the diagram refer to the respective capturing groups.

Demo on RegEx101
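As a quick check, the final pattern can be dropped back into a Java constant and run against the five formats listed in the question's Javadoc. This is a sketch under assumptions: the class `BelgiumTelephoneCheck` and the method `isBelgiumTelephone` are hypothetical names, and only the constant name CTRY_BELGIUM_TELEPHONE comes from the original code.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class used only to exercise the simplified pattern.
class BelgiumTelephoneCheck {
    // The final regex from this answer, with backslashes doubled for Java.
    static final String CTRY_BELGIUM_TELEPHONE =
            "^0(?:\\d([\\s-]?)\\d{3}(?:\\1\\d{2}){2}"
            + "|\\d{2}([\\s-])\\d{2}(?:\\2\\d{2}){2})$";

    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern PATTERN = Pattern.compile(CTRY_BELGIUM_TELEPHONE);

    static boolean isBelgiumTelephone(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "01-111-11-11", "01 111 11 11", "011-11-11-11",
            "011 11 11 11", "011111111",   // the five documented formats
            "01-111 11-11", "0111111111"   // mixed separators, one digit too many
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isBelgiumTelephone(sample));
        }
    }
}
```

The five documented formats are accepted, while the last two samples are rejected. Compiling the pattern once into a static field also avoids recompiling the regex on every validation call.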

Although this answer only explains one RegEx, the other RegExes can be modified in a similar way.

answered Mar 31, 2017 at 12:12
  • Hi, thank you very much for the detailed explanation and for introducing back-references; the other regexes can be greatly improved. Commented Apr 1, 2017 at 1:48
  • Out of curiosity, which site did you use to generate that railroad diagram? I cannot find it in regex101. Commented Apr 1, 2017 at 2:23
  • @vims I used an Atom plugin to generate the diagram. Commented Apr 1, 2017 at 3:11
