8
\$\begingroup\$

I have created this regular expression to validate usernames which I need in my projects:

^(?=.{3,32}$)(?!.*[._-]{2})(?!.*[0-9]{5,})[a-z](?:[\w]*|[a-z\d\.]*|[a-z\d-]*)[a-z0-9]$

It works just fine. But I'm wondering whether it can be improved or optimized, since I'm not exactly a regex guy.

The regex and tests are available here.

Rules are:

  • usernames should start with [a-z]
  • usernames should end with [a-z0-9]
  • usernames can have a length between 3 and 32
  • usernames can contain any of [a-z0-9\._-]
  • No more than four digits may appear in a row. I mean p1234 is a match and p12345 is not.
  • Each username can contain only one kind of [\._-] character. I mean a username can contain . or - or _, but not a mix of them.
  • Each ., -, and _ should be followed by an alphanumeric character. I mean a . cannot be followed by another ., and these characters cannot be adjacent to each other.

Tests:

j1vad-amiry match
j1vad-ami-ry match
ja23d_am8ry match
ja_23d_am8ry match
jav5d2.am3y match
jav.ad.amiry match
jav.ad.ami.ry.2 match
ja3fd4 match
page2491 match
page24915 not match
jav-ad_amiry not match
javad_am-iry not match
jav.ami-ry not match
jav.ami_ry not match
jav.ami__ry not match
2jav not match
2jav_ad not match
2jav_ad3 not match
unor
asked Jul 2, 2014 at 2:21
\$\endgroup\$
5
  • 5
    \$\begingroup\$ Do you have to solve this problem with a regex only? Although it might be possible to solve this with a regex, it seems like a lot of work and the result is pretty much unreadable. Can you write some code in C# to check those rules instead? \$\endgroup\$ Commented Jul 2, 2014 at 2:32
  • \$\begingroup\$ @GregHewgill Well, using a regex gives me the option to check usernames on both the server and the client side with minimal coding. \$\endgroup\$ Commented Jul 2, 2014 at 2:34
  • 3
    \$\begingroup\$ Maybe, but you would have to validate your solution carefully on both sides because not all regex languages are created alike. \$\endgroup\$ Commented Jul 2, 2014 at 2:36
  • 1
    \$\begingroup\$ Your last edit invalidated part of an answer. I have rolled it back. Please see this meta post about why we frown on editing code in questions. \$\endgroup\$ Commented Jul 2, 2014 at 11:28
  • \$\begingroup\$ Would just like to say that Regex (or username in the title) are not pieces of code and shouldn't be stylized as such. \$\endgroup\$ Commented Jul 2, 2014 at 14:12

2 Answers

18
\$\begingroup\$

I would very strongly recommend against using a regular expression for this. There is no clear mapping between the list of requirements you posted and the code.

Imagine another developer looking at this. Are they able to deduce the list of requirements? Given the list of requirements, are they able to verify that the regular expression is correct? How long would it take to convince them that it's correct?

In fact, it's not correct. In .NET, the character class \d is not the same as [0-9] (unless you specify ECMAScript-compliant behaviour): \d matches any Unicode decimal digit, so your regexp matches the username j1vad-a٠miry (that's an ARABIC-INDIC DIGIT ZERO). (It doesn't match on the link you posted, but that's a Ruby, not .NET, regexp tester.)
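
To make the difference concrete, here is a minimal sketch, assuming .NET's System.Text.RegularExpressions with default options. The class and method names (DigitClassDemo and so on) are hypothetical and only serve the illustration; the pattern in MatchesOriginalPattern is the one from the question, unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    // U+0660 ARABIC-INDIC DIGIT ZERO, the character used in the counterexample.
    public const string ArabicIndicZero = "\u0660";

    // The pattern from the question, copied verbatim.
    private const string OriginalPattern =
        @"^(?=.{3,32}$)(?!.*[._-]{2})(?!.*[0-9]{5,})[a-z](?:[\w]*|[a-z\d\.]*|[a-z\d-]*)[a-z0-9]$";

    // \d with default options matches any Unicode decimal digit.
    public static bool MatchesDefaultDigit(string s) => Regex.IsMatch(s, @"^\d$");

    // \d with RegexOptions.ECMAScript is restricted to the ASCII digits 0-9.
    public static bool MatchesEcmaScriptDigit(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);

    // An explicit [0-9] class is ASCII-only regardless of options.
    public static bool MatchesAsciiDigit(string s) => Regex.IsMatch(s, "^[0-9]$");

    public static bool MatchesOriginalPattern(string s) => Regex.IsMatch(s, OriginalPattern);

    public static void Main()
    {
        Console.WriteLine(MatchesDefaultDigit(ArabicIndicZero));     // True
        Console.WriteLine(MatchesEcmaScriptDigit(ArabicIndicZero));  // False
        Console.WriteLine(MatchesAsciiDigit(ArabicIndicZero));       // False
        // The username from the paragraph above slips through the original pattern.
        Console.WriteLine(MatchesOriginalPattern("j1vad-a\u0660miry")); // True
    }
}
```

Spelling the class out as [0-9] everywhere avoids depending on which options the regex happens to be compiled with.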

Now suppose one of your requirements changes -- this code is going to be tricky to maintain.

Instead, I would recommend writing a series of tests, each corresponding to one of your requirements. Here is the code I came up with. It still uses regular expressions where appropriate, but very simple ones.

// Requires: using System.Linq; using System.Text.RegularExpressions;
public static bool IsValidUsername(string username)
{
    if (username == null)
    {
        return false;
    }
    // Usernames can have a length between 3 and 32.
    var length = username.Length;
    if (length < 3 || length > 32)
    {
        return false;
    }
    // Usernames should start with [a-z].
    if (!IsLowerAlpha(username[0]))
    {
        return false;
    }
    // Usernames should end with [a-z0-9].
    if (!IsLowerAlphanumeric(username[length - 1]))
    {
        return false;
    }
    // Usernames can contain any of [a-z0-9._-].
    if (!Regex.IsMatch(username, "^[a-z0-9._-]*$"))
    {
        return false;
    }
    // No more than four digits in a row.
    if (Regex.IsMatch(username, "[0-9]{5,}"))
    {
        return false;
    }
    // Each username can contain only one of '.', '_', '-'.
    var punctuation = new [] { '.', '_', '-' };
    if (punctuation.Count(c => username.Contains(c)) > 1)
    {
        return false;
    }
    // Each '.', '_', and '-' should be followed by an alpha-numeric.
    // The last character was already checked above, so stop at length - 1.
    for (var i = 0; i < length - 1; i++)
    {
        if (punctuation.Contains(username[i]) && !IsLowerAlphanumeric(username[i + 1]))
        {
            return false;
        }
    }
    return true;
}

private static bool IsLowerAlpha(char c)
{
    return c >= 'a' && c <= 'z';
}

private static bool IsLowerAlphanumeric(char c)
{
    return IsLowerAlpha(c) || (c >= '0' && c <= '9');
}

Now, IsValidUsername is getting a bit long. We could split out each check into a separate static method, for instance,

private static bool IsNotNull(string username)
{
    return username != null;
}

private static bool IsInLengthRange(string username)
{
    var length = username.Length;
    return length >= 3 && length <= 32;
}

And then just check our rules like this:

private static readonly Predicate<string>[] Rules = new Predicate<string>[]
{
    IsNotNull,
    IsInLengthRange,
    StartsWithLowerAlpha,
    ...
};

public static bool IsValidUsername(string username)
{
    return Rules.All(rule => rule(username));
}
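
For reference, one way the rule-table version might look when written out in full. This is a sketch under assumptions, not the definitive implementation: every rule name after StartsWithLowerAlpha is hypothetical, as is the UsernameRules class, and the rule bodies simply restate the checks from the longer method above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UsernameRules
{
    private static readonly char[] Punctuation = { '.', '_', '-' };

    // Order matters: All() stops at the first failing rule, so IsNotNull and
    // IsInLengthRange guard the indexing done by the later rules.
    private static readonly Predicate<string>[] Rules = new Predicate<string>[]
    {
        IsNotNull,
        IsInLengthRange,
        StartsWithLowerAlpha,
        EndsWithLowerAlphanumeric,       // hypothetical name
        HasOnlyAllowedCharacters,        // hypothetical name
        HasNoRunOfFiveDigits,            // hypothetical name
        UsesAtMostOnePunctuationKind,    // hypothetical name
        PunctuationPrecedesAlphanumeric  // hypothetical name
    };

    public static bool IsValidUsername(string username)
    {
        return Rules.All(rule => rule(username));
    }

    private static bool IsLowerAlpha(char c) => c >= 'a' && c <= 'z';

    private static bool IsLowerAlphanumeric(char c) =>
        IsLowerAlpha(c) || (c >= '0' && c <= '9');

    private static bool IsNotNull(string username) => username != null;

    private static bool IsInLengthRange(string username) =>
        username.Length >= 3 && username.Length <= 32;

    private static bool StartsWithLowerAlpha(string username) =>
        IsLowerAlpha(username[0]);

    private static bool EndsWithLowerAlphanumeric(string username) =>
        IsLowerAlphanumeric(username[username.Length - 1]);

    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    private static bool HasOnlyAllowedCharacters(string username) =>
        Regex.IsMatch(username, @"\A[a-z0-9._-]*\z");

    private static bool HasNoRunOfFiveDigits(string username) =>
        !Regex.IsMatch(username, "[0-9]{5,}");

    // Counts how many of '.', '_', '-' occur at least once in the username.
    private static bool UsesAtMostOnePunctuationKind(string username) =>
        Punctuation.Count(c => username.IndexOf(c) >= 0) <= 1;

    // The last character is covered by EndsWithLowerAlphanumeric.
    private static bool PunctuationPrecedesAlphanumeric(string username)
    {
        for (var i = 0; i < username.Length - 1; i++)
        {
            if (Array.IndexOf(Punctuation, username[i]) >= 0
                && !IsLowerAlphanumeric(username[i + 1]))
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsValidUsername("j1vad-amiry")); // True
        Console.WriteLine(IsValidUsername("page24915"));   // False: five digits in a row
        Console.WriteLine(IsValidUsername("jav.ami-ry"));  // False: mixes '.' and '-'
        Console.WriteLine(IsValidUsername(null));          // False
    }
}
```

Each entry in the array corresponds to one line of the requirements list, so a changed requirement means touching one small method rather than the whole expression.
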
answered Jul 2, 2014 at 3:01
\$\endgroup\$
6
\$\begingroup\$

Using lookarounds tends to make your regular expression execute slowly. Each lookahead makes the matching engine iterate over the string, only to restart from the same position once the assertion passes, so the input is traversed several times.

You should take a step back and think about how much is being put into this single expression. Checking the length of a string is trivial in any programming language. In many languages, it is O(1) since the length is stored with the character values. Having to explicitly write this check for client and server code does not require extensive extra coding.

Take another step back: do you really need to be this restrictive with the user name? If the only valid characters are [a-z0-9\._-], you aren't bumping up against escaping requirements. It took me a few tries to fully understand the rules you had spelled out. You shouldn't require your user to work that hard just to come up with a user name.

Use 0-9 or \d, not both. Consistency is always important.
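
As a rough sketch of the last two points, assuming C#/.NET on the server: the length check moves into plain code, and the remaining pattern uses [0-9] throughout. The pattern below is a variant of the question's expression, not the original, and the LengthFirstValidator class is a hypothetical name for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LengthFirstValidator
{
    // Variant of the question's pattern (an assumption, not the original):
    // the (?=.{3,32}$) lookahead is removed, \w and \d are replaced by
    // explicit ASCII classes, and \A ... \z anchor the whole string.
    private static readonly Regex Shape = new Regex(
        @"\A(?!.*[._-]{2})(?!.*[0-9]{5,})[a-z](?:[a-z0-9_]*|[a-z0-9.]*|[a-z0-9-]*)[a-z0-9]\z");

    public static bool IsValidUsername(string username)
    {
        // Length is checked in code, so the regex no longer has to.
        if (username == null || username.Length < 3 || username.Length > 32)
        {
            return false;
        }
        return Shape.IsMatch(username);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidUsername("ja_23d_am8ry")); // True
        Console.WriteLine(IsValidUsername("jav.ami__ry"));  // False: adjacent punctuation
        Console.WriteLine(IsValidUsername("ab"));           // False: too short
    }
}
```

The two remaining lookaheads could be moved into code in the same way if they turn out to matter for performance.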

answered Jul 2, 2014 at 3:03
\$\endgroup\$
2
  • 1
    \$\begingroup\$ Good point about [0-9] and \d -- it jogged my memory that they're not actually the same. \$\endgroup\$ Commented Jul 2, 2014 at 4:21
  • \$\begingroup\$ @mjolka: The joys of regex being far from standard across languages. \$\endgroup\$ Commented Jul 2, 2014 at 12:26
