I have a small function which I regularly use to check if a variable contains only [a-z][0-9] and the special chars '-' and '_'. Currently I'm using the following:
function is_clean($string){
    $pattern = "/([a-z]|[A-Z]|[0-9]|-|_)*/";
    preg_match($pattern, $string, $return);
    $pass = (($string == $return[0]) ? TRUE : FALSE);
    return $pass;
}
1 - Can anyone tell me if this is the best/most efficient way to do this and if not try to explain why?
2 - When I view this function through my IDE I get a warning that $return is uninitialized... should I be initializing variables in advance with php, if so how and why?
2 Answers
No need for all that. Just use one bracket group, negate it (by prepending a ^), and use the return value directly:
function is_clean ($string) {
return ! preg_match("/[^a-z\d_-]/i", $string);
}
Here's a quote from the PHP docs:
Return Values
preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
In the regex above, we're looking for any characters in the string that are not in the bracket group. If none are found, preg_match will return 0 (which when negated will result in true). If any of those characters are found, 1 will be returned and negated to false.
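To see the behaviour described above on concrete input, here is a minimal sketch. The sample strings are made up, and the function name `is_clean_strict` is hypothetical. The strict `=== 0` comparison is my own assumption rather than part of the answer: it makes a `preg_match()` error (`FALSE`) count as "not clean", whereas plain negation would let it pass.

```php
<?php
// Same negated character class as above, compared strictly against 0
// so that an error return (false) is not mistaken for a clean string.
// The name is_clean_strict is hypothetical, not from the answer.
function is_clean_strict($string) {
    return preg_match('/[^a-z\d_-]/i', $string) === 0;
}

var_dump(is_clean_strict('abc_DEF-123')); // bool(true)
var_dump(is_clean_strict('abc def'));     // bool(false): the space is not allowed
var_dump(is_clean_strict(''));            // bool(true): nothing disallowed was found
```

Note that an empty string passes this check, because there is no disallowed character to find.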
- Certainly cleaner looking code with this approach - thanks. – SwiftD, Mar 15, 2013 at 13:31
- Did some tests with a few different strings; preg_match('@[^a-z\d-_]@i', $string) is consistently slower than preg_match('@^[a-z\d-_]+$@i', $string). – datasn.io, Oct 10, 2014 at 13:34
Just another method without regex.
function is_clean ($string) {
    // Strip the two allowed symbols, then require that only letters and digits remain.
    return ctype_alnum(str_replace(array('-', '_'), '', $string));
}
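One thing worth checking with this approach is how it treats strings that are empty once the hyphens and underscores are removed, because `ctype_alnum('')` returns false. A small sketch with made-up inputs; the name `is_clean_ctype` is hypothetical and only chosen to avoid clashing with the regex version:

```php
<?php
// Hypothetical name; the body is the ctype/str_replace idea from this answer.
function is_clean_ctype($string) {
    // Drop the two allowed symbols, then require that only letters and digits remain.
    return ctype_alnum(str_replace(array('-', '_'), '', $string));
}

var_dump(is_clean_ctype('abc_DEF-123')); // bool(true)
var_dump(is_clean_ctype('abc def'));     // bool(false)
var_dump(is_clean_ctype('-_-'));         // bool(false): nothing is left after stripping
var_dump(is_clean_ctype(''));            // bool(false): ctype_alnum('') is false
```

So, unlike the negated regex, this version rejects the empty string and strings made up only of '-' and '_'. Whether that matters depends on what the caller expects.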
(struck out) Maybe I'll find the time later today to compare the performance, but I guess 'efficient' in your question referred to the code, not the execution time? (end of struck-out text) Letharion did the work for me :)
- I spent some time with this yesterday. ctype alone is fastest, then comes the short preg_match, then ctype with str_replace, and finally the original. In many cases, ctype should be the best way to do this, as it will, unlike the regexp, work with characters like é. However, I didn't post an answer, because I couldn't get it to behave properly when trying it out. – Letharion, Mar 14, 2013 at 7:37
- Maybe we should combine both approaches: use the regex only if the plain ctype check fails. That might give the best average performance if we can assume that there are not too many strings with - and _. – mheinzerling, Mar 14, 2013 at 11:00
- Thanks for the suggestion - if I can get it working then it is probably better than a regex as far as performance is concerned. Will have a play and come back. – SwiftD, Mar 15, 2013 at 13:34
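The combination suggested in the comments might look roughly like the sketch below: try the plain `ctype_alnum()` first and fall back to the regex only when it fails. The function name `is_clean_combined` is hypothetical, and whether this is actually faster depends on the mix of inputs, which is not measured here.

```php
<?php
// Hypothetical name; a sketch of the "ctype first, regex as fallback" idea.
function is_clean_combined($string) {
    // Fast path: strings made only of letters and digits need no regex.
    if (ctype_alnum($string)) {
        return true;
    }
    // Slow path: '-' or '_' may be present (or the string is empty),
    // so look for any character outside the allowed set.
    return preg_match('/[^a-z\d_-]/i', $string) === 0;
}

var_dump(is_clean_combined('abc123'));  // bool(true), decided by ctype_alnum()
var_dump(is_clean_combined('abc_1-2')); // bool(true), decided by the regex
var_dump(is_clean_combined('abc 123')); // bool(false)
```

With this ordering the result matches the regex-only version for ASCII input, including for the empty string, which falls through to the regex and passes.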