I'm creating a function that converts a title to a URL slug. I'm not all that familiar with regular expressions, but I've done my best. Can you see any problems with, or improvements to, the function below?
The only characters allowed in the slug are letters, numbers and hyphens (-).
function slugify($input) {
    // Convert multiple spaces to single spaces
    $slug = preg_replace("/[[:blank:]]+/", ' ', $input);
    // Convert to lower case
    $slug = strtolower($slug);
    // Remove anything that's not a number, letter or space
    $slug = preg_replace('/[^a-z0-9\s]+/', '', $slug);
    // Trim, and replace spaces with hyphens
    $slug = preg_replace('/\s/', '-', trim($slug));
    return $slug;
}
1 Answer
You should prepare a set of sample titles to slugify and check for yourself that your function handles them correctly.
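As a rough sketch of that kind of check, the snippet below runs a copy of the question's function over a few input/expected pairs and reports the mismatches. The names `slugify_original` and `find_slug_failures` and the sample titles are my own, chosen only for illustration.

```php
<?php
// Copy of the function from the question, renamed to avoid name clashes.
function slugify_original($input) {
    $slug = preg_replace("/[[:blank:]]+/", ' ', $input);
    $slug = strtolower($slug);
    $slug = preg_replace('/[^a-z0-9\s]+/', '', $slug);
    return preg_replace('/\s/', '-', trim($slug));
}

// Hypothetical helper: runs $slugify over input => expected pairs and
// returns the inputs whose actual slug differs, mapped to that actual slug.
function find_slug_failures(callable $slugify, array $cases) {
    $failures = array();
    foreach ($cases as $input => $expected) {
        $actual = $slugify($input);
        if ($actual !== $expected) {
            $failures[$input] = $actual;
        }
    }
    return $failures;
}

// Sample titles (assumed for illustration) with the slug each should produce.
$cases = array(
    'Hello World'       => 'hello-world',
    '  Hello   World  ' => 'hello-world',
    'Rock & Roll'       => 'rock-roll',
);

$failures = find_slug_failures('slugify_original', $cases);
print_r($failures); // only 'Rock & Roll' fails: it becomes 'rock--roll'
```

The third case fails because removing the `&` leaves two adjacent spaces, and each one is then turned into its own hyphen.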
Below are the steps I use to slugify text:

1. Transliterate to ASCII with iconv(), if available:
   $slug = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
2. Lowercase the text, taking Unicode into account:
   $slug = mb_strtolower($slug);
3. Replace each run of unwanted characters with a hyphen, much like you do:
   $slug = preg_replace('/\W+/', '-', $slug);
These are the steps used by Propel ORM and the symfony framework, for example. The complete code could be:
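function slugify($text, $separator = '-')
{
    // start from the raw text so $slug is defined even without iconv
    $slug = $text;
    // transliterate to ASCII, keeping the original text if iconv fails
    if (function_exists('iconv'))
    {
        $converted = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
        if ($converted !== false)
        {
            $slug = $converted;
        }
    }
    // lowercase
    if (function_exists('mb_strtolower'))
    {
        $slug = mb_strtolower($slug);
    }
    else
    {
        $slug = strtolower($slug);
    }
    // remove accents resulting from OSX's iconv
    $slug = str_replace(array('\'', '`', '^'), '', $slug);
    // replace runs of non-word characters (anything but letters, digits, _) with the separator
    $slug = preg_replace('/\W+/', $separator, $slug);
    // trim leading and trailing separators
    $slug = trim($slug, $separator);
    return $slug;
}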
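One caveat, given that you only want letters, numbers and hyphens: `\W` is the complement of `[A-Za-z0-9_]`, so underscores survive the replacement. If that matters, an explicit character class is one option. The snippet below is only a sketch comparing the two patterns on an assumed sample string; it is not taken from Propel or symfony.

```php
<?php
// Sample input, assumed to be lowercased ASCII already
// (as the text is by the time the regex step runs).
$text = 'my_file name';

// \W does not match the underscore, so it is kept.
$withWordClass = preg_replace('/\W+/', '-', $text);

// An explicit class treats the underscore as a separator too.
$withExplicitClass = preg_replace('/[^a-z0-9]+/', '-', $text);

echo $withWordClass, "\n";     // my_file-name
echo $withExplicitClass, "\n"; // my-file-name
```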
I think $text is a better name than $input for the string to slugify.