I think this is the simplest way to slugify URLs. Do you see any drawbacks?
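function url_clean($str) {
    // Transliterate accented characters to their closest ASCII equivalents.
    $str = iconv('utf-8', 'us-ascii//TRANSLIT', $str);
    // Remove quotes and turn spaces into hyphens.
    $clean_str = preg_replace(array('/\'|\"/', '/ /'), array('', '-'), $str);
    return $clean_str;
}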
2 Answers
An alternative, simpler way to code your solution is to use the strtr function, which "translates characters". I also made sure to escape the special characters in the regex.
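function url_clean($str) {
    // Map each character to its replacement. The two-argument form of strtr()
    // replaces whole substrings, so multibyte UTF-8 characters stay intact.
    $map = array(
        ' ' => '-', 'ű' => 'u', 'á' => 'a', 'é' => 'e', 'ú' => 'u', 'ő' => 'o', 'ó' => 'o', 'ü' => 'u', 'ö' => 'o', 'í' => 'i',
        'Ű' => 'U', 'Á' => 'A', 'É' => 'E', 'Ú' => 'U', 'Ő' => 'O', 'Ó' => 'O', 'Ü' => 'U', 'Ö' => 'O', 'Í' => 'I',
    );
    $str = strtr($str, $map);
    // Keep only ASCII letters, digits, hyphens and dots.
    return preg_replace('/[^A-Za-z0-9\-\.]/', '', $str);
}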
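One detail worth checking when applying strtr() to accented input: the function has two calling forms, and they behave differently on UTF-8 text. The sketch below compares them; it assumes the source file is saved as UTF-8, and the variable names are only illustrative.

```php
<?php
// Assumes this file is saved as UTF-8, so "é" is the two bytes 0xC3 0xA9.

// Three-argument form: translates byte by byte. Only the first byte of "é"
// is paired with "e", so the second byte is left behind as garbage.
$bytewise = strtr('café', 'é', 'e');

// Two-argument form: replaces whole substrings taken from a map, so
// multibyte characters are handled as a unit.
$mapped = strtr('café au lait', array('é' => 'e', ' ' => '-'));

var_dump(bin2hex($bytewise)); // hex dump makes the stray byte visible
var_dump($mapped);
```

With UTF-8 input, the map form is therefore the safer choice for a character table like the one in this answer.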
There are two issues with your otherwise elegant approach:
1. iconv() silently cuts the string if a disallowed UTF-8 character is present. The solution would be to add //IGNORE to the iconv() call, but a bug in glibc seems to prevent this, and the PHP developers don't seem to want to implement a workaround. An option is to remove invalid characters yourself:
    ini_set('mbstring.substitute_character', "none");
    $text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
2. You're not removing all characters that are present in ASCII but disallowed in a URL: see this StackOverflow answer.
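The two points above could be combined roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names strip_invalid_utf8() and keep_url_safe() are hypothetical, the whitelist is assumed to be the "unreserved" set from RFC 3986 (letters, digits, "-", ".", "_", "~"), and the iconv() transliteration step from the question is left out because its result can vary with the platform and locale.

```php
<?php
// Hypothetical helper: drops byte sequences that are not valid UTF-8.
function strip_invalid_utf8(string $text): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character('none');         // drop invalid bytes instead of emitting "?"
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);      // restore the previous setting
    return $clean;
}

// Hypothetical helper: turns spaces into hyphens, then keeps only the
// characters RFC 3986 lists as "unreserved" (an assumption about which
// whitelist the answer has in mind).
function keep_url_safe(string $text): string
{
    $text = str_replace(' ', '-', $text);
    return preg_replace('/[^A-Za-z0-9\-._~]/', '', $text);
}

// Usage: clean the bytes first, then apply the whitelist.
echo keep_url_safe(strip_invalid_utf8("Hello \xFFWorld? 100%")), "\n";
```

Using a whitelist rather than a list of characters to remove means any ASCII character that was not anticipated is dropped by default.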