I'm working on a webpage with language detection, and I have the following simple script so far. I haven't done the user detection yet, so the user's language can't be looked up that way (yet), but that will be easy to implement. That isn't what I'm asking about, though: I'm asking for other ways to improve this code. As far as I've tested it, it's bug-free, but I want it to be bulletproof. How can I improve or expand it?
//LANGUAGES
//Language detection
if ( !empty($_POST['lang']) )
{
    $Lang = $_POST['lang'];
    $_SESSION['lang'] = $_POST['lang'];
}
else
{
    if ( !empty($_SESSION['lang']) )
        $Lang = $_SESSION['lang'];
    else
        $Lang = substr($_SERVER['HTTP_ACCEPT_LANGUAGE'], 0, 2);
}
//After this, $Lang will have the webpage preference
2 Answers
I would add an array of supported languages and check the POST value against it.
If there is no match, fall back to a default language:
$languages = new stdClass();
$languages->default = 'en';
$languages->list = array('en', 'de', 'fr', 'es');

if (!empty($_POST['lang']) && in_array($_POST['lang'], $languages->list)) {
    // An explicit, supported choice from the form wins
    $Lang = $_POST['lang'];
    $_SESSION['lang'] = $_POST['lang'];
} else {
    if (!empty($_SESSION['lang'])) {
        $Lang = $_SESSION['lang'];
    } else {
        $Lang = get_browser_language($languages->list);
    }
    // Fall back to the default if the value is not a supported language
    if (!in_array($Lang, $languages->list)) {
        $Lang = $languages->default;
    }
    $_SESSION['lang'] = $Lang;
}
// I always use the following function to get the browser language (I no longer remember where I found it on the web)
function get_browser_language($available_languages, $http_accept_language = 'auto')
{
    if ($http_accept_language == 'auto') {
        // The header is optional, so guard against it being unset
        $http_accept_language = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
    }
    // Groups: 1 = language prefix, 3 = optional range, 5 = optional q-value
    $pattern = '/([[:alpha:]]{1,8})(-([[:alpha:]-]{1,8}))?(\s*;\s*q\s*=\s*(1\.0{0,3}|0\.\d{0,3}))?\s*(,|$)/i';
    preg_match_all($pattern, $http_accept_language, $hits, PREG_SET_ORDER);
    // Default to the first available language if nothing matches
    $bestlang = $available_languages[0];
    $bestqval = 0;
    foreach ($hits as $arr) {
        $langprefix = strtolower($arr[1]);
        if (!empty($arr[3])) {
            $langrange = strtolower($arr[3]);
            $language = $langprefix . "-" . $langrange;
        } else {
            $language = $langprefix;
        }
        // A missing q-value means q = 1.0
        $qvalue = 1.0;
        if (!empty($arr[5])) {
            $qvalue = floatval($arr[5]);
        }
        // find q-maximal language
        if (in_array($language, $available_languages) && ($qvalue > $bestqval)) {
            $bestlang = $language;
            $bestqval = $qvalue;
        }
        // if no direct hit, try the prefix only but decrease q-value by 10% (as http_negotiate_language does)
        else if (in_array($langprefix, $available_languages) && (($qvalue * 0.9) > $bestqval)) {
            $bestlang = $langprefix;
            $bestqval = $qvalue * 0.9;
        }
    }
    return $bestlang;
}
You could also add an extra check based on an IP-to-country translation table, using for example the data from http://www.ip2nation.com/
Or perhaps even use geolocation to get the info, although this would not be my preferred way, since users will (or should) get a warning stating that the site is trying to use the visitor's location. I never accept that prompt when I see it, although it may just be me :)
I would also store the user's preferred language in a cookie, so that when the user visits the site again later he/she doesn't have to select the language again.
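The cookie idea could look roughly like the sketch below. The helper names `resolve_lang` and `remember_lang`, the cookie name `lang`, and the one-year lifetime are assumptions made for illustration, not something from the question. The cookie value is checked against the supported list just like the POST value, because the visitor can edit it.

```php
<?php
// Hypothetical helper: pick the language from POST, then session, then cookie.
// Each source is only trusted if its value is in the supported list.
function resolve_lang(array $post, array $session, array $cookie, array $supported, $default)
{
    foreach (array($post, $session, $cookie) as $source) {
        if (isset($source['lang']) && is_string($source['lang'])
            && in_array($source['lang'], $supported, true)) {
            return $source['lang'];
        }
    }
    return $default;
}

// Hypothetical helper: remember the choice for roughly one year.
function remember_lang($lang)
{
    // setcookie() only works before any output has been sent to the browser.
    if (!headers_sent()) {
        setcookie('lang', $lang, time() + 365 * 24 * 60 * 60, '/');
    }
}

$supported = array('en', 'de', 'fr', 'es');

// A returning visitor with a valid cookie and no POST or session value.
$returning = resolve_lang(array(), array(), array('lang' => 'de'), $supported, 'en');

// A tampered cookie falls through to the default.
$tampered = resolve_lang(array(), array(), array('lang' => 'xx'), $supported, 'en');
```

On a real page the call would presumably be `resolve_lang($_POST, $_SESSION, $_COOKIE, $languages->list, $languages->default)`, followed by `remember_lang($Lang)` before any HTML is printed.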
- The same as I said on the other post (I forgot to mention it in the main question): it is checked later, so there's no problem. If there's no string for the detected language, the default "en" string is used. – Frank Presencia Fandos, Sep 10, 2011 at 14:36
- @Frank Presencia Fandos: Why would you check it later? You do set the session var right here. – PeeHaa, Sep 10, 2011 at 14:44
- @session because (in the future) it is intended to be used on a user-maintained webpage, so each string will be independent: if a string has no French version, it would be displayed in English (the default) until someone corrects it, while the rest of the page would be in French. – Frank Presencia Fandos, Sep 10, 2011 at 14:58
In principle the code is sound except for one issue: you do not check if the language is "valid". This raises some questions:
- How will your localization code behave if I added a lang parameter with the value "foobar"?
- How will it behave if I try to do an SQL injection using the parameter value?
In both cases, the sane outcome would be to default to your "main" language, which would probably be hardcoded in your application.
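A minimal sketch of that fallback, assuming a hardcoded whitelist; the function name `validate_lang` and the language list are made up for illustration:

```php
<?php
// Hypothetical whitelist; replace with the languages the site actually ships.
$supported = array('en', 'de', 'fr', 'es');

// Return $candidate only if it is a supported language code, else $default.
function validate_lang($candidate, array $supported, $default = 'en')
{
    // is_string() rejects array input such as lang[]=x; the strict
    // in_array() check avoids loose type juggling during comparison.
    if (is_string($candidate) && in_array($candidate, $supported, true)) {
        return $candidate;
    }
    return $default;
}

$fromFoobar    = validate_lang('foobar', $supported);
$fromInjection = validate_lang("en'; DROP TABLE users; --", $supported);
$fromFrench    = validate_lang('fr', $supported);
```

Because only values that appear in the whitelist ever leave the function, both "foobar" and the injection attempt collapse to the default, while a supported code such as "fr" passes through unchanged.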
Another issue you may want to consider is that most users do not know that the Accept-Language header exists or how to configure it, so using it as the source of the user's preferred locale is not recommended:
It is not a good idea to use the HTTP Accept-Language header alone to determine the locale of the user. If you use Accept-Language exclusively, you may handcuff the user into a set of choices not to his liking.
In your shoes I would totally scrap using $_SERVER['HTTP_ACCEPT_LANGUAGE'] (IMHO it's useless in practice). If you want to go the extra mile and auto-detect the user's locale, use an IP database (I have used MaxMind GeoIP in the past myself) or the upcoming W3C geolocation spec.
- I forgot to say that if the language is not detected, a little script later on uses "en" as the default. But that has a lot to do with the display options (assigning a language string to each of them), which is why I totally forgot to mention it... sorry! – Frank Presencia Fandos, Sep 10, 2011 at 14:34
- About the SQL injection, I still haven't learned about that. I'm still learning the basics, but I know it'll be important. And about the HTTP Accept-Language header, it's still too early for me to use PHP geolocation, but I'll definitely keep it in mind. Thank you so much for the whole explanation. – Frank Presencia Fandos, Sep 10, 2011 at 14:41
- @FrankPresenciaFandos: It's almost certain that there is no SQL injection or other type of vulnerability with your code (it would need to be very unusual to be vulnerable). But since we don't see exactly how $Lang ends up being used, I had to mention it. – Jon, Sep 10, 2011 at 14:44
- This is extremely bad advice. You should never rely on IP to choose a language. Pick any country in the world and you will find different languages spoken there. Not to mention that GeoIP is only reliable in certain parts of the world. Accept-Language is the closest you can get to actually detecting which language the user prefers. – MeanEYE, Dec 7, 2014 at 16:25
- $_SERVER['HTTP_ACCEPT_LANGUAGE'] might not be set.