Two implementations of website internationalization

Question 1

I have attempted to create an internationalization system for the PHP framework I'm working on.

For this purpose I decided to use MySQL to store languages and translations, in the following two tables:

languages
id | name
---+--------
 1 | English
 2 | Spanish

translations
id | 1            | 2           | n ...
---+--------------+-------------+------
 1 | Hello, $user | Hola, $user | ...
 2 | Friend       | Amigo       | ...

I was told that this is not a normalized database structure. I think it is easier to use, though: if I added a lang_id column holding the id of the language each string belongs to, every select would become

SELECT string FROM translations WHERE id = ? AND lang_id = {$lang_id}

instead of what I currently use:

SELECT `{$lang_id}` FROM translations WHERE id = ?

It looks simpler to me, avoids a composite primary key, and uses less memory. If I'm wrong, please don't hesitate to criticize!

I have created two implementations of the logic class: one built on eval, the other on preg_replace_callback.

First implementation: Eval

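class Translator {
    private $stmt = null;
    private static $instance;

    public static function getInstance(){
        if(!self::$instance){
            self::$instance = new self;
        }
        return self::$instance;
    }

    private function __construct() {
        // The column name is the language id. An all-digit identifier must be
        // backtick-quoted in MySQL, otherwise it is read as a numeric literal.
        $this->stmt = \X\Database::connection()->prepare('SELECT `'.(int)$_SESSION['lang_id'].'` FROM translations WHERE id = ?')->getStmt();
    }

    // $variables defaults to an empty array so get() also works without variables.
    public function get($id, $variables = array()){
        $this->stmt->execute(array($id));
        // Import each variable into local scope so eval can interpolate it.
        foreach($variables as $k => $v){
            $$k = $v;
        }
        unset($id); unset($variables); unset($k); unset($v);
        return eval('return "'.addslashes($this->stmt->fetch(\PDO::FETCH_NUM)[0]).'";');
    }
}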

Example usage in a view, assuming $t = \X\Translator::getInstance();

With variables

<h1><?= $t->get(1, array('user'=>'John Doe')) ?></h1>

Without variables

<h1><?= $t->get(2) ?></h1>

Second implementation: Regular Expressions

class Translator {
    // Buffer all output so it can be translated in one pass at shutdown.
    public static function init(){
        ob_start();
        register_shutdown_function('\X\Translator::translateDocument');
    }

    public static function translateDocument(){
        // Backticks are required: an unquoted all-digit column name is a literal in MySQL.
        $t = \X\Database::connection()->prepare('SELECT `'.(int)$_SESSION['lang_id'].'` FROM translations WHERE id = ?')->getStmt();
        // Matches {{ id }} and {{ id, $name: value; ... }}
        echo preg_replace_callback('/\{\{\s*(\d+)\s*(?:,(.+?))?\s*}}/', function($matches) use($t){
            $t->execute(array($matches[1]));
            if(count($matches) > 2){
                $keys = array();
                $vals = array();
                // Collect each "$name: value;" pair from the variable list.
                preg_replace_callback('/([^\s]+):\s*([^\;]+);/', function($matchesInner) use(&$keys, &$vals){
                    $keys[] = $matchesInner[1];
                    $vals[] = $matchesInner[2];
                }, $matches[2]);
                return str_replace($keys,$vals,$t->fetch(\PDO::FETCH_NUM)[0]);
            }else{
                return $t->fetch(\PDO::FETCH_NUM)[0];
            }
        }, ob_get_clean());
    }
}

Example usage, assuming that \X\Translator::init() has been called

With variables

<h1>{{ 1, $user: John Doe; }}</h1>

Without variables

<h1>{{ 2 }}</h1>

Finally

I ran a benchmark consisting of 5 runs of 1000 translations each, with these results:

#Average execution time
 eval: 0.65 sec
 regex: 1.15 sec
#Memory used
 eval: 240 kb
 regex: 253 kb

Question 2

About the database: it will be a mess once it has 20 languages.

Question 3

@MarcoAcierno It will have a lot of columns, but why is that considered a mess? Otherwise it will have 20 times as many rows as it has this way :?

Question 4

So you like this and this?

Question 5

Why don't you just use gettext? Internationalization is a very hard problem. It's not just about replacing text: you also need to care about grammar, some languages change case when the subject is male or female, localizing dates is just horrible, currency is only a bit better, and there are a whole lot of subtleties. Why reinvent the wheel?
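
For readers unfamiliar with it, here is a minimal sketch of what the gettext route could look like in PHP. It assumes the gettext extension is loaded; the "messages" domain, the catalog directory, and the helper names initTranslations(), greeting() and friendCount() are hypothetical and not part of the original code.

```php
<?php
// Sketch only: requires PHP's gettext extension. The "messages" domain and
// all function names below are assumptions made for illustration.
function initTranslations($locale, $directory)
{
    // Select the locale, then tell gettext where the compiled catalogs live,
    // e.g. <directory>/es_ES/LC_MESSAGES/messages.mo
    setlocale(LC_ALL, $locale);
    bindtextdomain('messages', $directory);
    bind_textdomain_codeset('messages', 'UTF-8');
    textdomain('messages');
}

function greeting($user)
{
    // The source-language text is the lookup key; sprintf fills the placeholder.
    return sprintf(gettext('Hello, %s'), $user);
}

function friendCount($n)
{
    // ngettext chooses among the plural forms defined by the active catalog.
    return sprintf(ngettext('%d friend', '%d friends', $n), $n);
}
```

When no catalog is found for a string, gettext falls back to the source text, so untranslated strings still render.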

Question 6

And I would avoid having PHP code in the strings; this $user looks so bad.
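
One way to keep PHP syntax out of the stored strings is to use neutral named placeholders and substitute them with strtr, which never evaluates code. A minimal sketch, assuming a hypothetical {name} placeholder convention and a hypothetical helper called interpolate(); neither appears in the original code:

```php
<?php
// Sketch only: the {name} placeholder convention and interpolate() are
// assumptions for illustration, not part of the original framework.
function interpolate($template, array $variables = array())
{
    $pairs = array();
    foreach ($variables as $name => $value) {
        // Escape the values for HTML output; the template itself is trusted.
        $pairs['{' . $name . '}'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
    // strtr replaces every placeholder in a single pass.
    return $pairs ? strtr($template, $pairs) : $template;
}

echo interpolate('Hola, {user}', array('user' => 'John Doe')), "\n"; // prints "Hola, John Doe"
```

With this convention the stored string would read "Hola, {user}" instead of "Hola, $user", and neither eval nor addslashes is needed.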

Question 7

Creating the translations table with an indefinite number of weirdly named columns is a bad idea. Subsequently adding a language will be awkward: it involves ALTERing the table, then a series of UPDATEs.

I suggest a normalized table:

CREATE TABLE translations (
 id INTEGER NOT NULL,
 lang_id INTEGER NOT NULL,
 string VARCHAR(???),
 PRIMARY KEY (id, lang_id),
 FOREIGN KEY (lang_id) REFERENCES languages (id)
);

Then, for your programming convenience, create a view for each language that you support.

CREATE OR REPLACE VIEW en AS
 SELECT t.id, t.string
 FROM translations AS t
 JOIN languages AS l
 ON t.lang_id = l.id
 WHERE l.name = 'English';

Alternatively, define a stored procedure to satisfy your laziness.

That way, your schema stays normalized. Adding a view is less risky than altering a table. Even your queries read better:

SELECT string FROM en WHERE id = ?;

However, for performance, I think you would be better off querying and caching all of the strings (probably a few hundred — no big deal) in PHP than issuing a separate query per string.
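
The caching idea could be sketched as follows. This is only an illustration under stated assumptions: it uses the normalized translations(id, lang_id, string) table from this answer, a plain PDO connection rather than the asker's \X\Database wrapper, and a hypothetical class name CachedTranslator with hypothetical {name} placeholders.

```php
<?php
// Sketch only: CachedTranslator and the {name} placeholders are hypothetical.
// Assumes the normalized translations(id, lang_id, string) table.
class CachedTranslator
{
    private $strings = array();

    public function __construct(PDO $pdo, $langId)
    {
        // One query loads every string for the chosen language.
        $stmt = $pdo->prepare('SELECT id, string FROM translations WHERE lang_id = ?');
        $stmt->execute(array($langId));
        // FETCH_KEY_PAIR maps the first column to the second: id => string.
        $this->strings = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
    }

    public function get($id, array $variables = array())
    {
        if (!isset($this->strings[$id])) {
            return ''; // unknown id: nothing to show
        }
        $pairs = array();
        foreach ($variables as $name => $value) {
            $pairs['{' . $name . '}'] = $value;
        }
        // Lookups are served from memory; no further queries are issued.
        return $pairs ? strtr($this->strings[$id], $pairs) : $this->strings[$id];
    }
}
```

A view would then construct the object once per request and call $t->get(1, array('user' => 'John Doe')), so a page costs one query no matter how many strings it shows.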

Question 8

What do you mean by less risky than altering a table? What are the risks?

Question 9

@php_nub_qq DBAs have a general aversion to schema changes. ALTER TABLE can be mildly problematic. For example, in MySQL it triggers an implicit commit, so DDL changes cannot be rolled back. ALTER TABLE requires more privileges than INSERT. ALTER TABLE takes a lock, which would be bad if the table were huge. Mainly, I make this recommendation because supporting a new language should be a conceptually additive operation, and is therefore better done in a way that doesn't involve a schema change.
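
To illustrate the "additive" point: with the normalized schema, adding a language needs only INSERTs, which can run inside one transaction and be rolled back together. A minimal sketch, assuming a hypothetical helper addLanguage(), a PDO connection in exception mode, an auto-generated languages.id, and a storage engine that supports transactions (for example InnoDB):

```php
<?php
// Sketch only: addLanguage() is a hypothetical helper. It assumes the
// normalized schema, an auto-generated languages.id, and a PDO connection
// configured with PDO::ERRMODE_EXCEPTION.
function addLanguage(PDO $pdo, $name, array $strings)
{
    $pdo->beginTransaction();
    try {
        $stmt = $pdo->prepare('INSERT INTO languages (name) VALUES (?)');
        $stmt->execute(array($name));
        $langId = (int) $pdo->lastInsertId();

        // $strings maps translation id => translated text.
        $stmt = $pdo->prepare('INSERT INTO translations (id, lang_id, string) VALUES (?, ?, ?)');
        foreach ($strings as $id => $string) {
            $stmt->execute(array($id, $langId, $string));
        }
        $pdo->commit();
        return $langId;
    } catch (Exception $e) {
        // Plain INSERTs can be undone as a unit; no schema change is involved.
        $pdo->rollBack();
        throw $e;
    }
}
```

No ALTER privilege or table rebuild is needed, and a failure halfway through leaves the existing languages untouched.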

user50399 · Answer 1 · 2014-08-21 16:08:10Z

