I have the following gettext .po file, which has been translated from a .pot file. I am working on a Linux system (openSUSE if it matters), running gettext 0.17.

# 
# <[email protected]>, 2011
# transer <[email protected]>, 2011
msgid ""
msgstr ""
"Project-Id-Version: transtest\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2011-05-24 22:47+0100\n"
"PO-Revision-Date: 2011-05-30 23:03+0100\n"
"Last-Translator: \n"
"Language-Team: German (Germany)\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: de_DE\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
#: transtest.cpp:12
msgid "Min Size"
msgstr "Min Größe"

I create the .mo file with

msgfmt -c transtest_de_DE.po -o transtest.mo

and check the encoding of the .po file with the "file" command:

file --mime transtest_de_DE.po
transtest_de_DE.po: text/x-po; charset=utf-8

I then install the .mo file to my locale folder, export LANG and LC_CTYPE, and run the program. The output shows garbage where the two non-ASCII characters should be.

If I set my terminal encoding to ISO-8859-2, rather than UTF-8, then I see the two characters correctly.

Looking inside the generated .mo file with a text editor the file appears to be in UTF-8 as well (I can see the symbols if I set my editor encoding to UTF-8).

The program is very simple, and it looks like so:

#include <clocale>    // setlocale
#include <iostream>
#include <libintl.h>  // gettext, bindtextdomain, textdomain
#include <locale>
const char *PROGRAM_NAME="transtest";
using namespace std;
int main()
{
 setlocale (LC_ALL, "");
 bindtextdomain( PROGRAM_NAME, "/usr/share/locale" );
 textdomain( PROGRAM_NAME );
 cerr << gettext("Min Size") << endl;
}

I am installing the .mo file to /usr/share/locale/de_DE/LC_MESSAGES/transtest.mo, and I have exported LC_CTYPE and LANG as "de_DE".

$ echo $LC_CTYPE; echo $LANG
de_DE
de_DE

Where am I going wrong? Why is gettext giving me the wrong encoding (ISO-8859-2) for my strings, rather than the requested (in the .po file) UTF-8?

Edit:

The solution was in the Stack Overflow question "Can't make (UTF-8) traditional Chinese character to work in PHP gettext extension (.po and .mo files created in poEdit)": I needed to explicitly call

bind_textdomain_codeset(PROGRAM_NAME, "utf-8");

The final program looks like so:

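#include <clocale>    // setlocale
#include <iostream>
#include <libintl.h>  // gettext, bindtextdomain, bind_textdomain_codeset, textdomain
#include <locale>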
const char *PROGRAM_NAME="transtest";
using namespace std;
int main()
{
 setlocale (LC_ALL, "");
 bindtextdomain( PROGRAM_NAME, "/usr/share/locale" );
 bind_textdomain_codeset(PROGRAM_NAME, "utf-8");
 textdomain( PROGRAM_NAME );
 cerr << gettext("Min Size") << endl;
}

No changes to any of my gettext files were needed.

asked May 30, 2011 at 22:35
  • I'm mighty shaky on locales, but if you wanted UTF-8 strings, shouldn't you set your LANG=de_DE.utf8? Commented May 30, 2011 at 22:44
  • I just tried that, but it does not seem to make any difference, even if I alter the .mo install location. Anyway, I have specified it in the .po file, which I would have thought gave gettext all the info it needs. Commented May 30, 2011 at 22:50
  • Oh man. Five hours later I find this post: stackoverflow.com/questions/2264740/…. Oh well, problem solved... sorry for the noise! Commented May 30, 2011 at 22:57
  • @CodeT, if that post actually provides the information you need to solve the problem, please summarize its contents in an answer here and accept it. :) I don't immediately see how that answer would be useful to you, so hopefully your answer can help others in the future. Commented May 30, 2011 at 22:59
  • @CodeT: You should be able to move your solution down to an answer now so we can get this off the unanswered list. Thank you. Commented May 31, 2011 at 13:10

1 Answer

If you have LC_CTYPE=de_DE (or LANG=de_DE), programs are supposed to output ISO-8859-1 (note: 1, not 2). If your terminal is set to UTF-8 while that locale is active, the setup is simply wrong. The correct locale for UTF-8 is de_DE.utf-8.

Using bind_textdomain_codeset is wrong in your case. It is meant for programs that work in a fixed encoding internally, as GNOME does. Output should always be in the encoding the locale specifies, which is obtained by calling nl_langinfo(CODESET) and is also what gettext uses by default.

answered May 31, 2011 at 13:41