4.3 Character Handling
-<
ANSI C Rationale
->
4.5 Mathematics
Index
4.4 Localization <locale.h>
C has become an international language.
Users of the language outside the United States have been forced to
deal with the various Americanisms built into the standard library
routines.
Areas affected by international considerations include:
- Alphabet.
-
The English language uses 26 letters derived from the Latin alphabet.
This set of letters suffices for English, Swahili, and Hawaiian;
all other living languages use either the Latin alphabet plus
other characters, or other, non-Latin alphabets or syllabaries.
In English, each letter has an upper-case and lower-case form.
The German ``sharp S'', ß, occurs only in lower-case.
European French usually omits diacriticals on upper-case letters.
Some languages do not have the concept of two cases.
- Collation.
-
In both EBCDIC and ASCII the code for `z' is greater
than the code for `a', and so on for other letters in the alphabet,
so a ``machine sort'' gives not unreasonable results for ordering
strings.
In contrast,
most European languages use a codeset resembling ASCII
in which some of the codes used in ASCII for punctuation characters are
used for alphabetic characters. (See §2.2.1.)
The ordering of these codes is not alphabetic.
In some languages letters with diacritics sort as separate letters;
in others they should be collated just like the unmarked form.
In Spanish, ``ll'' sorts as a single letter following ``l'';
in German, ``ß'' sorts like ``ss''.
- Formatting of numbers and currency amounts.
-
In the United States the period is invariably used for the decimal point;
this usage was built into the definitions of such functions as
printf and scanf.
Prevalent practice in several major European countries is to use a comma;
a raised dot is employed in some locales.
Similarly, in the United States a comma is used to separate
groups of three digits to the left of the decimal point;
a period is common in Europe, and in some countries digits are
not grouped by threes. In printing currency amounts, the
currency symbol (which may be more than one character) may
precede, follow, or be embedded in the digits.
- Date and time.
-
The standard function
asctime returns a string that includes
English abbreviations for the month and weekday names,
and lays out the various elements in a format which might be
considered unusual even in its country of origin.
Common date formats include:
1776-07-04                  ISO format
4.7.76                      customary central European and British usage
7/4/76                      customary U.S. usage
4.VII.76                    Italian usage
76186                       Julian date (YYDDD)
04JUL76                     airline usage
Thursday, July 4, 1776      full U.S. format
Donnerstag, 4. Juli 1776    full German format
Time formats are also quite diverse:
3:30 PM                     customary U.S. and British format
1530                        U.S. military format
15h.30                      Italian usage
15.30                       German usage
15:30                       common European usage
The Committee has introduced mechanisms into the C library to allow
these and other issues to be treated in the appropriate
locale-specific manner.
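As an illustration of the date and time item above, the following minimal sketch contrasts the fixed layout produced by asctime with a few of the listed formats built through strftime, the Standard's locale-sensitive formatting function. The helper names (independence_day, fixed_format, format_date) are invented for this example, and the struct tm is filled in by hand on the assumption that mktime may be unable to represent a date in 1776.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Build the broken-down time for Thursday, July 4, 1776, 15:30:00 by hand.
   mktime is avoided: a time_t often cannot represent dates this early. */
struct tm independence_day(void)
{
    struct tm t = {0};
    t.tm_year = 1776 - 1900;  /* years since 1900 */
    t.tm_mon  = 6;            /* July; months count from 0 */
    t.tm_mday = 4;
    t.tm_hour = 15;
    t.tm_min  = 30;
    t.tm_wday = 4;            /* Thursday; Sunday is 0 */
    t.tm_yday = 185;          /* day 186 of the year, counted from 0 */
    return t;
}

/* Copy the asctime rendering of t into buf without its trailing newline.
   The layout is fixed by the Standard and ignores the current locale. */
char *fixed_format(const struct tm *t, char *buf, size_t size)
{
    const char *s;
    size_t len;

    assert(t != NULL && buf != NULL);
    s = asctime(t);
    assert(s != NULL);
    len = strcspn(s, "\n");
    assert(len < size);
    memcpy(buf, s, len);
    buf[len] = '\0';
    return buf;
}

/* Format t with a caller-supplied strftime pattern. Returns buf, which is
   set to the empty string if the result does not fit. */
char *format_date(const struct tm *t, const char *pattern,
                  char *buf, size_t size)
{
    assert(t != NULL && pattern != NULL && buf != NULL && size > 0);
    if (strftime(buf, size, pattern, t) == 0)
        buf[0] = '\0';
    return buf;
}
```

With the value from independence_day, the pattern "%Y-%m-%d" produces the ISO form and "%y%j" the Julian form shown in the table. Patterns that use names, such as "%A, %B %d, %Y", depend on the LC_TIME category of the current locale.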
The localization features of the Standard are based on these
principles:
- English for C source.
-
The C language proper is based on English.
Keywords are based on English words.
A program which uses ``national characters'' in identifiers
is not strictly conforming.
(Use of national characters in comments is strictly conforming,
though what happens when such a program is printed in a different
locale is unspecified.)
The decimal point must be a period in C source,
and no thousands delimiter may be used.
- Runtime selectability.
-
The locale must be selectable at runtime,
from an implementation-defined set of possibilities.
Translate-time selection does not offer sufficient flexibility.
Software vendors do not want to supply different object forms
of their programs in different locales.
Users do not want to use different versions of a program just
because they deal with several different locales.
- Function interface.
-
Locale is changed by calling a function,
thus allowing the implementation to recognize the change,
rather than by, say, changing a memory location that contains
the decimal point character.
- Immediate effect.
-
When a new locale is selected, affected functions reflect the
change immediately.
(This is not meant to imply that, if a signal-handling function were to
change the selected locale and return to a library function,
the return value from that library function must be completely
correct with respect to the new locale.)
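The runtime-selectability, function-interface, and immediate-effect principles can be sketched together as follows. The helper name format_under_locale is hypothetical, and the sketch assumes the caller starts in the "C" locale, which is the state at program startup. Only the names "C" and "" are guaranteed by the Standard; any other locale name is implementation-defined.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Format value while the LC_NUMERIC category is set to the locale called
   name, then switch back to "C". Returns 1 if the locale was available,
   0 if the implementation could not honor the request (buf is then left
   untouched). The buffer must hold at least 32 characters, which is ample
   for the "%g" conversion used here. */
int format_under_locale(const char *name, double value,
                        char *buf, size_t size)
{
    assert(name != NULL && buf != NULL && size >= 32);

    /* The locale is chosen at run time by a function call ... */
    if (setlocale(LC_NUMERIC, name) == NULL)
        return 0;

    /* ... and the very next formatted output uses its decimal point. */
    sprintf(buf, "%g", value);

    /* Assumption of this sketch: the caller was in the "C" locale. */
    setlocale(LC_NUMERIC, "C");
    return 1;
}
```

Called with "C" and the value 3.14, the buffer receives the text with a period as the decimal point. Under a locale that uses a comma, the same call would produce a comma without any change to the calling code.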
4.4.1 Locale control
setlocale provides the mechanism for controlling
locale-specific features of the library.
The category argument allows parts of the library to be localized
as necessary without changing the entire locale-specific environment.
Specifying the locale argument as a string
gives an implementation maximum flexibility in providing a set of locales.
For instance, an implementation could map the argument string to
the name of a file containing the appropriate localization parameters;
such files could then be added and
modified without requiring any recompilation of a localizable
program.
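A minimal sketch of how the category argument might be used to change one part of the locale and later restore it. The helper names (save_locale, set_category, restore_locale) are invented for this example. The copy made in save_locale is needed because the string returned by setlocale may be overwritten by a later call.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the current setting of category, or NULL on
   failure. Passing a null pointer as the locale queries without changing. */
char *save_locale(int category)
{
    const char *current = setlocale(category, NULL);
    char *copy;

    if (current == NULL)
        return NULL;
    copy = malloc(strlen(current) + 1);
    if (copy != NULL)
        strcpy(copy, current);
    return copy;
}

/* Change a single category, leaving the rest of the locale untouched.
   Returns 1 on success, 0 if the implementation does not offer name. */
int set_category(int category, const char *name)
{
    assert(name != NULL);
    return setlocale(category, name) != NULL;
}

/* Reinstate a setting obtained from save_locale and release the copy. */
void restore_locale(int category, char *saved)
{
    if (saved != NULL) {
        setlocale(category, saved);
        free(saved);
    }
}
```

A program might call save_locale(LC_ALL), then set_category(LC_NUMERIC, name) with an implementation-defined name, and finally restore_locale(LC_ALL, saved). Collation, character classification, and time formatting keep their previous behavior throughout.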
4.4.2 Numeric formatting convention inquiry
The localeconv function gives a programmer access to information
about how to format numeric quantities (monetary or otherwise).
This sort of interface was considered preferable to defining conversion
functions directly:
even with a specified locale, the set of distinct formats that can
be constructed from these elements is large, and the ones desired
are very application-dependent.
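To illustrate why the elements are exposed and not a finished conversion, here is one possible application-level formatter built from a struct lconv. The function name format_money is hypothetical, and the sketch makes simplifying assumptions that are its own: the amount is non-negative and given in minor units, digit grouping and sign placement are ignored, and two fractional digits and a period are used when the locale reports no value.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Format minor_units (for example, cents) using the monetary fields of lc.
   Returns buf. Taking lc as a parameter lets the caller pass the result of
   localeconv() or a modified copy of it. */
char *format_money(const struct lconv *lc, unsigned long minor_units,
                   char *buf, size_t size)
{
    const char *symbol, *point, *space;
    unsigned long scale = 1;
    int digits, i;

    assert(lc != NULL && buf != NULL);

    /* CHAR_MAX and "" mean "not available"; the fallbacks are assumptions. */
    digits = (lc->frac_digits == CHAR_MAX) ? 2 : lc->frac_digits;
    assert(digits >= 0 && digits <= 9);
    symbol = lc->currency_symbol;
    point = (lc->mon_decimal_point[0] != '\0') ? lc->mon_decimal_point : ".";
    if (digits == 0)
        point = "";
    space = (lc->p_sep_by_space == 1 && symbol[0] != '\0') ? " " : "";

    for (i = 0; i < digits; i++)
        scale *= 10;

    /* 20 covers the digits of any 64-bit unsigned long. */
    assert(strlen(symbol) + strlen(space) + strlen(point)
           + 20 + digits + 1 <= size);

    /* A precision of digits zero-pads the fraction; a precision of 0 with
       a zero value prints nothing at all. */
    if (lc->p_cs_precedes == 1)
        sprintf(buf, "%s%s%lu%s%.*lu", symbol, space,
                minor_units / scale, point, digits, minor_units % scale);
    else
        sprintf(buf, "%lu%s%.*lu%s%s",
                minor_units / scale, point, digits, minor_units % scale,
                space, symbol);
    return buf;
}
```

A caller would normally pass localeconv() directly. Another application might want grouping, accounting-style negatives, or the international currency symbol, which is the kind of variation the paragraph above describes.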