Unicode

Most Java program text consists of ASCII characters, but Unicode letters and digits can be used in identifier names, and any Unicode character can appear in comments and in character and string literals. For example, π (the Greek small letter pi, U+03C0) is a valid Java identifier:

Example Code section 3.100: Pi.

double π = Math.PI;

and in a string literal:

Example Code section 3.101: Pi literal.

String pi = "π";
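
Whether a particular character may start or continue an identifier can be checked at run time with the standard methods Character.isJavaIdentifierStart and Character.isJavaIdentifierPart. The following is only a sketch: the class IdentifierCheck and the method looksLikeIdentifier are names invented for this illustration, and the check does not reject keywords such as class. The letter π is built from its code point (0x03C0) so that the file itself stays pure ASCII.

```java
public class IdentifierCheck {
    // Hypothetical helper: true if every code point of name is allowed at its
    // position in a Java identifier. Keywords are not excluded here.
    static boolean looksLikeIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        int i = Character.charCount(first);
        while (i < name.length()) {
            int codePoint = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        // Greek small letter pi, built from its code point.
        String pi = new String(Character.toChars(0x03C0));
        System.out.println(looksLikeIdentifier(pi));        // true
        System.out.println(looksLikeIdentifier("radius2")); // true
        System.out.println(looksLikeIdentifier("2radius")); // false
    }
}
```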

Unicode escape sequences

Unicode characters can also be expressed through Unicode Escape Sequences. Unicode escape sequences may appear anywhere in a Java source file (including inside identifiers, comments, and string literals).

Unicode escape sequences consist of

a backslash '\' (ASCII character 92, hex 0x5c),
a 'u' (ASCII character 117, hex 0x75),
optionally one or more additional 'u' characters, and
four hexadecimal digits (the characters '0' through '9' or 'a' through 'f' or 'A' through 'F').

Each such sequence represents one UTF-16 code unit. For example, 'a' is equivalent to '\u0061'. A single escape cannot express a character beyond U+FFFF; such a character has to be written as a surrogate pair, that is, as two consecutive escape sequences.^[1]
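
These rules can be checked with a short program. The following is a minimal sketch: the class name EscapeAnatomy and its constant names are invented for illustration, and U+1F600 is simply one arbitrarily chosen character beyond U+FFFF.

```java
public class EscapeAnatomy {
    // The same letter written directly and as an escape sequence.
    static final char PLAIN = 'a';
    static final char ESCAPED = '\u0061';
    // Additional 'u' characters are permitted and change nothing.
    static final char MANY_U = '\uuu0061';
    // U+1F600 does not fit in four hex digits, so it is written as a
    // surrogate pair: a high surrogate followed by a low surrogate.
    static final String BEYOND_FFFF = "\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(PLAIN == ESCAPED);                 // true
        System.out.println(PLAIN == MANY_U);                  // true
        System.out.println(BEYOND_FFFF.length());             // 2
        System.out.println(
            BEYOND_FFFF.codePointCount(0, BEYOND_FFFF.length())); // 1
        System.out.println(
            Integer.toHexString(BEYOND_FFFF.codePointAt(0)));     // 1f600
    }
}
```

The string holds two char values but only one Unicode character, which is why length() and codePointCount() disagree.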

Any character in a program may be written as a Unicode escape sequence, but such programs are verbose and hard for anyone but the Java compiler to read.

A full list of the characters can be found in the Unicode Consortium's code charts.

π may also be represented in Java as the Unicode escape sequence \u03C0. Thus, the following is a valid, but not very readable, declaration and assignment:

Example Code section 3.102: Unicode escape sequences for Pi.

double \u03C0 = Math.PI;
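
Because escape sequences are translated before the compiler identifies names, an escaped identifier and its plain spelling denote the same variable. The sketch below illustrates this with an ASCII letter so that both spellings can appear in one ASCII-only file; the class and method names are invented for the example.

```java
public class EscapedIdentifier {
    static double piViaEscape() {
        // Declares and reads a variable whose name is the Greek letter pi.
        double \u03C0 = Math.PI;
        return \u03C0;
    }

    static int sameIdentifier() {
        // Hex 0061 is the letter a, so this declares the variable abc.
        int \u0061bc = 42;
        return abc;
    }

    public static void main(String[] args) {
        System.out.println(piViaEscape() == Math.PI); // true
        System.out.println(sameIdentifier());         // 42
    }
}
```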

The following demonstrates the use of Unicode escape sequences in other Java syntax:

Example Code section 3.103: Unicode escape sequences in a string literal.

// Declare Strings pi and quote which contain \u03C0 and \u0027 respectively:
String pi = "\u03C0";
String quote = "\u0027";

Note that a Unicode escape sequence is translated before the compiler analyzes the rest of the source, so it behaves exactly like the character it stands for. For example, \u0022 (double quote, ") has to be escaped inside a string literal just like " itself.

Example Code section 3.104: Double quote.

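// Declare Strings doubleQuote1 and doubleQuote2 which both contain " (double quote):
String doubleQuote1 = "\"";
// "\u0022" doesn't work since """ doesn't work; "\\u0022" compiles but holds
// the six characters \, u, 0, 0, 2, 2, so the backslash is escaped as well:
String doubleQuote2 = "\u005c\u0022";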
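
How the compiler treats the different spellings can be verified directly. The following sketch (class and constant names are invented for illustration) relies on the rule in the Java Language Specification that a backslash immediately preceded by another backslash does not begin a Unicode escape.

```java
public class QuoteEscapes {
    // Ordinary string escape for a double quote.
    static final String PLAIN = "\"";
    // An escaped backslash followed by an escaped double quote; after
    // translation the compiler sees the same text as for PLAIN.
    static final String ESCAPED = "\u005c\u0022";
    // The second backslash follows another backslash, so no Unicode escape
    // starts here: the literal holds a backslash followed by u0022.
    static final String NOT_A_QUOTE = "\\u0022";

    public static void main(String[] args) {
        System.out.println(ESCAPED.equals(PLAIN));       // true
        System.out.println(NOT_A_QUOTE.length());        // 6
        System.out.println(NOT_A_QUOTE.charAt(0) == '\\'); // true
    }
}
```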

International language support


The language distinguishes between bytes and characters. A char is a 16-bit value: characters were originally stored as UCS-2, and as of J2SE 5.0 strings are interpreted as UTF-16, with surrogate pairs representing characters beyond U+FFFF. Java program source may therefore contain any Unicode character.

The following is thus perfectly valid Java code; it contains Chinese characters in the class and variable names as well as in a string literal:

Code listing 3.50: 哈嘍世界.java

public class 哈嘍世界 {
    private String 文本 = "哈嘍世界";
}
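
The distinction between bytes and characters mentioned at the start of this section can be made concrete by counting the same text in three ways. This is a minimal sketch with invented class and method names; UTF-8 is chosen here only as an example byte encoding, and the sample characters are written as escapes so the file stays ASCII-only.

```java
import java.nio.charset.StandardCharsets;

public class BytesVersusChars {
    // Number of bytes when the text is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of 16-bit char values (UTF-16 code units).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String pi = "\u03C0";          // Greek small letter pi
        String face = "\uD83D\uDE00";  // U+1F600 as a surrogate pair

        System.out.println(utf8Bytes(pi) + " " + utf16Units(pi)
            + " " + codePoints(pi));     // 2 1 1
        System.out.println(utf8Bytes(face) + " " + utf16Units(face)
            + " " + codePoints(face));   // 4 2 1
    }
}
```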

References


[1] "3.1 Unicode", The Java™ Language Specification, Java SE 7 Edition, pp. 15-16.

