Unicode
Most Java program text consists of ASCII characters, but Unicode characters can also be used: identifier names may contain Unicode letters and digits (the first character cannot be a digit), and any Unicode character may appear in comments and in character and string literals. For example, π (Unicode's Greek Small Letter Pi, U+03C0) is a valid Java identifier:
double π = Math.PI;
and in a string literal:
String pi = "π";
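As a small illustration, the sketch below uses π both as a local variable name and inside a string literal. It assumes the file is saved as UTF-8 and compiled with `javac -encoding UTF-8` (UTF-8 is the default from JDK 18 onward); the class and method names are hypothetical.

```java
// Hypothetical demo class; assumes UTF-8 source encoding.
class PiIdentifierDemo {
    // π is an ordinary local variable name here.
    static double circleArea(double radius) {
        double π = Math.PI;
        return π * radius * radius;
    }

    // The literal "π" is a one-char string holding U+03C0.
    static String piSymbol() {
        String pi = "π";
        return pi;
    }

    public static void main(String[] args) {
        System.out.println(circleArea(1.0));                       // prints 3.141592653589793
        System.out.printf("U+%04X%n", (int) piSymbol().charAt(0)); // prints U+03C0
    }
}
```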
Unicode escape sequences
Unicode characters can also be expressed through Unicode escape sequences. Unicode escape sequences may appear anywhere in a Java source file (including inside identifiers, comments, and string literals), because the compiler translates them into the characters they represent before it does any other processing of the source.
Unicode escape sequences consist of:
- a backslash '\' (ASCII character 92, hex 0x5c),
- a 'u' (ASCII 117, hex 0x75),
- optionally one or more additional 'u' characters, and
- four hexadecimal digits (the characters '0' through '9', 'a' through 'f', or 'A' through 'F').
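The forms in the list above can be checked directly. In this hypothetical demo class, the plain, escaped, and extra-'u' spellings all denote U+0061, and lowercase and uppercase hex digits denote the same character (U+03C0).

```java
// Hypothetical demo class; the compiler translates each escape before tokenizing.
class EscapeFormsDemo {
    static final char PLAIN = 'a';            // the character itself
    static final char ESCAPED = '\u0061';     // backslash, 'u', four hex digits
    static final char EXTRA_US = '\uuu0061';  // additional 'u' characters are allowed

    // Hex digits are case-insensitive: both spellings denote U+03C0.
    static final char PI_LOWER = '\u03c0';
    static final char PI_UPPER = '\u03C0';

    public static void main(String[] args) {
        System.out.println(PLAIN == ESCAPED && ESCAPED == EXTRA_US); // prints true
        System.out.println(PI_LOWER == PI_UPPER);                    // prints true
    }
}
```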
Each escape sequence represents one UTF-16 code unit. For example, 'a' is equivalent to '\u0061'. Because four hexadecimal digits reach only U+FFFF, a character beyond U+FFFF cannot be written as a single escape; it must be written as a surrogate pair, that is, as two escape sequences.[1]
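The sketch below writes a character beyond U+FFFF with two escape sequences. U+1F600 (GRINNING FACE) is an arbitrary choice for illustration; its UTF-16 surrogate pair is D83D DE00, which `Character.toChars` confirms. The class and constant names are hypothetical.

```java
// Hypothetical demo class for a supplementary character written as a surrogate pair.
class SurrogatePairDemo {
    // Two escape sequences: the high surrogate, then the low surrogate.
    static final String FACE = "\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(FACE.length());                         // prints 2 (code units)
        System.out.println(FACE.codePointCount(0, FACE.length())); // prints 1 (code point)
        System.out.printf("U+%X%n", FACE.codePointAt(0));          // prints U+1F600
        // Character.toChars computes the same pair from the code point.
        System.out.println(FACE.equals(new String(Character.toChars(0x1F600)))); // prints true
    }
}
```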
Every character in a program may be written as a Unicode escape sequence, but such programs are hard for people to read (the compiler has no trouble with them), and they are much less compact.
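Converting text to its escaped form is mechanical, as this hypothetical helper shows. It emits one escape per UTF-16 code unit, so a character beyond U+FFFF comes out as two escapes (a surrogate pair). The method name `toUnicodeEscapes` is illustrative, not a standard library API.

```java
// Hypothetical helper that rewrites text as Unicode escape sequences.
class EscapeEncoder {
    static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            // Each UTF-16 code unit becomes a backslash, 'u', and four uppercase hex digits.
            out.append(String.format("\\u%04X", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the six characters of escape text for U+03C0.
        System.out.println(toUnicodeEscapes("\u03C0"));
    }
}
```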
One can find a full list of the characters here.
π may also be represented in Java as the Unicode escape sequence \u03C0. Thus, the following is a valid, but not very readable, declaration and assignment:
double \u03C0 = Math.PI;
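To see that the escaped spelling and the literal character name the same variable, the hypothetical method below declares the variable with the escape and reads it back as π. It assumes UTF-8 source encoding.

```java
// Hypothetical demo class; assumes UTF-8 source encoding.
class EscapedIdentifierDemo {
    static double piViaEscapedName() {
        double \u03C0 = Math.PI; // declared using the escape sequence
        return π;                // read back using the literal character
    }

    public static void main(String[] args) {
        System.out.println(piViaEscapedName() == Math.PI); // prints true
    }
}
```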
The following demonstrates the use of Unicode escape sequences in other Java syntax:
// Declare Strings pi and quote, which contain \u03C0 and \u0027 respectively:
String pi = "\u03C0";
String quote = "\u0027";
Note that a Unicode escape sequence behaves exactly like the character it represents, because the compiler translates escapes before it parses anything else. For example, \u0022 (double quote, ") must be escaped inside a string literal just like " itself.
// Declare Strings doubleQuote1 and doubleQuote2, which both contain " (double quote):
String doubleQuote1 = "\"";
String doubleQuote2 = "\u005c\u0022"; // "\u0022" doesn't work because it becomes """, which doesn't compile.
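The sketch below contrasts two working spellings of a double quote with a common mistake. Here \u005c is the escape for a backslash (hex 5C), so after translation \u005c\u0022 becomes the ordinary string escape \". In "\\u0022", by contrast, the second backslash follows another backslash and therefore cannot start a Unicode escape, so the value is six characters of escape text. The class and constant names are hypothetical.

```java
// Hypothetical demo class contrasting escape spellings of a double quote.
class DoubleQuoteDemo {
    // Ordinary string escape: a single double-quote character.
    static final String VIA_STRING_ESCAPE = "\"";

    // Unicode escapes for a backslash and a double quote; after translation the
    // compiler sees the same string escape as above.
    static final String VIA_UNICODE_ESCAPES = "\u005c\u0022";

    // Pitfall: the second backslash follows another backslash, so it cannot start
    // a Unicode escape; the string escape for a backslash applies instead.
    static final String NOT_A_QUOTE = "\\u0022";

    public static void main(String[] args) {
        System.out.println(VIA_STRING_ESCAPE.equals(VIA_UNICODE_ESCAPES)); // prints true
        System.out.println(NOT_A_QUOTE.length());                         // prints 6
    }
}
```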
International language support
The language distinguishes between bytes and characters. Characters were originally stored as 16-bit UCS-2 code units; as of J2SE 5.0, Java uses UTF-16, representing characters beyond U+FFFF as surrogate pairs. Java program source may therefore contain any Unicode character.
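The byte/character distinction shows up as soon as text is encoded. In the sketch below, `String.length()` counts UTF-16 code units while `getBytes(StandardCharsets.UTF_8)` counts encoded bytes; the sample strings (a, π, and the two ideographs 世界) and the method names are chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: characters (UTF-16 code units) versus encoded bytes.
class BytesVersusCharsDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes after encoding the string as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "a", pi (U+03C0), and the two ideographs U+4E16 U+754C.
        String[] samples = {"a", "\u03C0", "\u4E16\u754C"};
        for (String s : samples) {
            System.out.println(utf16Units(s) + " chars, " + utf8Bytes(s) + " UTF-8 bytes");
        }
    }
}
```

For these samples the program reports 1, 1, and 2 chars against 1, 2, and 6 UTF-8 bytes.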
The following is thus perfectly valid Java code; it contains Chinese characters in the class and variable names as well as in a string literal:
public class 哈嘍世界 {
    private String 文本 = "哈嘍世界";
}
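To check whether a given name is a legal Java identifier, the standard `javax.lang.model.SourceVersion` class can be used. The sketch below is a hypothetical wrapper around `SourceVersion.isName`, which applies the identifier rules of the latest source version and also rejects keywords. The candidates use escapes so the file stays ASCII-only: \u6587\u672C spells 文本, the variable name from the class example.

```java
import javax.lang.model.SourceVersion;

// Hypothetical wrapper; SourceVersion.isName checks identifier syntax and rejects keywords.
class IdentifierCheckDemo {
    static boolean isLegalName(String candidate) {
        return SourceVersion.isName(candidate);
    }

    public static void main(String[] args) {
        // Escapes keep this file ASCII-only; the first entry is U+6587 U+672C.
        String[] candidates = {"\u6587\u672C", "\u03C0", "2pi", "class"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isLegalName(candidate));
        }
    }
}
```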
References