
The Java™ Tutorials

The Java Tutorials have been written for JDK 8. Examples and practices described in this page don't take advantage of improvements introduced in later releases and might use technology no longer available.
See Dev.java for updated tutorials taking advantage of the latest releases.
See Java Language Changes for a summary of updated language features in Java SE 9 and subsequent releases.
See JDK Release Notes for information about new features, enhancements, and removed or deprecated options for all JDK releases.

Unicode Support

As of the JDK 7 release, Regular Expression pattern matching has expanded functionality to support Unicode 6.0.

Matching a Specific Code Point

You can match a specific Unicode code point using an escape sequence of the form \uFFFF, where FFFF is the four-digit hexadecimal value of the code point you want to match. For example, \u6771 matches the Han character for east.

Alternatively, you can specify a code point using Perl-style hex notation, \x{...}. For example:

// The backslash must be doubled in a Java string literal; "\x" alone does not compile.
String hexPattern = "\\x{" + Integer.toHexString(codePoint) + "}";
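
As a minimal sketch of both notations, the class below compiles one pattern with \uFFFF syntax and one with \x{...} syntax. The names CodePointMatch and forCodePoint are illustrative and do not come from the tutorial. The second check uses a code point above U+FFFF, which the four-digit form cannot express on its own.

```java
import java.util.regex.Pattern;

public class CodePointMatch {
    // Hypothetical helper: builds a regex matching exactly one code point
    // using the \x{h...h} notation.
    static Pattern forCodePoint(int codePoint) {
        return Pattern.compile("\\x{" + Integer.toHexString(codePoint) + "}");
    }

    public static void main(String[] args) {
        // The regex \u6771 is written with a doubled backslash in Java source,
        // so that the regex engine (not the compiler) interprets the escape.
        Pattern east = Pattern.compile("\\u6771");
        System.out.println(east.matcher("\u6771").matches()); // true

        // U+1F600 lies outside the Basic Multilingual Plane.
        int grinning = 0x1F600;
        String text = new String(Character.toChars(grinning));
        System.out.println(forCodePoint(grinning).matcher(text).matches()); // true
    }
}
```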

Unicode Character Properties

Each Unicode character, in addition to its value, has certain attributes, or properties. You can match a single character belonging to a particular category with the expression \p{prop}. You can match a single character not belonging to a particular category with the expression \P{prop}.

The three supported property types are scripts, blocks, and a "general" category.

Scripts

To determine if a code point belongs to a specific script, you can either use the script keyword, or the sc short form, for example, \p{script=Hiragana}. Alternatively, you can prefix the script name with the string Is, such as \p{IsHiragana}.

Valid script names supported by Pattern are those accepted by UnicodeScript.forName.
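
The three script spellings described above can be compared in a short sketch. ScriptMatch and isAllHiragana are hypothetical names chosen for illustration, and the sample strings are written with \u escapes so the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class ScriptMatch {
    // Hypothetical helper: true if s is non-empty and every character
    // belongs to the Hiragana script.
    static boolean isAllHiragana(String s) {
        return Pattern.matches("\\p{IsHiragana}+", s);
    }

    public static void main(String[] args) {
        String hiragana = "\u3072\u3089\u304C\u306A"; // "hiragana" written in Hiragana
        String katakana = "\u30AB\u30BF\u30AB\u30CA"; // "katakana" written in Katakana

        System.out.println(isAllHiragana(hiragana)); // true
        System.out.println(isAllHiragana(katakana)); // false

        // The keyword forms are equivalent to the Is prefix.
        System.out.println(Pattern.matches("\\p{script=Hiragana}+", hiragana)); // true
        System.out.println(Pattern.matches("\\p{sc=Hiragana}+", hiragana));     // true

        // \P{...} is the complement: one character that is not in the script.
        System.out.println(Pattern.matches("\\P{IsHiragana}", "A")); // true
    }
}
```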

Blocks

A block can be specified using the block keyword, or the blk short form, for example, \p{block=Mongolian}. Alternatively, you can prefix the block name with the string In, such as \p{InMongolian}.

Valid block names supported by Pattern are those accepted by UnicodeBlock.forName.
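
A block test can be sketched the same way. BlockMatch and inMongolianBlock are illustrative names. The example uses U+1820 (MONGOLIAN LETTER A) and cross-checks the regex result against Character.UnicodeBlock.of.

```java
import java.util.regex.Pattern;

public class BlockMatch {
    // Hypothetical helper: true if the code point lies in the Mongolian block.
    static boolean inMongolianBlock(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return Pattern.matches("\\p{InMongolian}", s);
    }

    public static void main(String[] args) {
        int letterA = 0x1820; // MONGOLIAN LETTER A
        String s = new String(Character.toChars(letterA));

        System.out.println(inMongolianBlock(letterA)); // true
        System.out.println(inMongolianBlock('A'));     // false

        // The keyword forms are equivalent to the In prefix.
        System.out.println(Pattern.matches("\\p{block=Mongolian}", s)); // true
        System.out.println(Pattern.matches("\\p{blk=Mongolian}", s));   // true

        // The Character API reports the same block.
        System.out.println(
            Character.UnicodeBlock.of(letterA) == Character.UnicodeBlock.MONGOLIAN); // true
    }
}
```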

General Category

Categories can be specified with the optional prefix Is. For example, \p{IsL} matches any character in the category of Unicode letters. Categories can also be specified by using the general_category keyword, or its short form gc. For example, an uppercase letter can be matched using \p{general_category=Lu} or \p{gc=Lu}.

Supported categories are those of The Unicode Standard in the version specified by the Character class.
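
The following sketch exercises the category spellings on a small sample. CategoryMatch and countUppercase are hypothetical names. The sample string contains two uppercase letters (U+00C9 and U+03A9), one lowercase letter, and one digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CategoryMatch {
    private static final Pattern UPPER = Pattern.compile("\\p{gc=Lu}");

    // Hypothetical helper: counts characters whose general category is Lu.
    static int countUppercase(String s) {
        Matcher m = UPPER.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "\u00C9a9\u03A9"; // E-acute, a, 9, Greek capital omega
        System.out.println(countUppercase(sample)); // 2

        // Equivalent spellings of the same category.
        System.out.println(Pattern.matches("\\p{Lu}", "A"));                  // true
        System.out.println(Pattern.matches("\\p{IsLu}", "A"));                // true
        System.out.println(Pattern.matches("\\p{general_category=Lu}", "A")); // true

        // L covers all letters, uppercase or not; digits are excluded.
        System.out.println(Pattern.matches("\\p{IsL}+", "\u00C9a\u03A9")); // true
        System.out.println(Pattern.matches("\\p{IsL}", "9"));              // false
    }
}
```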
