Regular Expression Unicode Property Set Reference

Property sets are a kind of Unicode property that some regex flavors support. A property set has multiple possible values, and every Unicode code point is assigned exactly one value from each set. In a regular expression, you specify both a set and one of its values as the Property in \p{Property}. If the main Unicode reference page indicates that your flavor supports negated property syntax, then every property listed below as supported can also be used with the negated syntax. The negated property matches any code point whose value in the specified set is anything other than the specified value.
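As a rough illustration of these semantics outside any regex engine, the following Python sketch uses the standard unicodedata module, which reports East_Asian_Width under its short value names. The helper names matches_ea and matches_not_ea are invented for this example. They mimic \p{ea=Value} and its negation for a single character and are not part of any flavor.

```python
import unicodedata

EA_VALUES = ("N", "Na", "H", "A", "F", "W")  # short value names of East_Asian_Width

def matches_ea(ch, value):
    # Roughly what \p{ea=value} tests: the code point has exactly this value.
    return unicodedata.east_asian_width(ch) == value

def matches_not_ea(ch, value):
    # Roughly what the negated property tests: any other value of the same set.
    return not matches_ea(ch, value)

for ch in ("A", "\u6f22", "\uff21"):  # Latin A, a CJK ideograph, fullwidth A
    # Exactly one value of the set applies to each code point.
    held = [v for v in EA_VALUES if matches_ea(ch, v)]
    print(f"U+{ord(ch):04X}", held)
```

Because every code point holds exactly one value, the list printed for each character always has a single element, and the negated test is simply the complement of the positive one.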

Many flavors support only a limited number of property sets. The fact that a flavor is built on a certain version of Unicode does not mean it supports all the property sets that exist in that version. The table below indicates which flavors support which property sets. If a flavor supports a property set, it supports all the values that are part of that set. Those values are listed in the description. The exact code points matched by each property depend on the Unicode version the flavor was built with.
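The dependence on the Unicode version can be seen in any environment that bundles its own copy of the Unicode Character Database. The sketch below uses Python's unicodedata module purely as a stand-in for a regex flavor's database. The helper names unicode_version and is_assigned are assumptions made for this example.

```python
import unicodedata

def unicode_version():
    # Unicode version of the character database bundled with this Python build.
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))

def is_assigned(ch):
    # A code point counts as assigned when its General_Category is not Cn.
    return unicodedata.category(ch) != "Cn"

print(unicode_version())
print(is_assigned("\u20ac"))      # euro sign, added in Unicode 2.1
print(is_assigned("\U0001fae8"))  # added in Unicode 15.0, so the answer depends on the build
```

A property-based pattern behaves the same way: a character that a newer Unicode version assigns is simply not part of any property value in a flavor built on an older version.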

Strictly speaking, General_Category, Script, and Block are also property sets, but this reference handles them separately because they have alternative syntax of their own. The lists of scripts and blocks also keep growing with new Unicode versions.

| Feature | Syntax | Description | Example | JGsoft | Python | JavaScript | VBScript | XRegExp | .NET | Java | ICU | RE2 | Perl | PCRE | PCRE2 | PHP | Delphi | R | Ruby | std::regex | Boost | Tcl | POSIX | GNU | Oracle | XML | XPath |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Unicode property set | \p{Set_Name=Value} | Matches a single Unicode code point that has the specified value in the specified set. | | no | no | no | no | no | no | no | YES | no | YES | no | 10.40 | 8.2.0 | no | 4.2.2 | 1.9 | no | no | no | no | no | no | no | no |
| Unicode property set | \p{Set_Name:Value} | Matches a single Unicode code point that has the specified value in the specified set. | | no | no | no | no | no | no | no | no | no | YES | no | 10.40 | 8.2.0 | no | 4.2.2 | no | no | no | no | no | no | no | no | no |
| Unicode property set | \p{IsSet_Name=Value} | Matches a single Unicode code point that has the specified value in the specified set. | | no | no | no | no | no | no | no | no | no | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
| Unicode property set | \p{IsSet_Name:Value} | Matches a single Unicode code point that has the specified value in the specified set. | | no | no | no | no | no | no | no | no | no | YES | no | no | no | no | no | no | no | no | no | no | no | no | no | no |
| Property set name | Age | V1_1, V2_0, V2_1, V3_0, V3_1, V3_2, V4_0, V4_1, V5_0, V5_1, V5_2, V6_0, V6_1, V6_2, V6_3, V7_0, V8_0, V9_0, V10_0, V11_0, V12_0, V12_1, V13_0, V14_0, V15_0, V15_1, V16_0 | \p{Age=V2_1} matches € (added in Unicode 2.1) but not A | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | 1.9 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | age | V1_1, V2_0, V2_1, V3_0, V3_1, V3_2, V4_0, V4_1, V5_0, V5_1, V5_2, V6_0, V6_1, V6_2, V6_3, V7_0, V8_0, V9_0, V10_0, V11_0, V12_0, V12_1, V13_0, V14_0, V15_0, V15_1, V16_0 | \p{Age=V2_1} matches € (added in Unicode 2.1) but not A | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | 1.9 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Present_In | V1_1, V2_0, V2_1, V3_0, V3_1, V3_2, V4_0, V4_1, V5_0, V5_1, V5_2, V6_0, V6_1, V6_2, V6_3, V7_0, V8_0, V9_0, V10_0, V11_0, V12_0, V12_1, V13_0, V14_0, V15_0, V15_1, V16_0 | \p{Present_In=V2_1} matches € and A | n/a | n/a | n/a | n/a | n/a | n/a | n/a | no | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | In | V1_1, V2_0, V2_1, V3_0, V3_1, V3_2, V4_0, V4_1, V5_0, V5_1, V5_2, V6_0, V6_1, V6_2, V6_3, V7_0, V8_0, V9_0, V10_0, V11_0, V12_0, V12_1, V13_0, V14_0, V15_0, V15_1, V16_0 | \p{In=V2_1} matches € and A | n/a | n/a | n/a | n/a | n/a | n/a | n/a | no | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Canonical_Combining_Class | Not_Reordered, Overlay, Han_Reading, Nukta, Kana_Voicing, Virama, CCC10, CCC11, CCC12, CCC13, CCC14, CCC15, CCC16, CCC17, CCC18, CCC19, CCC20, CCC21, CCC22, CCC23, CCC24, CCC25, CCC26, CCC27, CCC28, CCC29, CCC30, CCC31, CCC32, CCC33, CCC34, CCC35, CCC36, CCC84, CCC91, CCC103, CCC107, CCC118, CCC122, CCC129, CCC130, CCC132, CCC133, Attached_Below_Left, Attached_Below, Attached_Above, Attached_Above_Right, Below_Left, Below, Below_Right, Left, Right, Above_Left, Above, Above_Right, Double_Below, Double_Above, Iota_Subscript | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | ccc | NR, OV, HANR, NK, KV, VR, CCC10, CCC11, CCC12, CCC13, CCC14, CCC15, CCC16, CCC17, CCC18, CCC19, CCC20, CCC21, CCC22, CCC23, CCC24, CCC25, CCC26, CCC27, CCC28, CCC29, CCC30, CCC31, CCC32, CCC33, CCC34, CCC35, CCC36, CCC84, CCC91, CCC103, CCC107, CCC118, CCC122, CCC129, CCC130, CCC132, CCC133, ATBL, ATB, ATA, ATAR, BL, B, BR, L, R, AL, A, AR, DB, DA, IS | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Numeric_Type | Decimal, Digit, Numeric | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | nt | De, Di, Nu | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Numeric_Value | | \p{Numeric_Value=1} matches 1 (and other characters whose numeric value is 1) but not 2 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | nv | | \p{nv=1} matches 1 (and other characters whose numeric value is 1) but not 2 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Line_Break | Ambiguous, Alphabetic, Break_Both, Break_After, Break_Before, Mandatory_Break, Contingent_Break, Conditional_Japanese_Starter, Close_Punctuation, Combining_Mark, Close_Parenthesis, Carriage_Return, E_Base, E_Modifier, Exclamation, Glue, H2, H3, Hebrew_Letter, Hyphen, Ideographic, Inseparable, Infix_Numeric, JL, JT, JV, Line_Feed, Next_Line, Nonstarter, Numeric, Open_Punctuation, Postfix_Numeric, Prefix_Numeric, Quotation, Regional_Indicator, Complex_Context, Surrogate, Space, Break_Symbols, Word_Joiner, Unknown, ZWSpace, ZWJ, Aksara, Aksara_Prebase, Aksara_Start, Virama, Virama_Final | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | lb | AI, AL, B2, BA, BB, BK, CB, CJ, CL, CM, CP, CR, EB, EM, EX, GL, H2, H3, HL, HY, ID, IN, IS, JL, JT, JV, LF, NL, NS, NU, OP, PO, PR, QU, RI, SA, SG, SP, SY, WJ, XX, ZW, ZWJ, AK, AP, AS, VI, VF | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Grapheme_Cluster_Break | Other, CR, LF, Control, Extend, ZWJ, Regional_Indicator, Prepend, SpacingMark, L, LV, LVT, T, V, E_Base, E_Base_GAZ, E_Modifier, Glue_After_Zwj | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | 2.4 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | GCB | XX, CR, LF, CN, EX, ZWJ, RI, PP, SM, L, LV, LVT, T, V, EB, EBG, EM, GAZ | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | 2.4 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Hangul_Syllable_Type | Not_Applicable, Leading_Jamo, LV_Syllable, LVT_Syllable, Trailing_Jamo, Vowel_Jamo | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | hst | NA, L, LV, LVT, T, V | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Word_Break | Other, CR, LF, Newline, Extend, ZWJ, Regional_Indicator, Format, Katakana, Hebrew_Letter, ALetter, Single_Quote, Double_Quote, MidNumLet, MidLetter, MidNum, Numeric, ExtendNumLet, WSegSpace, E_Base, E_Base_GAZ, E_Modifier, Glue_After_Zwj | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | WB | XX, CR, LF, NL, Extend, ZWJ, RI, FO, KA, HL, LE, SQ, DQ, MB, ML, MN, NU, EX, WSegSpace, EB, EBG, EM, GAZ | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Sentence_Break | Other, CR, LF, Extend, Sep, Format, Sp, Lower, Upper, OLetter, Numeric, ATerm, SContinue, STerm, Close | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | SB | XX, CR, LF, EX, SE, FO, SP, LO, UP, LE, NU, AT, SC, ST, CL | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Bidi_Class | Left_To_Right, Right_To_Left, Arabic_Letter, European_Number, European_Separator, European_Terminator, Arabic_Number, Common_Separator, Nonspacing_Mark, Boundary_Neutral, Paragraph_Separator, Segment_Separator, White_Space, Other_Neutral, Left_To_Right_Embedding, Left_To_Right_Override, Right_To_Left_Embedding, Right_To_Left_Override, Pop_Directional_Format, Left_To_Right_Isolate, Right_To_Left_Isolate, First_Strong_Isolate, Pop_Directional_Isolate | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | 10.40 | 8.2.0 | n/a | 4.2.2 | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | bc | L, R, AL, EN, ES, ET, AN, CS, NSM, BN, B, S, WS, ON, LRE, LRO, RLE, RLO, PDF, LRI, RLI, FSI, PDI | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | 10.40 | 8.2.0 | n/a | 4.2.2 | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Bidi_Paired_Bracket_Type | None, Open, Close | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | 5.20 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | bpt | n, o, c | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | 5.20 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Decomposition_Type | None, Canonical, Compat, Circle, Final, Font, Fraction, Initial, Isolated, Medial, Narrow, Nobreak, Small, Square, Sub, Super, Vertical, Wide | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | dt | None, Can, Com, Enc, Fin, Font, Fra, Init, Iso, Med, Nar, Nb, Sml, Sqr, Sub, Sup, Vert, Wide | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | East_Asian_Width | Neutral, Narrow, Halfwidth, Ambiguous, Fullwidth, Wide | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | ea | N, Na, H, A, F, W | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Indic_Conjunct_Break | None, Consonant, Extend, Linker | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 76 | n/a | no | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | InCB | (same as above) | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 76 | n/a | no | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Indic_Positional_Category | NA, Bottom, Bottom_And_Left, Bottom_And_Right, Left, Left_And_Right, Overstruck, Right, Top, Top_And_Bottom, Top_And_Bottom_And_Left, Top_And_Bottom_And_Right, Top_And_Left, Top_And_Left_And_Right, Top_And_Right, Visual_Order_Left | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 63 | n/a | 5.24 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | InPC | (same as above) | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 63 | n/a | 5.24 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Indic_Syllabic_Category | Other, Avagraha, Bindu, Brahmi_Joining_Number, Cantillation_Mark, Consonant, Consonant_Dead, Consonant_Final, Consonant_Head_Letter, Consonant_Initial_Postfixed, Consonant_Killer, Consonant_Medial, Consonant_Placeholder, Consonant_Preceding_Repha, Consonant_Prefixed, Consonant_Subjoined, Consonant_Succeeding_Repha, Consonant_With_Stacker, Gemination_Mark, Invisible_Stacker, Joiner, Modifying_Letter, Non_Joiner, Nukta, Number, Number_Joiner, Pure_Killer, Register_Shifter, Reordering_Killer, Syllable_Modifier, Tone_Letter, Tone_Mark, Virama, Visarga, Vowel, Vowel_Dependent, Vowel_Independent | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 63 | n/a | 5.24 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | InSC | (same as above) | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 63 | n/a | 5.24 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Joining_Group | No_Joining_Group, African_Feh, African_Noon, African_Qaf, Ain, Alaph, Alef, Beh, Beth, Burushaski_Yeh_Barree, Dal, Dalath_Rish, E, Farsi_Yeh, Fe, Feh, Final_Semkath, Gaf, Gamal, Hah, Hanifi_Rohingya_Kinna_Ya, Hanifi_Rohingya_Pa, He, Heh, Heh_Goal, Heth, Kaf, Kaph, Kashmiri_Yeh, Khaph, Knotted_Heh, Lam, Lamadh, Malayalam_Bha, Malayalam_Ja, Malayalam_Lla, Malayalam_Llla, Malayalam_Nga, Malayalam_Nna, Malayalam_Nnna, Malayalam_Nya, Malayalam_Ra, Malayalam_Ssa, Malayalam_Tta, Manichaean_Aleph, Manichaean_Ayin, Manichaean_Beth, Manichaean_Daleth, Manichaean_Dhamedh, Manichaean_Five, Manichaean_Gimel, Manichaean_Heth, Manichaean_Hundred, Manichaean_Kaph, Manichaean_Lamedh, Manichaean_Mem, Manichaean_Nun, Manichaean_One, Manichaean_Pe, Manichaean_Qoph, Manichaean_Resh, Manichaean_Sadhe, Manichaean_Samekh, Manichaean_Taw, Manichaean_Ten, Manichaean_Teth, Manichaean_Thamedh, Manichaean_Twenty, Manichaean_Waw, Manichaean_Yodh, Manichaean_Zayin, Meem, Mim, Noon, Nun, Nya, Pe, Qaf, Qaph, Reh, Reversed_Pe, Rohingya_Yeh, Sad, Sadhe, Seen, Semkath, Shin, Straight_Waw, Swash_Kaf, Syriac_Waw, Tah, Taw, Teh_Marbuta, Teh_Marbuta_Goal, Hamza_On_Heh_Goal, Teth, Thin_Yeh, Vertical_Tail, Waw, Yeh, Yeh_Barree, Yeh_With_Tail, Yudh, Yudh_He, Zain, Zhain | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | jg | (same as above) | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Joining_Type | Non_Joining, Transparent, Join_Causing, Left_Joining, Right_Joining, Dual_Joining | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | jt | U, T, C, L, R, D | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Vertical_Orientation | Rotated, Upright, Transformed_Rotated, Transformed_Upright | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 63 | n/a | 5.28 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | vo | R, U, Tr, Tu | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 63 | n/a | 5.28 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | NFC_Quick_Check | Yes, Maybe, No | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | NFC_QC | Y, M, N | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | NFKC_Quick_Check | Yes, Maybe, No | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | NFKC_QC | Y, M, N | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | NFD_Quick_Check | Yes, No | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | NFD_QC | Y, N | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | NFKD_Quick_Check | Yes, No | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Short property set name | NFKD_QC | Y, N | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | YES | n/a | YES | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Identifier_Status | Restricted, Allowed | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 75 | n/a | 5.32 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Property set name | Identifier_Type | Not_Character, Deprecated, Default_Ignorable, Not_NFKC, Not_XID, Exclusion, Obsolete, Technical, Uncommon_Use, Limited_Use, Inclusion, Recommended | | n/a | n/a | n/a | n/a | n/a | n/a | n/a | 75 | n/a | 5.32 | n/a | no | no | n/a | no | no | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
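For a flavor that the table marks as not supporting property sets, one possible workaround is to precompute an ordinary character class from the Unicode data. The Python sketch below does this with the standard re and unicodedata modules. The function name property_class, its limit parameter, and the choice to scan only the Basic Multilingual Plane by default are assumptions made for this example. The resulting class reflects the Unicode version of the running Python build, not that of any other flavor.

```python
import re
import unicodedata

def property_class(predicate, limit=0x10000):
    # Collect runs of consecutive code points below `limit` that satisfy the predicate.
    ranges, start = [], None
    for cp in range(limit):
        if predicate(chr(cp)):
            if start is None:
                start = cp
        elif start is not None:
            ranges.append((start, cp - 1))
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    if not ranges:
        return "(?!)"  # nothing qualifies: a pattern that never matches
    # Spell every endpoint as a \UXXXXXXXX escape so no character needs class escaping.
    return "[" + "".join(
        f"\\U{lo:08x}" if lo == hi else f"\\U{lo:08x}-\\U{hi:08x}"
        for lo, hi in ranges
    ) + "]"

# Approximation of \p{Numeric_Value=1}
nv_1 = re.compile(property_class(lambda ch: unicodedata.numeric(ch, None) == 1))
# Approximation of \p{Bidi_Class=Arabic_Letter}
bc_al = re.compile(property_class(lambda ch: unicodedata.bidirectional(ch) == "AL"))

print(bool(nv_1.fullmatch("1")), bool(nv_1.fullmatch("2")))
print(bool(bc_al.fullmatch("\u0628")))  # ARABIC LETTER BEH
```

Only properties that the host language exposes can be emulated this way. Python's unicodedata covers a handful of the sets in the table (such as Bidi_Class, East_Asian_Width, Canonical_Combining_Class, and the numeric properties) but not, for example, Line_Break or Joining_Group.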
