The Unicode Blog: UAX #38


Thursday, September 19, 2019

New Public Review Issues for Unicode Technical Reports

The Unicode Consortium has recently opened several Public Review Issues for proposed updates to Unicode Standard Annexes and other technical reports. The closing date for comments on these open issues is September 30, 2019, for feedback to be reviewed at the UTC meeting.

Highlights include a major proposed update to UTS #51, Unicode Emoji, as well as significant updates to UAX #14, Unicode Line Breaking Algorithm; UTS #18, Unicode Regular Expressions; UAX #29, Unicode Text Segmentation; and UAX #38, Unicode Han Database.

Please see the Public Review Issues page for a full list of the items for review and links to the documents.

Over 136,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.


Posted by Unicode, Inc. at 2:05 PM


Labels: Public Review Issues, UAX #38, UTS #18, UTS #51

Friday, April 13, 2018

Last Call on Unicode 11.0 Review

The beta review period for Unicode 11.0 and related technical standards will close on April 23, 2018. This is the last opportunity for technical comments before version 11.0 is released in Q2 2018. Implementers and interested parties are encouraged to download data files, review proposed updates, and submit comments.

Unicode 11.0 adds seven new scripts, including Hanifi Rohingya, and 66 additional emoji characters, including four new components for hair color (for a total of 157 emoji sequences). The set of Georgian Mtavruli capital letters has been added to support modern casing practices.
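
As a rough illustration of what the Mtavruli addition means for implementers, the sketch below uppercases Georgian Mkhedruli text with Python's built-in string methods. It assumes a Python runtime whose bundled Unicode Character Database is version 11.0 or later; the helper names `supports_mtavruli` and `to_all_caps` are hypothetical and exist only for this example.

```python
import unicodedata


def supports_mtavruli() -> bool:
    """Return True if this runtime's Unicode data is 11.0 or newer.

    Assumption: Mtavruli case mappings are only present from Unicode 11.0 on.
    """
    version = tuple(int(part) for part in unicodedata.unidata_version.split("."))
    return version >= (11, 0)


def to_all_caps(text: str) -> str:
    """Uppercase text; Mkhedruli letters map to Mtavruli on Unicode 11.0+ data."""
    return text.upper()


if __name__ == "__main__":
    word = "საქართველო"  # Mkhedruli letters, U+10D0..U+10FA range
    if supports_mtavruli():
        caps = to_all_caps(word)
        for lower, upper in zip(word, caps):
            # Show each code point pair, e.g. U+10E1 -> U+1CA1
            print(f"U+{ord(lower):04X} -> U+{ord(upper):04X}")
    else:
        print("Unicode data older than 11.0; Mtavruli mappings unavailable")
```

Lowercasing the result with `str.lower()` should round-trip back to the Mkhedruli form on the same runtime.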

For more information about testing the 11.0 beta, see unicode.org/versions/beta-11.0.0.html
For the current draft summary of Unicode 11.0, see unicode.org/versions/Unicode11.0.0

In addition to the Unicode core specification, five Unicode Standard Annexes and two Unicode Technical Standards have significant specification and/or data file updates that are correlated with the new additions for Unicode 11.0.0. Review of those changes is strongly encouraged during the beta review period.

UAX #14, Unicode Line Breaking Algorithm

Uses Extended_Pictographic property for future-proofing

UAX #29, Unicode Text Segmentation

New support for Indic virama handling
Uses Extended_Pictographic property for future-proofing
A new table of formal regex definitions

UAX #31, Unicode Identifier and Pattern Syntax

Refines the use of ZWJ in identifiers
Broadens the definition of hashtag identifiers

UAX #38, Unicode Han Database (Unihan)

Five new fields and improved regular expressions
Documents the extension of Unihan properties to non-Unihan characters
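
For readers testing the Unihan data files, the following is a minimal sketch of reading tab-separated Unihan-style records and checking a field value against a regular expression. The line pattern, the `kTotalStrokes` pattern, and the function name `parse_unihan` are assumptions made for illustration and should be checked against the syntax given in UAX #38; the three sample records are illustrative input, not an excerpt of a specific release file.

```python
import re
from typing import Dict, Iterable

# Assumed record shape: code point, field name, value, separated by tabs.
LINE = re.compile(r"^(U\+[0-9A-F]{4,6})\t(k[A-Za-z0-9_]+)\t(.+)$")

# Assumed per-field value syntax (illustrative; consult UAX #38 for the real one).
FIELD_SYNTAX = {
    "kTotalStrokes": re.compile(r"[1-9][0-9]{0,2}"),
}


def parse_unihan(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Parse records into {character: {field: value}}, validating known fields."""
    table: Dict[str, Dict[str, str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"malformed record: {line!r}")
        code_point, field, value = match.groups()
        pattern = FIELD_SYNTAX.get(field)
        if pattern is not None:
            # Values may hold several space-separated items; check each one.
            for item in value.split(" "):
                if pattern.fullmatch(item) is None:
                    raise ValueError(f"bad {field} value {item!r} for {code_point}")
        char = chr(int(code_point[2:], 16))
        table.setdefault(char, {})[field] = value
    return table


if __name__ == "__main__":
    sample = [
        "# illustrative records",
        "U+4E00\tkTotalStrokes\t1",
        "U+4E8C\tkTotalStrokes\t2",
        "U+4E09\tkTotalStrokes\t3",
    ]
    for char, fields in parse_unihan(sample).items():
        print(char, fields)
```

Fields without an entry in `FIELD_SYNTAX` are accepted unvalidated, so the table can be extended one field at a time while reviewing the data.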

UAX #44, Unicode Character Database

New property Equivalent_Unified_Ideograph
New regular expressions for Bidi_Paired_Bracket and Equivalent_Unified_Ideograph
More discussion of emoji variation sequences
Clarification of values allowed for the Age property

UTS #10, Unicode Collation Algorithm

Updates data to Unicode 11.0
Clarification of search tailoring in visual-order scripts

UTS #39, Unicode Security Mechanisms

Updates data to Unicode 11.0
Enhances discussions of joining controls & combining sequences

UTS #46, Unicode IDNA Compatibility Processing

Updates data to Unicode 11.0
Changes the format of the test file for arbitrary input settings
Updates input setting for Transitional_Processing

UTS #51, Unicode Emoji

Supplies Extended_Pictographic property for future-proofing
Simplifies emoji sequence definitions
EBNF and regular expressions for loose matches
More proposed guidelines: gender-neutral emoji, skin-tone modifiers, ZWJ visible fallbacks, hair-style components
Mechanism for changing the “facing” direction for emoji

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 23, 2018. Feedback instructions are in each public review page. For more information, see the open public review issues.

Over 130,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.


Posted by Unicode, Inc. at 12:59 PM


Labels: beta, emoji, UAX #14, UAX #29, UAX #31, UAX #38, UAX #44, Unicode, UTS #10, UTS #39, UTS #46, UTS #51
