Friday, October 25, 2024
ICU 76 Released
ICU 76 updates to Unicode 16, including new characters and scripts, emoji, collation & IDNA changes, and corresponding APIs and implementations. It also updates to CLDR 46 locale data with new locales, significant updates to existing locales, and various additions and corrections. For example, the CLDR and Unicode default sort orders are now very nearly the same.
Most of the java.time (Temporal) types can now be formatted directly using the existing ICU4J date/time formatting classes.
There are some new APIs to make ICU easier to use with modern C++ and Java patterns. Most of the C/C++ APIs added for this purpose are implemented as C++ header-only APIs, and usable on top of binary stable C APIs, which is a first for ICU.
The Java and C++ technology preview implementations of the (also in tech preview) CLDR MessageFormat 2.0 specification have been updated to match recent changes.
ICU 76 and CLDR 46 are major releases, including a new version of Unicode and major locale data improvements.
For details, please see https://unicode-org.github.io/icu/download/76.html.
Adopt a Character and Support Unicode’s Mission
Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀
Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.
Each adoption includes a digital badge and certificate that you can proudly display!
Have fun and support a good cause
You can also donate funds or gift stock
As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.
Unicode CLDR 46 available
- Updated to Unicode 16.0 (including major changes to collation)
- Substantial additions and modifications of Emoji search keyword data
- ‘Upleveling’ the locale coverage (see below)
- Updates to Message Format in tech preview
- Updates to conformance
- New tech preview section on semantic skeletons
New / Upleveled Locales
- 📈 Modern: Nigerian Pidgin, Tigrinya
- 📈 Moderate: Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof
- 📈 Basic: Ewe, Ga, Kinyarwanda, Konkani (Latin), Northern Sotho, Oromo, Sichuan Yi, Southern Sotho, Tswana
- 📉 Basic*: Chuvash, Anii
Monday, May 20, 2024
Unicode CLDR Version 46 Submission Open
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
Version 46 is focusing on:
- Unicode 16 additions: new emoji, script names, collation data (Chinese & Japanese), …
- Emoji search keywords: Expanding keyword coverage to make it easier for users to find the right emoji
- New Languages targeting Basic: Ewe (ee), Ga (gaa), Kinyarwanda (rw), Northern Sotho (nso), Oromo (om), Sesotho (st), Setswana (tn)
- Up-leveling: Akan (ak)
Each new locale starts with a small set of Core data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the next submission cycle.
Once a language reaches Basic coverage, it has the minimum support for use in language selection, such as on mobile devices. In the next submission cycle, the name for that language is also added for translation for all languages at Modern coverage.
If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.
Thursday, April 18, 2024
Unicode CLDR v45 released
CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)
CLDR 45 did not have a Survey Tool submission phase, and focused on tooling and just a few functional areas:
MessageFormat 2.0 Tech Preview
Software needs to construct messages that incorporate various pieces of information. The complexities of the world's languages make this challenging. The goal for MessageFormat 2.0 is to allow developers and translators to create natural-sounding, grammatically correct user interfaces that can appear in any language and support the needs of various cultures.
The new MessageFormat defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages, software libraries, and software localization tooling. It enables the integration of internationalization APIs (such as date or number formats), and grammatical matching (such as plurals or genders). It is extensible, allowing software developers to create formatting or message selection logic that adds on to the core capabilities. Its data model provides the means of representing existing syntaxes, thus enabling gradual adoption by users of older formatting systems.
See also:
- UTW { } MessageFormat v2 (November 7, 2023)
- Message Format Virtual Open House (February 20, 2024)
Keyboard 3.0 stable version
Keyboard support for digitally disadvantaged languages (DDLs) is often lacking or inconsistent between platforms. The updated LDML Keyboard 3.0 format specifies an interchange format for keyboard data. This will allow keyboard authors to create a single mapping file for their language, which implementations can use to provide that language’s keyboard mapping on their own platform. This format allows both physical and virtual (that is, on-screen or touch) keyboard layouts for a language to be defined in a single file.
See also:
- CLDR, Beyond Locale Data (June 22, 2023)
Tooling changes
Many tooling changes are difficult to accommodate in a data-submission release, including performance work and UI improvements. The changes in v45 provide faster turn-around for linguists and higher data quality. They are targeted at the v46 submission period, starting in May, 2024.
Wednesday, April 17, 2024
ICU 75 Released
The CLDR MessageFormat 2.0 specification is now in technology preview, together with a corresponding update of the ICU4J (Java) tech preview and a new ICU4C (C++) tech preview.
For details, please see https://icu.unicode.org/download/75.
Tuesday, March 5, 2024
Unicode CLDR v45 Alpha available for testing
The Unicode CLDR v45 Alpha is now available for integration testing.
CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
The alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.
CLDR 45 is a closed release with no submission period, focusing on just a few areas:
MessageFormat 2.0 Tech Preview
Software needs to construct messages that incorporate various pieces of information. The complexities of the world's languages make this challenging. The goal for MessageFormat 2.0 is to allow developers and translators to create natural-sounding, grammatically correct user interfaces that can appear in any language and support the needs of diverse cultures.
The new MessageFormat defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages, software libraries, and software localization tooling. It enables the integration of internationalization APIs (such as date or number formats), and grammatical matching (such as plurals or genders). It is extensible, allowing software developers to create formatting or message selection logic that adds on to the core capabilities. Its data model provides a means of representing existing syntaxes, thus enabling gradual adoption by users of older formatting systems.
See also:
- UTW { } MessageFormat v2 (November 7, 2023)
- Message Format Virtual Open House (February 20, 2024)
Keyboard 3.0 stable version
Keyboard support for digitally disadvantaged languages is often lacking or inconsistent between platforms. The updated LDML Keyboard 3.0 format specifies an interchange format for keyboard data. This will allow keyboard authors to create a single mapping file for their language, which implementations can use to provide that language’s keyboard mapping on their own platform. This format allows both physical and virtual (that is, on-screen or touch) keyboard layouts for a language to be defined in a single file.
See also:
- CLDR, Beyond Locale Data (June 22, 2023)
Tooling changes
Many tooling changes are difficult to accommodate in a data-submission release, including performance work and UI improvements. The changes in v45 provide faster turn-around for linguists and higher data quality. They are targeted at the v46 submission period, starting in May, 2024.
For more information
See the draft CLDR v45 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.
Tuesday, October 31, 2023
ICU 74 Released
ICU 74 and CLDR 44 are major releases, including a new version of Unicode and major locale data improvements. They subsume the changes for the ICU 73.2 and CLDR 43.1 maintenance releases.
Unicode 15.1 adds source code security mechanisms, improves line breaking for Southeast Asian scripts, and adds important CJK unified ideographs.
CLDR 44 has added or improved data for a number of languages that have been newly added to ICU, and has improved measurement unit handling, conversion, and formatting.
ICU 74 implements these improvements, adds new C APIs for locale handling, adds a plug-in API for word segmentation, and switches the Java build system to Maven.
For details, please see https://icu.unicode.org/download/74.
Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.
[badge]
Unicode CLDR v44 available
- Formatting Person Names. Added further enhancements (data and structure) for formatting people's names. For more information on why this feature is being added and what it does, see Background.
- Emoji 15.1 Support. Added short names, keywords, and sort-order for the new Unicode 15.1 emoji.
- Unicode 15.1 additions. Made the regular additions and changes for a new release of Unicode, including names for new scripts, collation data for Han characters, etc.
- Digitally disadvantaged language coverage. Work began to improve DDL coverage, with the following DDL locales now having higher coverage levels:
- Modern: Cherokee, Lower Sorbian, Upper Sorbian
- Moderate: Anii, Interlingua, Kurdish, Māori, Venetian
- Basic: Esperanto, Interlingue, Kangri, Kuvi, Kuvi (Devanagari), Kuvi (Odia), Kuvi (Telugu), Ligurian, Lombard, Low German, Luxembourgish, Makhuwa, Maltese, N’Ko, Occitan, Prussian, Silesian, Swampy Cree, Syriac, Toki Pona, Uyghur, Western Frisian, Yakut, Zhuang
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
There are many other changes: to find out more, see the CLDR v44 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.
In version 44, the following levels were reached:
- 95 languages: suitable for full UI internationalization
- 13 languages: suitable for “document content” internationalization, e.g. in a spreadsheet
- 50 languages: suitable for locale selection, e.g. choice of language on a mobile phone
We are currently planning for CLDR version 45 to be a closed release with no submission period. The focus will be on improving the Survey Tool used for data submission, making necessary infrastructure changes, and some high priority data quality fixes.
Thursday, September 14, 2023
Unicode CLDR v44 Alpha available for testing
CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
The alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.
Alpha means that the main data and charts are available for review, but the specification, JSON data, and other components are not yet ready for review. Some data may change if showstopper bugs are found. The planned schedule is:
- Sep 27 — Beta (data)
- Oct 04 — Beta2 (spec)
- Nov 01 — Release
- Formatting Person Names. Added further enhancements (data and structure) for formatting people's names. For more information on why this feature is being added and what it does, see Background.
- Emoji 15.1 Support. Added short names, keywords, and sort-order for the new Unicode 15.1 emoji.
- Unicode 15.1 additions. Made the regular additions and changes for a new release of Unicode, including names for new scripts, collation data for Han characters, etc.
- Digitally disadvantaged language coverage. Work began to improve DDL coverage, with the following DDL locales now having higher coverage levels:
- Modern: Cherokee, Lower Sorbian, Upper Sorbian
- Moderate: Anii, Interlingua, Kurdish, Māori, Venetian
- Basic: Esperanto, Interlingue, Kangri, Kuvi, Kuvi (Devanagari), Kuvi (Odia), Kuvi (Telugu), Ligurian, Lombard, Low German, Luxembourgish, Makhuwa, Maltese, N’Ko, Occitan, Prussian, Silesian, Swampy Cree, Syriac, Toki Pona, Uyghur, Western Frisian, Yakut, Zhuang
In version 44, the following levels were reached:
- 95 languages: suitable for full UI internationalization
- 13 languages: suitable for “document content” internationalization, e.g. in a spreadsheet
- 50 languages: suitable for locale selection, e.g. choice of language on a mobile phone
We are currently planning for CLDR version 45 to be a closed release with no submission period. The focus will be on improving the Survey Tool used for data submission, making necessary infrastructure changes, and some high priority data quality fixes.
Thursday, June 15, 2023
ICU 73.2 & CLDR 43.1 released: GB18030 compliance updates & compatibility fixes
- ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).
- CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). All major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)
- CLDR extends the support for “short” Chinese sort orders to cover some additional, required characters for Level 2. This is carried over into ICU collation.
- ICU has a modified character conversion table, mapping some GB18030 characters to Unicode characters that were encoded after GB18030-2005.
- There are optional variants of time formats with AM/PM (only for English) using ASCII spaces in CLDR that can also be used in ICU via custom data generation. This is intended to help certain implementers transition to the improved patterns, which have used a narrow no-break space between the time and AM/PM since CLDR 42.
  - For how to generate ICU data with this option, look for alt="ascii" on tools/cldr/cldr-to-icu/README.md
- The changes to the word segmentation behavior of @ sign that were in CLDR 42 (ICU 72) have been reverted. These caused problems for certain parsers that did not expect @ to join to letters.
For details, please see:
- ICU 73.2 Release Note: ICU 73.2 maintenance release
- CLDR 43.1 Release Note: Version 43.1 Changes
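For consumers who cannot regenerate ICU data, the effect of the ASCII-space variants can be roughly approximated by post-processing already formatted strings. The following is only a sketch under stated assumptions, not the mechanism CLDR or ICU uses: the helper name `to_ascii_spaces` and the sample string are hypothetical, and it assumes that swapping the narrow no-break space (U+202F) for an ASCII space is acceptable for the parser in question.

```python
# NARROW NO-BREAK SPACE, which the post says sits between the time and AM/PM since CLDR 42.
NNBSP = "\u202f"

def to_ascii_spaces(formatted: str) -> str:
    """Hypothetical helper: replace narrow no-break spaces with ASCII spaces."""
    return formatted.replace(NNBSP, " ")

# Hypothetical sample of an English time string containing U+202F.
sample = "3:45\u202fPM"
print(repr(sample))
print(repr(to_ascii_spaces(sample)))
```

Post-processing like this loses the line-breaking protection that the no-break space provides, which is one reason to prefer the data-generation option where it is available.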
Thursday, June 1, 2023
Unlocking the Power of CLDR Person Name Formatting: A Solution for Formatting Names in a Globalized World
CLDR Person Names has moved from “tech preview” to “draft” status and is available for initial testing by implementors through ICU4J.
How a person’s name is displayed and used can convey respect, familiarity, or even be interpreted as rude if used improperly. That’s why it’s important to format names correctly, especially because naming practices vary across the globe. In many cultures, names can indicate gender, status, birthplace, nationality, ethnicity, religion, and more.
Until now, there have been no good standards for how to format people’s names in various contexts. A number of Unicode members wanted to address this problem and provide a mechanism that anyone could use to format people’s names in a wide variety of applications, such as contact lists, air travel, billing applications, CRMs, social media, and any other application that asks for user information and presents it back to the user or others.
The Unicode® Person Name Formats defines patterns used to take a person’s name and format it correctly in a given language or locale depending on a chosen context. With the Unicode Common Locale Data Repository (CLDR), locale codes and name sequences can be selected to create a specific pattern for formatting a person’s name — including preferences for formal, informal, or abbreviated versions. As a result, designers and developers can correctly display names according to the user’s native locale and culture, especially important when integrating names in different character scripts, such as Japanese, Chinese, or Russian.
The Unicode Consortium added Person Name formatting to CLDR in v42, and it has been refined and enhanced for v43, which was released in April. In CLDR v43, with the help of linguists from around the world, we completed data for formatting people’s names for CLDR locales at modern coverage. Its formal name is "Unicode Technical Standard #35 Unicode Locale Data Markup Language (LDML); Part 8: Person Names". ICU has added the PersonNameFormatter class, which is available in ICU 73.
To learn more, and get an idea of the implications for user experience and application design, see the following paper, which provides an illustration of the many contexts in which names can be formatted through CLDR Person Names.
LDML (UTS#35) Part 8: Person Names - a story teller’s case study
Thursday, April 13, 2023
ICU 73 Released
ICU 73 improves Japanese and Korean short-text line breaking, reduces C++ memory use in date formatting, and promotes the Java person name formatter from tech preview to draft.
ICU 73 and CLDR 43 are minor releases, mostly focused on bug fixes and small enhancements. (The fall CLDR/ICU releases will update to Unicode 15.1 which is planned for September.)
ICU 73 updates to the time zone data version 2023c (March 2023). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.
For details, please see https://icu.unicode.org/download/73.
Wednesday, April 12, 2023
Unicode CLDR v43 released
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU).
CLDR 43 is a limited-submission release, focusing on just a few areas:
- Formatting Person Names
  - Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
- Locales
  - Adding substantially to the LikelySubtags data: This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance. The data has been contributed by SIL.
  - Inheritance: Adding components to parentLocales, and documenting the different inheritance for rgScope data, which inherits primarily by region.
- Other data updates
  - In English, Türkiye is now the primary country name for the country code TR, and Turkey is available as an alternate. Other locales have been reviewed to see whether similar changes would be appropriate.
  - Name for the new timezone Ciudad Juárez.
- Structure
  - Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
- Collation & Searching
  - Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.
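To illustrate how LikelySubtags data is used to find the likely writing system and country for a given language, here is a minimal sketch. It is not the full algorithm specified in LDML: the table is a tiny hand-picked excerpt, the function name `add_likely_subtags` is hypothetical, and only bare language codes are handled.

```python
# Tiny illustrative excerpt; the real CLDR likelySubtags data has far more entries.
LIKELY_SUBTAGS = {
    "en": ("Latn", "US"),
    "ja": ("Jpan", "JP"),
    "sr": ("Cyrl", "RS"),
    "zh": ("Hans", "CN"),
}

def add_likely_subtags(language: str) -> str:
    """Hypothetical helper: expand a bare language code to language-Script-REGION.

    Unknown languages are returned unchanged.
    """
    entry = LIKELY_SUBTAGS.get(language)
    if entry is None:
        return language
    script, region = entry
    return f"{language}-{script}-{region}"

for code in ("en", "zh", "sr", "xx"):
    print(code, "->", add_likely_subtags(code))
```

A real implementation would also accept identifiers that already carry a script or region and fill in only the missing parts.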
Thursday, March 30, 2023
The Unicode CLDR v43 Beta is now available for integration testing
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU), and the Specification changes, since those are new since the Alpha.
We appreciate feedback from both ICU and non-ICU consumers of CLDR data. (The Beta has already been integrated into the development version of ICU.) Feedback can be filed at CLDR Tickets. Any tickets should be filed as soon as possible, because the target release date is 2023 Apr 12, Wed.
CLDR 43 is a limited-submission release, focusing on just a few areas:
- Formatting Person Names
  - Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
- Locales
  - Adding substantially to the LikelySubtags data: This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance. The data has been contributed by SIL.
  - Inheritance: Adding components to parentLocales, and documenting the different inheritance for rgScope data, which inherits primarily by region.
- Other data updates
  - Alternate names for Turkey / Türkiye
  - Name for the new timezone Ciudad Juárez
- Structure
  - Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
- Collation & Searching
  - Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.
Thursday, February 23, 2023
The Unicode CLDR v43 Alpha is now available for integration testing
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
The Alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.
Alpha means that the main data and charts are available for review, but the specification, JSON data, and other components are not yet ready for review. Data may change if release-blocking bugs are found. The planned schedule is:
- 2023 Mar 15, Wed — public Beta (data)
- 2023 Mar 29, Wed — public Beta2 (data & spec)
- 2023 Apr 12, Wed — Release
- Formatting Person Names
  - Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
- Adding substantially to the LikelySubtags data
  - This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance.
  - The data has been contributed by SIL.
- Other data updates
  - Alternate names for Turkey / Türkiye
  - Name for the new timezone Ciudad Juárez
- Structure
  - Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
  - Cleanup of the inheritance structure in CLDR
- Collation & Searching
  - Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.
To find out more about these and other changes, see the draft CLDR v43 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.
Wednesday, December 21, 2022
Unicode in 2022
Hello Everyone!
As we go into the New Year, the Unicode team thought we’d share some highlights from this past year. From source-code spoofing to preserving indigenous languages, the Unicode team has had another full year, including expanding the number of characters that appear on billions of devices around the world.
Nearly 150,000 characters!
On the character side, we reached a total of just shy of 150,000 characters (149,186 to be exact). Of the 4,489 characters added in the 15.0 release, the biggest set was 4,192 ideographs for use in Chinese, Japanese, and Korean. There are also two new scripts, Nag Mundari and Kawi. Nag Mundari is a script used to write the Mundari language of India, a language with 1.1 million speakers. Kawi is an important historic script of insular Southeast Asia, found in inscriptions and on artifacts in several languages dating from the 8th to the 16th centuries, and it is undergoing a revival today amongst enthusiasts.
And we can’t forget the 20 new emoji characters; we’re looking forward to seeing which are the most popular: shaking face? Goose? Maracas? Pink heart? If you’re involved in implementing emoji, you’ll also want to look at the latest changes in UTS #51 Unicode Emoji.
See the Unicode 15.0.0 page for more details. We’re also changing how we do releases; for more, see 2023 Release Planning.
The Launch of ICU4X
ICU is used in every major device and operating system; it’s how you see a date or number on your phone, for example. This new project, ICU4X, was created to solve the needs of clients who wish to provide client-side internationalization for their products in resource-constrained environments and across many programming languages. After 2½ years of work by Google, Mozilla, Amazon, and community partners, the Unicode Consortium has published ICU4X 1.0, its first stable release. Built from the ground up to be lightweight, portable, and secure, ICU4X learns from decades of experience to bring localized date formatting, number formatting, collation, text segmentation, and more to devices that, until now, did not have a suitable solution. For details, see Announcing ICU4X 1.0.
When does i ≠ і?
Can you tell the difference between i and і? Yeah, most people can’t. The first set of changes to help counter source-code spoofing was included in the 15.0 versions of the UAX #9 Unicode Bidirectional Algorithm, UAX #31 Unicode Identifier and Pattern Syntax, and UTS #39 Unicode Security Mechanisms.
For 2023, there is a new draft UTS #55 Unicode Source Code Handling, providing guidance for programming language designers and tooling developers, and specifying mechanisms to avoid usability and security issues arising from improper handling of Unicode. More changes are on their way for UAX #9, UAX #31, and UTS #39 as well.
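To make the lookalike problem concrete, here is a small sketch using Python's standard `unicodedata` module. It shows that the two letters are different code points, and then flags an identifier that mixes them. The `rough_scripts` helper is a hypothetical, deliberately crude heuristic (it reads the first word of each character name); it is not the Script property and not the detection mechanism specified in UTS #39.

```python
import unicodedata

LATIN_I = "\u0069"     # the ordinary Latin letter
CYRILLIC_I = "\u0456"  # a visually near-identical Cyrillic letter


def describe(text):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def rough_scripts(identifier):
    """Crude stand-in for a script check: first word of each letter's name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


print(describe(LATIN_I))     # [('U+0069', 'LATIN SMALL LETTER I')]
print(describe(CYRILLIC_I))  # [('U+0456', 'CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I')]

spoofed = "adm" + CYRILLIC_I + "n"  # renders much like "admin"
print(spoofed == "admin")           # False
print(len(rough_scripts(spoofed)) > 1)  # True: letters from more than one script
```

A real tool would rely on the Unicode Script property and the confusables data rather than character names, but even this rough check is enough to show why two identical-looking identifiers can compare unequal.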
Åge Møller, Πέτρος Νικόλαος Καρατζής, ராஜேந்திர சோழன்
We’re making great progress on internationalized formatting of people’s names. What does that mean? Software needs to be able to format people's names, such as John Smith or 宮崎駿. The formatting can be surprisingly complicated: for example, people may have a different number of names, depending on their culture: they might have only one name (“Zendaya”), only two (“Albert Einstein”), or three or more. So the software needs to handle missing or extra name fields gracefully.
There are many more complexities; for more details, see Formatting people’s names.
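A minimal sketch of what handling missing fields gracefully can look like is shown below. The field names and the two orders are loosely modeled on the terms CLDR uses, but `PersonName` and `format_name` are hypothetical and far simpler than the CLDR person name patterns: they only skip absent fields and choose an order and separator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    given: Optional[str] = None
    given2: Optional[str] = None   # e.g. a middle name
    surname: Optional[str] = None


def format_name(name, order="givenFirst", separator=" "):
    """Join the fields that are present, in the requested order."""
    if order == "surnameFirst":
        fields = [name.surname, name.given, name.given2]
    else:
        fields = [name.given, name.given2, name.surname]
    # Skipping empty fields avoids stray separators for one-name people.
    return separator.join(field for field in fields if field)


print(format_name(PersonName(given="Zendaya")))
print(format_name(PersonName(given="Albert", surname="Einstein")))
print(format_name(PersonName(given="駿", surname="宮崎"),
                  order="surnameFirst", separator=""))
```

In practice the order and separator would come from locale data rather than from the caller, and further fields such as titles and generational suffixes would need handling too.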
You have 2 unread messages.
Or, you have 3 items in your cart. Whenever a computer needs to construct a sentence using “placeholders” such as 3, it is formatting a message. The current industry standard is ICU’s message formatting; a project started about 3 years ago aims to improve on it with a more robust and extensible mechanism. There is now a Tech Preview in ICU, and we’d urge developers to try it out!
See message-format-wg for details on the syntax and message2/package-summary.html for the API (note that ICU’s convention for tech previews is to mark them as Deprecated), and the test code in MessageFormat2Test.java for examples of usage.
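The core idea, choosing a message variant based on the placeholder value and then substituting it, can be sketched in a few lines. This is only an illustration with hypothetical names (`MESSAGES`, `plural_category_en`, `format_message`); it uses neither ICU MessageFormat syntax nor the MessageFormat 2.0 syntax, and it hard-codes the English plural rule for whole numbers.

```python
def plural_category_en(count):
    """English cardinal plural rule for whole numbers: 1 is 'one', else 'other'."""
    return "one" if count == 1 else "other"


# Each message carries one variant per plural category.
MESSAGES = {
    "unread": {
        "one": "You have {count} unread message.",
        "other": "You have {count} unread messages.",
    },
    "cart": {
        "one": "You have {count} item in your cart.",
        "other": "You have {count} items in your cart.",
    },
}


def format_message(key, count):
    """Pick the variant for this count, then fill in the placeholder."""
    variants = MESSAGES[key]
    return variants[plural_category_en(count)].format(count=count)


print(format_message("unread", 2))  # You have 2 unread messages.
print(format_message("unread", 1))  # You have 1 unread message.
print(format_message("cart", 3))    # You have 3 items in your cart.
```

Other languages have more plural categories than English, which is one reason message formatting is driven by locale data rather than by an if/else in application code.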
(There are of course other fixes, upgrades and new features in ICU: see ICU 72 and ICU 71 for more details.)
Māori, Wolof, тоҷикӣ, کٲشُر, ትግርኛ, कॉशुर, মৈতৈলোন্, ᱥᱟᱱᱛᱟᱲᱤ
In CLDR, we now have 95 languages at the Modern level (suitable for full UI internationalization), 6 at the Moderate level (suitable for “document content” internationalization), and 29 at the Basic level (suitable for locale selection). We added a tech preview of formatting for person names, plus additions for Unicode 15.0 (emoji names and search keywords), names for new scripts, new CJK collation, and so on. For more information, see CLDR v42.
Revitalization and Preservation of Indigenous Languages
The Nattilik language community was unable to use their language reliably for even simple, everyday digital text exchanges such as email or text messaging. The Typotheque Syllabics Project, an initiative based out of Toronto and The Hague, Netherlands, undertook research with language keepers across various Syllabics-using Indigenous communities in Canada. By collaborating with Nattilik language keepers and elders in the community, key issues the Nattilik community of Western Nunavut faced were identified, and it was discovered that there were 12 missing syllabic characters from the Unicode Standard. The Consortium worked with the Typotheque Syllabics Project to add 16 characters to the script to support Nattilik and other languages in Unicode version 14.0, and improved the glyphs in Unicode version 15.0. See this blog post from June.
The Past and Future of Flag Emoji
Despite being the largest emoji category with a strong association tied to identity, flags are by far the least used. Flag emoji have always been subject to special criteria due to their open-ended nature, infrequent use, and burden on implementations. The addition of other flags and thousands of valid sequences into the Unicode Standard has not resulted in wider adoption. Flags don’t stand still: they are constantly evolving, and because the set is open-ended, adding one creates exclusivity at the expense of others. Curious to learn more? Read more about the Past and Future of Flag Emoji.
Available Now! New YouTube Playlist and Technical Quick Start Guide
On September 28th, Unicode held a webinar on the “Overview of Internationalization and Unicode Projects” for Unicode enthusiasts. Unicode technical leadership and other experts shared background on our core projects with participants from more than 30 countries. If you missed the webinar, no worries! The recorded sessions are available on this YouTube playlist. And if you are new to Unicode and internationalization or simply want a refresh, you can also check out our Technical Quick Start Guide. This handy guide explains what Unicode is, including answering the question, “What is Internationalization and Why it Matters.” There are also useful links to more detailed information and how you can get involved. Read more here.
Support Unicode 💞💕💌💯✨🌟🤠🛟🎁
Finally, if you are already a contributor to, or member of, Unicode (or your company or organization is!), thank you, Danke, Děkuju, धन्यवाद, merci, 谢谢你, grazie, நன்றி, and gracias! What we have accomplished is only possible because of supporters like you.
And if you want to support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode is a US-based non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.
Tuesday, November 8, 2022
Available Now! New YouTube Playlist and Technical Quick Start Guide
By Elango Cheran
On September 28th, Unicode held a webinar on the “Overview of Internationalization and Unicode Projects” for Unicode enthusiasts. More than 180 people across 30 countries joined us for this online event.
The Consortium is pleased to now make available the videos from this event. If you are new to Unicode and internationalization or want an overview of the most recent projects, check out our new YouTube playlist and Technical Quick Start Guide.
Our Technical Leadership and other experts provide a handy overview on such topics as:
- Introduction to Internationalization - Addison Phillips, Internationalization Engineer
- Unicode Consortium: Past, Present, and Future - Mark Davis, Cofounder and President
- Scripts and Character Encoding - Deborah Anderson, Chair of the Script Ad Hoc Committee
- Unicode CLDR (Common Locale Data Repository) - Mark Davis and Annemarie Apple, Chair and Vice Chair of the CLDR Committee
- Unicode ICU (International Components for Unicode) - Markus Scherer, Chair of ICU Committee
- Unicode ICU4X 2022 - Shane Carr, Chair of ICU4X Subcommittee
The Unicode Technical Quick Start Guide is also now available. The guide explains what Unicode is, including answering the question, “What is Internationalization and Why it Matters.” There is also an overview of the technical committees and useful links to more detailed information and how you can get involved.
Friday, October 21, 2022
ICU 72 Released
ICU 72 and CLDR 42 are major releases, including a new version of Unicode and major locale data improvements.
ICU 72 adds two technology preview implementations based on draft Unicode specifications:
- Formatting of people’s names in multiple languages (CLDR background on why this feature is being added and what it does)
- An enhanced version of message formatting
For details, please see https://icu.unicode.org/download/72.