Thursday, February 27, 2025
Unicode CLDR 47 Beta available for specification review: MessageFormat now Stable!
The Unicode CLDR 47 Beta is now available for specification review and integration testing. The release is planned for April 17th, but any feedback on the specification needs to be submitted well in advance of that date. The changes in the specification are available at Draft LDML Modifications.
The biggest change is that MessageFormat has advanced from Final Candidate to Stable. This means that the stability guarantees are in place and implementations can finalize their APIs. There are many planned changes for CLDR 48 — see the Migration section for a list of upcoming changes that will affect implementations.
The beta has already been integrated into the development version of ICU. We would especially appreciate feedback from ICU users and non-ICU consumers of CLDR data, and on Migration issues. Feedback can be filed at CLDR Tickets.
CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems. CLDR 47 did not have a Survey Tool submission phase, and instead focused on tooling and a few functional areas.
MessageFormat 2.0 Stable
Software needs to construct messages that incorporate various pieces of information. The complexities of the world's languages make this challenging. MessageFormat 2.0 enables developers and translators to create natural-sounding user interfaces that can appear in any language and support the needs of various cultures.
The new MessageFormat defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages, software libraries, and software localization tooling. It enables the integration of internationalization APIs (such as date or number formats) and grammatical matching (such as plurals or genders). It is extensible, allowing software developers to create formatting or message selection logic that adds to the core capabilities. Its data model can represent existing message syntaxes, enabling gradual adoption by users of older formatting systems.
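To make the idea of grammatical matching more concrete, the sketch below pairs a plural-selection message written in MessageFormat 2.0 syntax (our reading of the spec, shown only as a reference string) with a toy Python function that mimics how such a message picks a variant. This is an illustration under stated assumptions, not a conforming implementation. It does not parse the MF2 source. The names `VARIANTS`, `english_plural_category`, and `format_message` are hypothetical and are not part of ICU or any other library. The plural rule is a simplified English rule for integers; a real implementation uses CLDR plural data.

```python
"""Toy illustration of MessageFormat 2.0-style variant selection.

NOT a conforming MF2 implementation: it hard-codes one pre-parsed message
and uses a simplified English plural rule instead of CLDR plural data.
"""

# The message as it might be written in MF2 syntax (reference only; the toy
# code below does not parse this string).
MF2_SOURCE = """\
.input {$count :number}
.match $count
0   {{You have no new messages.}}
one {{You have {$count} new message.}}
*   {{You have {$count} new messages.}}"""

# Hypothetical pre-parsed form of the message above: variants keyed by an
# exact value, a plural category, or "*" (the required catch-all key).
VARIANTS = {
    "0": "You have no new messages.",
    "one": "You have {count} new message.",
    "*": "You have {count} new messages.",
}


def english_plural_category(n: int) -> str:
    """Simplified English cardinal plural rule, valid for integers only."""
    return "one" if n == 1 else "other"


def format_message(count: int) -> str:
    """Select a variant for `count` and fill in its placeholder."""
    exact_key = str(count)
    category = english_plural_category(count)
    if exact_key in VARIANTS:
        # An exact numeric match is preferred over a plural category.
        pattern = VARIANTS[exact_key]
    elif category in VARIANTS:
        pattern = VARIANTS[category]
    else:
        # No specific key matched, so fall back to the catch-all variant.
        pattern = VARIANTS["*"]
    return pattern.format(count=count)


if __name__ == "__main__":
    for n in (0, 1, 5):
        print(format_message(n))
```

Running the script prints one line for each count. The translator controls which variants exist (here, a special case for zero), while the code only supplies `count`. That separation is what lets a single message adapt to languages whose plural categories differ from English.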
Tech Preview implementations are available in C++, Java, and JavaScript:
ICU4J, Java: com.ibm.icu.message2, part of ICU 76, is a tech preview implementation of MessageFormat 2.0, together with a formatting API. See the ICU User Guide for examples and a quickstart guide, and Trying MF 2.0 Final Candidate to try a “Hello World”.
ICU4C, C++: icu::message2::MessageFormatter, part of ICU 76, is a tech preview implementation of MessageFormat 2.0, together with a formatting API. See the ICU User Guide for examples and a quickstart guide, and Trying MF 2.0 Final Candidate to try a “Hello World”.
JavaScript: messageformat 4.0 provides a formatter and conversion tools for the MessageFormat 2 syntax, together with a polyfill of the runtime API proposed for ECMA-402.
(Because of the timing, these implement a slightly earlier version of the spec, but can be used for initial evaluation, testing, and experimentation.)
See also:
UTW 2024: MessageFormat v2 (October 2024)
Message Format Virtual Open House (February 2024)
Tooling changes
Many tooling changes are difficult to accommodate in a data-submission release, including performance work and UI improvements. The changes in CLDR 47 provide faster turnaround for linguists and higher data quality. They are targeted at the CLDR 48 submission period, starting in April 2025.
For more information
See the draft CLDR 47 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.
Friday, February 7, 2025
Unicode CLDR 47 Alpha Now Available for Testing
The Unicode CLDR 47 Alpha is now available for integration testing.
CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
The alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.
CLDR 47 focused on MessageFormat 2.0 and on tooling to expand support for Digitally Disadvantaged Languages (DDLs). It was a closed cycle: locale data changes were limited to bug fixes and the addition of new locales, mostly regional variants.
RBNF improvements and Transforms
CLDR added Gujarati RBNF (Rule-Based Number Format) support, which provides number spellout functionality, and made improvements for many other languages.
Transforms were also improved in both the CLDR 46.1 and 47 releases. The improvements included:
Adding a Hant-Latn transliterator
Aliasing Hans-Latn to Hani-Latn
Improvements to several other transliterators
More regional variants
Over the past few years, there has been an increasing number of requests to add locales for languages such as English in regions where they are commonly used as a lingua franca.
CLDR has been adding additional child locales to support these requests and has begun restructuring inheritance to allow for better maintenance of shared regional data, such as currency symbols and metazone names.
46.1 Improvements
CLDR 46.1 was a special interim release of CLDR that focused on MessageFormat 2.0. It included a few additional changes:
More explicit well-formedness and validity constraints for unit of measurement identifiers
Addition of derived emoji annotations that were missing: emoji with skin tones facing right
Fixes to make the ja, ko, yue, zh datetimeSkeletons useful for generating the standard patterns
Improved date/time test data
For more information, see 46.1 Changes
Tooling changes
Many tooling changes are difficult to accommodate in a data-submission release, including performance work and UI improvements. The changes in CLDR 47 provide faster turnaround for linguists and higher data quality. They are targeted at the CLDR 48 submission period, starting in April 2025.
For more information
See the draft CLDR v47 Release Note, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.
Tuesday, February 4, 2025
Unicode Welcomes New Board Chair and Members
Mark Davis Steps Down as Board Chair
Mark Davis, co-founder of the Unicode Consortium and a pivotal figure in global digital communication, has transitioned from Chair of the Board of Directors to a continuing role on the Board. He will also remain Chair of the CLDR Technical Committee and Chief Technology Officer (CTO) for the Consortium. This change reflects Davis’s commitment to ensuring a smooth leadership transition and his continuing dedication to the Consortium’s mission of enabling everyone around the globe and across all technology platforms to seamlessly communicate and collaborate in their own languages.
During Davis’s tenure as Board Chair, the Unicode Consortium solidified its position as a crucial global organization. Under his leadership, the Consortium standardized character encoding for modern and historical scripts and symbols, including the now ubiquitous emoji. He also founded two vital projects that have become core pillars of the Consortium: CLDR (Common Locale Data Repository), providing structured data for internationalization, and ICU (International Components for Unicode), delivering production-ready code libraries. These projects, along with the Unicode encoding, are fundamental to virtually all modern phones and operating systems, enabling billions of people worldwide to communicate in their native languages.
“One of the most satisfying accomplishments in a career is to find successors who take on challenging positions – and achieve even greater impact,” said Mark Davis. “Markus Scherer has done that for ICU; Jennifer Daniel for the Emoji working group (aka ESC), and Toral Cowieson as CEO of the Consortium. Having worked with Cathy during pivotal years in the development of the Unicode encoding, I'm confident that her talents and skills will make her an exceptional Chair of the Board.”
Cathy Wissink Elected as Board Chair
We are pleased to announce that the Unicode Board of Directors voted unanimously to appoint Cathy Wissink as the new chair. Wissink is a 30-year veteran of the technology industry, focused primarily on the intersection of international markets and innovation. The bulk of her career was spent at Microsoft, in diverse roles ranging from engineering to government affairs to product certification. She’s no stranger to the Unicode Consortium, having worked on Unicode and internationalization implementation from the earliest versions of 32-bit Windows through Windows 7. Wissink also led Microsoft’s participation in the Unicode Technical Committee from 2000 to 2005 and served as UTC vice-chair and INCITS/L2 chair from 2002 to 2005.
"I am grateful for the trust that Mark and the board have placed in me as the incoming chair of the Board of Directors for the Unicode Consortium," said Wissink. "Unicode's products and standards are essential to global digital communications, and as innovation progresses and languages evolve, there is still significant work to enable all languages in digital spaces. I look forward to collaborating with Mark Davis and Toral Cowieson, as well as the broader community of technologists, linguists, and specialists to advance Unicode's mission."
Welcome to new Board Members, John Tinsley and Manuela Giese
John Tinsley is the VP of AI Solutions at Unicode member company Translated. He’s a computer scientist with more than 15 years of experience in the localization industry. Prior to Translated, he founded Iconic Translation Machines, an award-winning language technology software business that pioneered the commercial deployment of Neural Machine Translation technology. John led the business for almost a decade before selling it to RWS in 2020 in one of the largest technology deals in the language industry.
He holds a PhD in Computer Science and a degree in Applied Computational Linguistics and is a regular public speaker on topics related to language, translation, and business.
Manuela Giese is a Principal Group Manager at Microsoft. She has spent the last 25 years working on various aspects of localization across content types and languages including complex scripts; she still has fond memories of managing complex script languages through localization deliveries in the earlier days of Unicode support.
In recent years, she has been more focused on business horizontals supporting localization models and their challenges. She is passionate about language and culture and how both intersect with equality and gender. She has spent significant time in Europe, South America, and the US and currently resides on the ancestral homeland of the Nooksack, Lummi, and other Coast Salish peoples.
Unicode 17.0 Alpha Review Opens for Feedback
For the alpha review, preliminary data files are also available, with data covering existing and new character repertoire. In addition, a draft for the core specification is available, with new block descriptions for some of the newly added blocks and scripts.
The primary focus for the alpha review should be on the new character repertoire. This early review is provided so that reviewers may consider the repertoire and data file issues prior to the start of beta review (currently scheduled to start in May 2025). Once beta review begins, the repertoire, code points, and character names will all be locked down and will no longer be subject to change.