Tuesday, December 2, 2025
UTC #185 Highlights
Unicode Technical Committee meeting #185 was held October 27 – 29 in Cupertino, CA, hosted by Apple. Here are some highlights.
Starting the Unicode 18.0 cycle
As we've been following an annual September release cycle for the Unicode Standard, the Q4 UTC meeting is the first meeting of a new cycle. While some decisions targeting the release might have been taken at a previous meeting, this is the first meeting at which the next release is the particular focus. One of the decisions taken is to plan out the key milestones and dates for the new cycle. Here's a summary of the timeline for Unicode 18.0:
October 2025: UTC #185 approved new character repertoire
January 2026: UTC #186 will finalize content for the alpha release
February – March: alpha release open for public review
April: UTC #187 will review alpha feedback and finalize content for the beta release
May – June: beta release open for public review
July: UTC #188 will finalize 18.0 content
September: Unicode 18.0 release
Unicode 18.0 character and emoji repertoire
During a release cycle, the primary focus for the alpha review is on the new character repertoire. The repertoire for the alpha review can be updated at the January UTC meeting, but we like to have that planned repertoire largely determined by the Q4 meeting so that working groups can focus early on preparing content that will be needed for the alpha.
UTC #184 had approved around 60 characters for publication in Unicode 18.0. (Some of those had been planned for Unicode 17.0 but, for various reasons, needed to be postponed.) These included the UAE Dirham sign, and the first tranche of a large set of symbols from the writings of Gottfried Leibniz for which proposals are in development. At UTC #185, nearly 13,000 additional characters were approved for encoding in Unicode 18.0.
The approved additions include the encoding of Small Seal script ("Seal"), a repertoire of 11,328 ideographic characters. Seal is distinct from modern Han ideographs (also known as "CJK"), but is an important precursor of CJK that resulted from the first efforts to standardize writing across Chinese-speaking regions during China's Qin Dynasty. As such, Seal has important cultural significance in China and for Chinese speakers throughout the world.
Other additions included 1,276 characters allocated in three new blocks: Archaic Cuneiform Numerals (311 Cuneiform characters from the fourth millennium BCE), and Jurchen and Jurchen Radicals (965 ideographic characters that were used for writing the Jurchen language in the 12th – 13th century CE).
In addition, 321 other characters were approved as additions to a number of existing blocks. This includes many characters for Arabic and Latin scripts, many characters used in phonetic transcription, a number of symbols used in music notation, and a second set of the Leibniz symbols.
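As a quick sanity check on the figures quoted so far, the short sketch below adds them up. It assumes the three groups (Small Seal, the three new blocks, and the additions to existing blocks) do not overlap, which the post implies but does not state outright; the variable names are illustrative only.

```python
# Figures quoted in this post for characters approved at UTC #185.
small_seal = 11_328
archaic_cuneiform_numerals = 311
jurchen_and_jurchen_radicals = 965
additions_to_existing_blocks = 321

# The post gives 1,276 as the total for the three new blocks.
three_new_blocks_total = archaic_cuneiform_numerals + jurchen_and_jurchen_radicals

# Assumption: the groups are disjoint, so they can simply be summed.
quoted_total = small_seal + three_new_blocks_total + additions_to_existing_blocks

print(three_new_blocks_total)  # 1276
print(quoted_total)            # 12925
```

The sum is consistent with the "nearly 13,000" figure given for this meeting.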
Finally, the new characters approved for Unicode 18.0 include nine new emoji characters. Note that many emoji are represented as character sequences, so mentioning the new emoji characters doesn't provide a complete picture. Look for more information about Unicode 18.0 emoji in the coming months.
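To illustrate why a count of emoji characters understates the number of emoji, here is a minimal sketch using an emoji that is already encoded (it is not one of the Unicode 18.0 additions). The "woman astronaut" emoji appears as one image but is stored as a sequence of three code points.

```python
import unicodedata

# One emoji on screen, three code points in the text:
# WOMAN + ZERO WIDTH JOINER + ROCKET.
woman_astronaut = "\U0001F469\u200D\U0001F680"

code_points = [f"U+{ord(ch):04X}" for ch in woman_astronaut]
names = [unicodedata.name(ch) for ch in woman_astronaut]

print(len(woman_astronaut))  # 3
print(code_points)           # ['U+1F469', 'U+200D', 'U+1F680']
print(names)                 # ['WOMAN', 'ZERO WIDTH JOINER', 'ROCKET']
```

No new character was needed for this emoji; it reuses characters that were already encoded.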
CJK & Unihan
UTC works on CJK character encoding in collaboration with IRG (Ideographic Research Group), a working group under ISO/IEC JTC 1/SC 2. There are over 100,000 CJK ideographs now encoded in Unicode, and with such a large repertoire of characters there are refinements to the already-encoded characters that continue to be made. At UTC #185, recommendations arising from a recent IRG meeting were reviewed, and a number of changes were approved for Unicode 18.0. Some of these are technical details that are not so visible, such as corrections to source references for certain characters (the references cited when the characters were encoded, providing evidence of their usage and identity as distinct characters). Among the significant and visible changes approved by UTC are over 700 horizontal extensions, which will be reflected in the Unicode 18.0 code charts with additional glyphs for already-encoded characters.
For complete details on outcomes from UTC #185, see the draft minutes.
About the Unicode Standard
The world relies on digital communications. The Unicode Standard is a vital building block for global digital communications, providing the encoding for more than 155,000 characters used by thousands of languages and scripts throughout the world.
Each character (letter, diacritic, symbol, emoji, etc.) is represented by a unique numeric code and has defined property data that determine how it behaves in several text processing algorithms.
With this combination, the Unicode Standard provides the foundation for implementations to support the world's writing systems, enabling billions of people across the globe to seamlessly communicate with one another across platforms and devices. The Standard is also the foundation for the suite of code, libraries, data, and products that the Unicode Consortium delivers for robust language support.
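The pairing of a unique numeric code with property data can be seen from most programming languages. The sketch below uses Python's standard `unicodedata` module, which exposes a subset of the properties for whichever Unicode version the Python build ships with; the `describe` helper is a name made up for this example.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# A Latin letter, an Arabic-Indic digit, and a currency symbol.
for ch in ["A", "\u0663", "\u20AC"]:
    print(describe(ch))
# ('U+0041', 'LATIN CAPITAL LETTER A', 'Lu')
# ('U+0663', 'ARABIC-INDIC DIGIT THREE', 'Nd')
# ('U+20AC', 'EURO SIGN', 'Sc')

# Properties drive behavior: the digit carries its numeric value.
print(unicodedata.decimal("\u0663"))  # 3
```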
----------------------------------------------
Adopt a Character and Support Unicode’s Mission
Looking to give that special someone a special something?
Or maybe something to treat yourself?
Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.
Each adoption includes a digital badge and certificate that you can proudly display!
Have fun and support a good cause
You can also donate funds or gift stock
Wednesday, November 5, 2025
Introducing the Unicode Inflection Library Technical Preview Release
The problem of linguistic inflection has long been a barrier to effective software internationalization. The problem is even more visible today with multimodal UIs. In many languages, word forms change (inflect) based on grammatical context, creating a significant challenge for developers aiming to build truly global applications. Getting a word's inflection wrong can be as bad as using the wrong preposition in English.
Today, the Unicode Consortium is announcing a major step forward with the Technical Preview Release of the Unicode Inflection Library. It provides direct access through C and C++ APIs, or can be used in conjunction with Message Format 2.0 functionality.
This library is designed to solve a problem that is particularly acute in languages with a large number of inflectional forms, such as the Slavic, Germanic, Romance, Semitic, Indic and agglutinative families of languages.
The issue extends beyond common words like adjectives, nouns, and verbs. In many of these languages, proper nouns—including geo-location names, brands, and people’s names—can also inflect. This complexity affects a large number of users and has been largely unaddressed by the industry, which has typically opted for narrow, language-specific solutions. Even languages like French require handling inflection for gender and number, demonstrating the problem is not limited to a few specific language families.
The Unicode Inflection Library provides a robust and standardized approach to this challenge. It leverages extensive data sets to handle complex grammatical transformations, enabling more accurate text generation, search functionality, and natural language processing. A key resource for this project is the availability of comprehensive lexicons from the Wikidata project, which provide the foundational data necessary for these operations.
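To make the problem concrete, here is a toy sketch of lexicon-driven agreement for French, where an adjective must match the gender and number of its noun. This is not the Unicode Inflection Library's API (the library is accessed through C and C++ or Message Format 2.0); the `NOUNS` and `ADJECTIVES` tables and the `noun_phrase` function are hypothetical, hand-written stand-ins for the kind of lexicon data the post describes.

```python
# Hand-written toy lexicon (an assumption for illustration, not real library data).
NOUNS = {
    "chat": {"gender": "masculine", "plural": "chats"},
    "maison": {"gender": "feminine", "plural": "maisons"},
}
ADJECTIVES = {
    "petit": {
        ("masculine", "singular"): "petit",
        ("feminine", "singular"): "petite",
        ("masculine", "plural"): "petits",
        ("feminine", "plural"): "petites",
    },
}

def noun_phrase(adjective: str, noun: str, number: str) -> str:
    """Pick the adjective form that agrees with the noun's gender and number."""
    entry = NOUNS[noun]
    adjective_form = ADJECTIVES[adjective][(entry["gender"], number)]
    noun_form = noun if number == "singular" else entry["plural"]
    return f"{adjective_form} {noun_form}"

print(noun_phrase("petit", "chat", "singular"))  # petit chat
print(noun_phrase("petit", "maison", "plural"))  # petites maisons
```

A message template that simply concatenated "petit" with any noun would be wrong for three of the four combinations, and languages with case systems multiply the number of forms further.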
Get Started and Participate
This is a community effort. We invite developers and linguists to explore the library's capabilities and contribute to its development. A detailed tutorial is available to help you get started:
Tutorial: https://github.com/unicode-org/inflection/wiki/Tutorial
Release: https://github.com/unicode-org/inflection/releases/tag/Inflection-0.1
Your feedback and contributions are critical for refining the library's rules, expanding language coverage, and ensuring its performance. By participating, you will help build a foundational tool that will make the digital world more accessible and linguistically accurate for hundreds of millions of users.
Thursday, October 30, 2025
Unicode CLDR 48 available
Some of the most significant changes in this release are the following (for more detail, see the CLDR 48 release note page):
Updated for Unicode 17, including new names and search terms for new emoji, new sort order, and Han→Latin romanization additions for many characters.
Updated to the latest external standards and data sources, such as the language subtag registry, UN M49 macro regions, ISO 4217 currencies, etc.
Many additions to language data including:
Likely Subtags, for deriving the likely script and region from the language (used in many processes)
New formatting options:
Rational number formats added in tech preview, allowing for formats like “5½”
For timezones, usesMetazone adds two new attributes, stdOffset and dstOffset, so that implementations can use either “main” or “rearguard” TZDB data
Combination formats added for relative dates + times, such as “tomorrow at 12:30”
Additional units added for scientific contexts (coulombs, farads, teslas, etc.) and for English systems (fortnights, imperial pints, etc.)
Many corrections and updates for Metazone data and calendar eras (including removal of eras and fixes to start dates)
This is the first release where the new CLDR Organization process is in place for DDL (Digitally Disadvantaged Languages) locales. As a result, several locales were able to reach higher levels (see below).
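The Likely Subtags data mentioned in the list above can be pictured as a lookup that fills in a missing script and region. The sketch below is a simplified stand-in, not the full CLDR algorithm: `SAMPLE_LIKELY_SUBTAGS` is a tiny hand-copied sample rather than the CLDR data file, `add_likely_subtags` is a hypothetical helper, and the parsing assumes plain two-letter regions and four-letter scripts.

```python
# Tiny hand-copied sample; the real data lives in CLDR and is far larger.
SAMPLE_LIKELY_SUBTAGS = {
    "en": ("Latn", "US"),
    "sr": ("Cyrl", "RS"),
    "zh": ("Hans", "CN"),
    "zh-TW": ("Hant", "TW"),
}

def add_likely_subtags(tag: str) -> str:
    """Fill in a missing script and/or region from the sample table."""
    parts = tag.split("-")
    language = parts[0]
    script = next((p for p in parts[1:] if len(p) == 4), None)
    region = next((p for p in parts[1:] if len(p) == 2), None)

    # Try the more specific key (language-region) before the bare language.
    keys = ([f"{language}-{region}"] if region else []) + [language]
    for key in keys:
        if key in SAMPLE_LIKELY_SUBTAGS:
            likely_script, likely_region = SAMPLE_LIKELY_SUBTAGS[key]
            return f"{language}-{script or likely_script}-{region or likely_region}"
    return tag  # Unknown language: leave the tag unchanged.

print(add_likely_subtags("en"))       # en-Latn-US
print(add_likely_subtags("zh-TW"))    # zh-Hant-TW
print(add_likely_subtags("sr-Latn"))  # sr-Latn-RS
```

Subtags that are already present are kept, which is why "sr-Latn" keeps its Latin script and only gains a region.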
See the CLDR 48 release note page for information on accessing the data, reviewing charts of the changes, and (importantly) Migration issues.
CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). All major browsers and modern mobile phones use CLDR for language support. (See Who uses CLDR?)
Via the Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
Locale Coverage Levels
Changes in coverage (new level, followed by the locales that reached it):
Modern: Akan, Bashkir, Chuvash, Kazakh (Arabic), Romansh, Shan, Quechua
Moderate: Anii, Esperanto
Basic: Buriat, Piedmontese, Sicilian, Tuvinian
Basic*: Baluchi (Latin), Kurdish
ICU4X 2.1 released!
The ICU4X Technical Committee is happy to announce ICU4X 2.1, an update to our modular, portable, and secure i18n library.
ICU4X is Unicode's modern, lightweight, portable, and secure i18n library. Built from the ground up, its binary size and memory footprint are 50-90% smaller than those of ICU4C. It is memory-safe and written in Rust, with interfaces for C++, JavaScript, Dart, and TypeScript, and other languages on the roadmap. Mozilla Firefox, Google Chrome, Google Pixel Watch, core Android, numerous Flutter apps, and more clients are already using ICU4X.
Important changes since ICU4X 2.0 include:
Latest i18n data: This release includes an update to CLDR 48.
Calendar improvements: ICU4X is now being used to implement Temporal in V8 and SpiderMonkey via temporal_rs. icu_calendar has received many fixes and improvements in service of that, including new experimental arithmetic APIs.
Normalizer optimizations: icu_normalizer has received a lot of optimization work, with some more to come. Optimizations made to shared data structures will benefit other components as well.
Collation sort keys: It is now possible to use icu_collator to extract the sort key of a given string to amortize the cost of collation operations.
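The idea behind sort keys can be sketched without ICU4X: compute a comparable key once per string, store it, and let every later comparison work on the stored keys. The example below is a Python stand-in and does not use icu_collator or its API; `toy_sort_key` is a made-up, deliberately crude key (case-folded, accents ignored at the first level) and is not the Unicode Collation Algorithm.

```python
import unicodedata

def toy_sort_key(text: str) -> tuple[str, str]:
    """Crude stand-in for a collation sort key (an assumption, not UCA)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # First level: base letters only; second level: accents break ties.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, decomposed)

words = ["Zebra", "\u00e9cole", "apple", "\u00c9clair"]

# Pay the key-building cost once per string...
keys = {word: toy_sort_key(word) for word in words}

# ...then reuse the stored keys for sorting or repeated lookups.
ordered = sorted(words, key=keys.__getitem__)
print(ordered)  # ['apple', 'Éclair', 'école', 'Zebra']
```

Stored keys pay off when the same strings are compared many times, for example in a database index, at the price of the extra memory the keys occupy.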
When updating ICU4X crates to 2.1, you may experience issues due to incompatibilities between older crates and newer crates around the alloc feature. In that case, please run cargo update for any crates that show up in the errors.
See the full changelog for more information.
Check out our quickstart tutorial, interactive demo, or C++, TypeScript, and (experimental) Dart documentation.
As before, the Rust crate is available at crates.io, with documentation at docs.rs.
Please post any questions via GitHub Discussions.