The Unicode Blog

Tuesday, January 4, 2022

Unicode 14.0 Paperback Available

The Unicode 14.0 core specification is now available in paperback book form with an original cover design by Sophia Tai. This edition consists of a pair of modestly priced print-on-demand volumes containing the complete text of the core specification of Version 14.0 of the Unicode Standard.

Each of the two volumes is a compact 6×9 inch US trade paperback size. The two volumes may be purchased separately or together, although they are intended as a set. Please visit the separate description pages for Volume 1 and Volume 2 to order each volume in the set. The cost for the pair is US $36.72, plus shipping and any applicable taxes.

These volumes do not include the Version 14.0 code charts, nor do they include the Version 14.0 Standard Annexes and Unicode Character Database, which are all freely available on the Unicode website.

Purchase The Unicode Standard, Version 14.0 - Core Specification Volume 1 and Volume 2.

Over 144,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

Posted by Unicode, Inc. at 1:05 PM

Labels: paperback, Unicode 14

Thursday, December 2, 2021

The Most Frequently Used Emoji of 2021

The Unicode Emoji Mirror Project

92% of the world’s online population use emoji — but which emoji are we using? The Unicode Consortium, the not-for-profit organization responsible for digitizing the world’s languages, gathers information about how frequently emoji are used. Looking at patterns of usage helps to determine what new emoji should be added to the Unicode Standard. As part of this effort, we are making that data available to the public.

The new Unicode Emoji Frequency page lists the Unicode v12.0 emoji ranked in order of how frequently they were used in 2021 and what has changed since 2019. Check it out for more analysis, insights and patterns that illustrate our collective experience during a global pandemic.

#UnicodeEmojiMirror

Posted by Unicode, Inc. at 6:00 AM

Labels: 2021, emoji, frequency, UnicodeEmojiMirror

Wednesday, November 17, 2021

Unicode Emoji 15.0 Provisional Candidates

The Unicode Technical Committee has approved the list of provisional candidates for Emoji 15.0. They are slated for release in September 2022 together with Unicode 15.0. These candidates were identified by the Unicode Emoji Subcommittee after reviewing proposals ranked according to previously-determined selection factors.

The list of provisional emoji candidates can be found here. Note that they have not yet been assigned code points or properties. For comments on these candidates, please reference PRI #435 in your feedback.

How to Provide Feedback: For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions.

Feedback is reviewed by the relevant committee according to their meeting schedule.

Posted by Unicode, Inc. at 9:13 AM

Labels: emoji, emoji 15.0, PRI #435

Wednesday, November 10, 2021

ICU4X 0.4 Released

Unicode® ICU4X 0.4 has just been released. This revision brings an implementation of Unicode Properties, major performance and memory improvements for DateTimeFormat, and extends the data provider data loading models with BlobDataProvider.

ICU4X 0.4 also adds initial time zone support in DateTimeFormat, week of month/year, iteration APIs in Segmenter and experimental ListFormatter.

The ICU4X team is shifting to work on the 0.5 release in accordance with the roadmap and a product requirements document setting sights on a stable 1.0 release in Q2 2022.

ICU4X aims to develop a highly modular set of internationalization components for resource-constrained environments, portable across programming languages.

Multiple early adopters use ICU4X in pre-release software in Rust, C, C++, and WebAssembly. The team is ready to onboard additional early adopters to refine the APIs, build processes, and feature sets before the 1.0 release. The team is also looking for contributors to write code generation for additional target programming languages. For more information, please open a discussion on the ICU4X GitHub.

For details, please see the changelog.

Posted by Unicode, Inc. at 10:54 AM

Labels: FFI, ICU, ICU4X, Rust, Unicode

Thursday, October 28, 2021

Unicode CLDR v40 now available!

Unicode CLDR version 40 is now available, with approximately 140,000 new or modified data fields.

In this release, the focus is on:

Grammatical features (gender and case)

In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours". The overall goal for CLDR is to supply building blocks so that implementations of advanced message formatting can handle gender and case.

Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv) for all units of measurement.
Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a more restricted number of units.
Phase 3 (v41) will further expand the units.
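
To make the role of grammatical case concrete, here is a minimal sketch in Python. It assumes a small hand-written table of German forms for "day" keyed by case and plural category; the table layout and the names UNIT_FORMS, plural_category, format_unit, and in_duration are hypothetical and are not CLDR's data format or any library's API.

```python
# Illustrative only: a hand-written table, not CLDR's actual data format.
# German "Tag" (day): the plural is "Tage" in the nominative
# and "Tagen" in the dative.
UNIT_FORMS = {
    ("day", "nominative", "one"): "Tag",
    ("day", "nominative", "other"): "Tage",
    ("day", "dative", "one"): "Tag",
    ("day", "dative", "other"): "Tagen",
}


def plural_category(n: int) -> str:
    """Simplified German plural rule for integers: 1 is 'one', the rest 'other'."""
    return "one" if n == 1 else "other"


def format_unit(n: int, unit: str, case: str) -> str:
    """Pick the unit form that matches both the number and the grammatical case."""
    form = UNIT_FORMS[(unit, case, plural_category(n))]
    return f"{n} {form}"


def in_duration(n: int, unit: str) -> str:
    """Build an 'in N units' phrase; the preposition 'in' takes the dative here."""
    return "in " + format_unit(n, unit, "dative")


print(format_unit(3, "day", "nominative"))  # 3 Tage
print(in_duration(3, "day"))                # in 3 Tagen
```

A formatter that only knew the nominative form would produce "in 3 Tage", which is the kind of error the grammatical data is meant to prevent.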

Emoji v14 names and search keywords

CLDR supplies short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.
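
As a rough illustration of type-ahead built on search keywords, the following Python sketch does a prefix lookup over a sorted keyword index. The keyword lists are made up for the example and are not the actual CLDR annotations; build_index and suggest are hypothetical helper names.

```python
from bisect import bisect_left

# Hypothetical keyword table for three Emoji 14.0 characters.
# The keywords are illustrative, not copied from CLDR.
KEYWORDS = {
    "\U0001FAE0": ["melt", "dissolve", "liquid"],  # melting face
    "\U0001FAE1": ["salute", "respect", "ok"],     # saluting face
    "\U0001FAB8": ["coral", "reef", "ocean"],      # coral
}


def build_index(table):
    """Flatten the table into a sorted list of (keyword, emoji) pairs."""
    return sorted((kw, emoji) for emoji, kws in table.items() for kw in kws)


def suggest(index, prefix):
    """Return emoji whose keywords start with prefix, in keyword order."""
    # (prefix,) sorts before every (keyword, emoji) pair whose keyword
    # starts with prefix, so bisect_left finds the first candidate.
    start = bisect_left(index, (prefix,))
    results = []
    for keyword, emoji in index[start:]:
        if not keyword.startswith(prefix):
            break
        if emoji not in results:
            results.append(emoji)
    return results


INDEX = build_index(KEYWORDS)
print(suggest(INDEX, "me"))  # the melting face emoji
```

A real keyboard would also rank candidates, for instance by usage frequency, which this sketch leaves out.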

Modernized Survey Tool front end

The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure was modernized to make it easier to add enhancements (such as the split-screen dashboard) and to fix bugs.

Specification Improvements

The LDML specification has some important fixes and clarifications for Locale Identifiers, Dates, and Units of Measurement.

Please see the CLDR v40 Release Note for details.

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

Posted by Unicode, Inc. at 1:12 PM

Labels: CLDR, cldr 40, LDML, Unicode 14

ICU 70 Released

Unicode® ICU 70 has just been released. ICU 70 incorporates updates to Unicode 14, including new characters, scripts, emoji, and corresponding API constants. ICU 70 adds support for emoji properties of strings. It also updates to CLDR 40 locale data with many additions and corrections. ICU 70 also includes many other bug fixes and enhancements, especially for measurement unit formatting, and it can now be built and used with C++20 compilers.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see https://icu.unicode.org/download/70.

Note: Our website has moved. Please adjust your bookmarks.

Posted by Unicode, Inc. at 12:31 PM

Labels: CLDR, cldr 40, ICU, ICU 70, Unicode 14

Wednesday, October 6, 2021

Unicode CLDR v40 Beta available for testing

The Unicode CLDR v40 Beta is now available for testing. The beta has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data. Feedback can be filed at CLDR Tickets.

Beta means that the main data, charts, and specification are available for review, but the JSON data is not yet ready for review. Some data may change if showstopper bugs are found. The planned schedule is:

Oct 27 — Release

In CLDR v40, the focus is on:

Grammatical features (gender and case) for units of measurement in additional locales

In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours".
Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv).
Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a more restricted number of units.

Emoji v14 names and search keywords

These supply short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.

Modernized Survey Tool front end

The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure, which was very difficult to enhance or even to fix bugs in, was modernized.

Specification Improvements

Notably in the areas of Locale Identifiers, Dates, and Units of Measurement.

There are many other changes. To find out more, see the draft CLDR v40 release page, which has information on accessing the data, reviewing charts of the changes, and making any necessary migration changes.

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

Posted by Unicode, Inc. at 3:00 PM

Labels: beta, CLDR, cldr 40

Tuesday, September 14, 2021

Announcing The Unicode® Standard, Version 14.0

Version 14.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 838 characters, for a total of 144,697 characters. These additions include five new scripts, for a total of 159 scripts, as well as 37 new emoji characters.
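
For readers who want to check whether a given runtime has caught up with this release, here is a small Python sketch. It relies on the standard library's unicodedata.unidata_version string; the helper names parse_version and supports are hypothetical, and the two constants are simply the figures quoted above.

```python
import unicodedata

# Figures quoted in the announcement above.
UNICODE_14_TOTAL = 144_697
UNICODE_14_ADDED = 838

# Character count implied for the previous version.
previous_total = UNICODE_14_TOTAL - UNICODE_14_ADDED


def parse_version(text):
    """Turn a dotted version string such as '14.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def supports(required, available=unicodedata.unidata_version):
    """Return True if the available Unicode data is at least the required version."""
    return parse_version(available) >= parse_version(required)


print(previous_total)    # 143859
print(supports("14.0"))  # depends on the Python build in use
```

The second result varies by interpreter, because each Python release bundles the Unicode Character Database version that was current when it shipped.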

The new scripts and characters in Version 14.0 add support for modern language groups in Bosnia, India, Indonesia, Iran, Java, Malaysia, Mongolia, Myanmar, Pakistan, and the Philippines, plus other languages in Africa and North America, including:

Arabic script additions that include honorifics and additions for Quranic use, and characters used to write languages across Africa, the Balkans, and South and Southeast Asia
The Vithkuqi script historically used to write Albanian and currently undergoing a modern revival
The Tangsa script used to write the Tangsa language, spoken in India and Myanmar
The Toto script used to write the Toto language in northeast India
Many Latin script additions for extended IPA

Popular symbol additions include:

37 emoji characters, including several new emoji for emotion and hand gestures (smileys, hands, animals and nature, food and drink, transport, and activities). For the full list of new emoji characters, see emoji additions for Unicode 14.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.

Other symbol and notational additions include:

The som currency sign used in the Kyrgyz Republic
Znamenny musical notation developed in Russia

Support for other modern languages and scholarly work extends worldwide, including:

Cypro-Minoan, historically used primarily on the island of Cyprus
Old Uyghur, historically used in Central Asia and elsewhere to write Turkic, Chinese, Mongolian, Tibetan, and Arabic languages
Ahom, Balinese, Brahmi, Canadian aboriginal languages, Glagolitic, Kaithi, Kannada, Mongolian, Tagalog, Takri, and Telugu
Arabic support for Hausa, Wolof, Hindko, and Punjabi, and Ethiopic support for Gurage

Important chart font updates, including:

Significant updates to the CJK auxiliary blocks and enclosed alphanumerics

Unicode properties and specifications determine the behavior of text on computers and phones. Changes in Version 14.0 include the following Unicode Standard Annexes and Technical Standards that have notable modifications:

Five important Unicode annexes updated for Version 14.0:

Three important Unicode specifications updated for Version 14.0:

UTS #10, Unicode Collation Algorithm — sorting Unicode text
UTS #39, Unicode Security Mechanisms — reducing Unicode spoofing
UTS #46, Unicode IDNA Compatibility Processing — compatible processing of non-ASCII URLs

The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.

Posted by Unicode, Inc. at 1:20 PM

Labels: Arabic, Emoji 14.0, Tangsa, Toto, Unicode 14, Vithkuqi, Znamenny
