Tuesday, January 4, 2022
Unicode 14.0 Paperback Available
The Unicode 14.0 core specification is now available in paperback book form with an original cover design by Sophia Tai. This edition consists of a pair of modestly priced print-on-demand volumes containing the complete text of the core specification of Version 14.0 of the Unicode Standard.
Each of the two volumes is a compact 6×9 inch US trade paperback. The two volumes may be purchased separately or together, although they are intended as a set. Please visit the separate description pages for Volume 1 and Volume 2 to order each volume in the set. The cost for the pair is US $36.72, plus shipping and any applicable taxes.
These volumes do not include the Version 14.0 code charts, nor do they include the Version 14.0 Standard Annexes and Unicode Character Database, which are all freely available on the Unicode website.
Purchase The Unicode Standard, Version 14.0 - Core Specification Volume 1 and Volume 2.
Thursday, December 2, 2021
The Most Frequently Used Emoji of 2021
The Unicode Emoji Mirror Project
92% of the world’s online population use emoji, but which emoji are we using? The Unicode Consortium, the not-for-profit organization responsible for digitizing the world’s languages, gathers information about how frequently emoji are used. Looking at patterns of usage helps to determine what new emoji should be added to the Unicode Standard. As part of this effort, we are making that data available to the public.
The new Unicode Emoji Frequency page lists the Unicode v12.0 emoji ranked in order of how frequently they were used in 2021 and what has changed since 2019. Check it out for more analysis, insights and patterns that illustrate our collective experience during a global pandemic.
#UnicodeEmojiMirror
Wednesday, November 17, 2021
Unicode Emoji 15.0 Provisional Candidates
The Unicode Technical Committee has approved the list of provisional candidates for Emoji 15.0. They are slated for release in September 2022 together with Unicode 15.0. These candidates were identified by the Unicode Emoji Subcommittee after reviewing proposals ranked according to previously-determined selection factors.
The list of provisional emoji candidates can be found here. Note that they have not yet been assigned code points or properties. For comments on these candidates, please reference PRI #435 in your feedback.
How to Provide Feedback: For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions.
Feedback is reviewed by the relevant committee according to their meeting schedule.
Wednesday, November 10, 2021
ICU4X 0.4 Released
Unicode® ICU4X 0.4 has just been released. This revision brings an implementation of Unicode Properties, major performance and memory improvements for DateTimeFormat, and extends the data provider data loading models with BlobDataProvider.
ICU4X 0.4 also adds initial time zone support in DateTimeFormat, week of month/year, iteration APIs in Segmenter and experimental ListFormatter.
The ICU4X team is shifting to work on the 0.5 release in accordance with the roadmap and a product requirements document setting sights on a stable 1.0 release in Q2 2022.
ICU4X aims to develop a highly modular set of internationalization components for resource-constrained environments, portable across programming languages.
Multiple early adopters use ICU4X in pre-release software in Rust, C, C++, and WebAssembly. The team is ready to onboard additional early adopters to refine the APIs, build processes, and feature sets before the 1.0 release. The team is also looking for contributors to write code generation for additional target programming languages. For more information, please open a discussion on the ICU4X GitHub.
For details, please see the changelog.
Thursday, October 28, 2021
Unicode CLDR v40 now available!
Unicode CLDR version 40 is now available, with approximately 140,000 new or modified data fields.
In this release, the focus is on:
Grammatical features (gender and case)
In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours". The overall goal for CLDR is to supply building blocks so that implementations of advanced message formatting can handle gender and case.
- Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv) for all units of measurement.
- Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a more restricted number of units.
- Phase 3 (v41) will further expand the units.
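As a rough illustration of why case matters for units, the sketch below picks a German pattern for "day" by grammatical case and plural category. The table, the unit key `duration-day`, and the function names are hypothetical stand-ins written by hand for this example; they are not CLDR's data format or any library's API, and real implementations would load the patterns from CLDR locale data and use the locale's own plural rules.

```python
# Hypothetical, hand-written sample data for illustration only.
# Real applications would load inflected unit patterns from CLDR locale data.
UNIT_PATTERNS = {
    ("de", "duration-day"): {
        ("nominative", "one"): "{0} Tag",
        ("nominative", "other"): "{0} Tage",
        ("dative", "one"): "{0} Tag",
        ("dative", "other"): "{0} Tagen",
    },
}


def plural_category(count):
    # Simplified assumption: only "one" versus "other".
    # Actual plural rules differ per locale.
    return "one" if count == 1 else "other"


def format_unit(locale, unit, count, case="nominative"):
    patterns = UNIT_PATTERNS[(locale, unit)]
    category = plural_category(count)
    # Fall back to the nominative form when the requested case has no entry.
    pattern = patterns.get((case, category), patterns[("nominative", category)])
    return pattern.format(count)


print(format_unit("de", "duration-day", 3))                           # 3 Tage
print("vor " + format_unit("de", "duration-day", 3, case="dative"))  # vor 3 Tagen
```

A message formatter that only knew the nominative form would produce "vor 3 Tage", which is the kind of error the grammatical data is meant to prevent.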
Emoji v14 names and search keywords
CLDR supplies short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.
Modernized Survey Tool front end
The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure was modernized to make it easier to add enhancements (such as the split-screen dashboard) and to fix bugs.
Specification Improvements
The LDML specification has some important fixes and clarifications for Locale Identifiers, Dates, and Units of Measurement.
Please see the CLDR v40 Release Note for details.
Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.
ICU 70 Released
Unicode® ICU 70 has just been released. ICU 70 incorporates updates to Unicode 14, including new characters, scripts, emoji, and corresponding API constants. ICU 70 adds support for emoji properties of strings. It also updates to CLDR 40 locale data with many additions and corrections. ICU 70 also includes many other bug fixes and enhancements, especially for measurement unit formatting, and it can now be built and used with C++20 compilers.
ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).
For details, please see https://icu.unicode.org/download/70.
Note: Our website has moved. Please adjust your bookmarks.
Wednesday, October 6, 2021
Unicode CLDR v40 Beta available for testing
The Unicode CLDR v40 Beta is now available for testing. The beta has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data. Feedback can be filed at CLDR Tickets.
Beta means that the main data, charts, and specification are available for review, but the JSON data is not yet ready for review. Some data may change if showstopper bugs are found. The planned schedule is:
- Oct 27 — Release
Grammatical features (gender and case) for units of measurement in additional locales
- In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours".
- Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv).
- Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a more restricted number of units.
Emoji v14 names and search keywords
- These supply short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.
Modernized Survey Tool front end
- The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure, which was very difficult to enhance or even to fix bugs in, was modernized.
Specification improvements
- Notably in the areas of Locale Identifiers, Dates, and Units of Measurement.
Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.
Tuesday, September 14, 2021
Announcing The Unicode® Standard, Version 14.0
Version 14.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 838 characters, for a total of 144,697 characters. These additions include five new scripts, for a total of 159 scripts, as well as 37 new emoji characters.
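For readers who want to check whether their own environment already knows the new characters, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `parse_version` and `supports_unicode_14` are made up for this example, and the result depends entirely on which Unicode version your Python build ships with. The sample code point U+10570 is assumed here to lie in the new Vithkuqi block.

```python
import unicodedata


def parse_version(text):
    """Turn a dotted version string such as "14.0.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def supports_unicode_14(version=None):
    """Report whether a Unicode database version is 14.0 or later.

    With no argument, checks the database bundled with this Python runtime.
    """
    if version is None:
        version = unicodedata.unidata_version
    return parse_version(version) >= (14, 0)


# Character count before this release, by subtraction from the figures above.
ADDED_IN_14 = 838
TOTAL_IN_14 = 144_697
print(TOTAL_IN_14 - ADDED_IN_14)  # 143859

print("Runtime Unicode database:", unicodedata.unidata_version)
print("Knows Unicode 14.0:", supports_unicode_14())

# Assumption: U+10570 is a Vithkuqi letter added in 14.0. Older runtimes
# have no name for it, so a default is supplied instead of raising an error.
sample = "\U00010570"
print(unicodedata.name(sample, "<not in this runtime's database>"))
```

On a runtime with an older database the last line prints the placeholder text, which is a quick sign that fonts and text processing for the new scripts may not be available there either.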
The new scripts and characters in Version 14.0 add support for modern language groups in Bosnia, India, Indonesia, Iran, Java, Malaysia, Mongolia, Myanmar, Pakistan, and the Philippines, plus other languages in Africa and North America, including:
- Arabic script additions that include honorifics and additions for Quranic use, and characters used to write languages across Africa, the Balkans, and South and Southeast Asia
- The Vithkuqi script historically used to write Albanian and currently undergoing a modern revival
- The Tangsa script used to write the Tangsa language, spoken in India and Myanmar
- The Toto script used to write the Toto language in northeast India
- Many Latin script additions for extended IPA
- 37 emoji characters, including several new emoji for emotion and hand gestures (smileys, hands, animals and nature, food and drink, transport, and activities). For the full list of new emoji characters, see emoji additions for Unicode 14.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.
- The som currency sign used in the Kyrgyz Republic
- Znamenny musical notation developed in Russia
- Cypro-Minoan, historically used primarily on the island of Cyprus
- Old Uyghur, historically used in Central Asia and elsewhere to write Turkic, Chinese, Mongolian, Tibetan, and Arabic languages
- Ahom, Balinese, Brahmi, Canadian aboriginal languages, Glagolitic, Kaithi, Kannada, Mongolian, Tagalog, Takri, and Telugu
- Arabic support for Hausa, Wolof, Hindko, and Punjabi, and Ethiopic support for Gurage
- Significant updates to the CJK auxiliary blocks and enclosed alphanumerics
Five important Unicode annexes updated for Version 14.0:
- UAX #14, Unicode Line Breaking Algorithm
- UAX #29, Unicode Text Segmentation
- UAX #31, Unicode Identifier and Pattern Syntax
- UAX #38, Unicode Han Database (Unihan)
- UAX #45, U-Source Ideographs
- UTS #10, Unicode Collation Algorithm — sorting Unicode text
- UTS #39, Unicode Security Mechanisms — reducing Unicode spoofing
- UTS #46, Unicode IDNA Compatibility Processing — compatible processing of non-ASCII URLs
Labels: Arabic, Emoji 14.0, Tangsa, Toto, Unicode 14, Vithkuqi, Znamenny