Friday, April 8, 2022
ICU 71 Released
Unicode® ICU 71 has just been released. ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR). ICU 71 updates to CLDR 41 locale data with various additions and corrections.
ICU 71 adds phrase-based line breaking for Japanese. Existing line breaking methods follow standards and conventions for body text but do not work well for short Japanese text, such as in titles and headings. This new feature is optimized for these use cases.
ICU 71 adds support for Hindi written in Latin letters (hi_Latn). The CLDR data for this increasingly popular locale has been significantly revised and expanded. Note that based on user expectations, hi_Latn incorporates a large amount of English, and can also be referred to as “Hinglish”.
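As a rough illustration of what the identifier hi_Latn encodes, the Python sketch below splits a locale identifier into language, script, and region subtags. The names `LocaleParts` and `split_locale` are hypothetical, and the parsing is deliberately simplified (variants and extensions are ignored); it is not ICU's implementation, and real applications should rely on ICU's own locale APIs.

```python
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def split_locale(identifier: str) -> LocaleParts:
    """Split a simple locale identifier such as 'hi_Latn' into subtags.

    Simplified sketch: handles only language, script, and region,
    separated by '_' or '-'. Variants and extensions are ignored.
    """
    subtags = identifier.replace("-", "_").split("_")
    language = subtags[0].lower()
    script = None
    region = None
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha() and script is None and region is None:
            script = subtag.title()   # scripts are four letters, e.g. 'Latn'
        elif region is None and (
            (len(subtag) == 2 and subtag.isalpha())
            or (len(subtag) == 3 and subtag.isdigit())
        ):
            region = subtag.upper()   # regions are two letters or three digits
        else:
            break                     # stop at the first subtag this sketch does not handle
    return LocaleParts(language, script, region)


print(split_locale("hi_Latn"))
print(split_locale("hi-Latn-IN"))
```

Here the script subtag `Latn` is what distinguishes Hindi in Latin letters from plain `hi`, which has no explicit script subtag.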
ICU 71 and CLDR 41 are minor releases, mostly focused on bug fixes and small enhancements. (The fall CLDR/ICU releases will update to Unicode 15, which is planned for September.) We are also working to re-establish continuous performance testing for ICU, and on development towards future versions.
ICU 71 updates to the time zone data version 2022a. Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.
For details, please see https://icu.unicode.org/download/71.
Wednesday, April 6, 2022
Unicode CLDR Version 41 Released!
Unicode CLDR Version 41 has been released, and has already been integrated into ICU.
CLDR v41 is a limited-submission release. Most work was on tooling, with only specified updates to the data, namely Phase 3 of the grammatical units of measurement project. The required grammar data for the Modern coverage level increased, with 40 locales adding an average of 4% new data each. Ukrainian grew the most, by 15.6%. The tooling changes are targeted at the v42 general submission release. They include a number of features and improvements such as progress meter widgets in the Survey Tool.
Finally, the Basic level has been modified to make it easier to onboard new languages, and easier for implementations to filter locale data based on coverage levels.
The following table shows the number of Languages/Locales in this version. (See the v41 Locale Coverage table for more information.)
| Level | Languages | Locales | Notes |
|---|---|---|---|
| Modern | 89 | 361 | Suitable for full UI internationalization |
| Moderate | 13 | 32 | Suitable for full “document content” internationalization, such as formats in a spreadsheet. |
| Basic | 22 | 21 | Suitable for locale selection, such as choice of language in mobile phone settings. |
| Total | 124 | 414 | Total of all languages/locales with ≥ Basic coverage. |
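As a quick arithmetic check, the Total row can be reproduced from the three coverage levels. The Python sketch below only re-adds the figures listed in the table; the dictionary layout and variable names are illustrative.

```python
# Per-level counts copied from the table above: (languages, locales).
coverage = {
    "Modern": (89, 361),
    "Moderate": (13, 32),
    "Basic": (22, 21),
}

# Sum each column across the three coverage levels.
total_languages = sum(languages for languages, _ in coverage.values())
total_locales = sum(locales for _, locales in coverage.values())

print(total_languages, total_locales)  # prints "124 414", matching the Total row
```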
Beyond the member organizations of the Unicode Consortium, many dedicated communities and individuals regularly contribute to updating their locales, including:
- Modern: Cherokee, Cantonese, Scottish Gaelic, Sorbian (Lower), Sorbian (Upper)
- Moderate: Asturian [nearly Modern], Breton, Faroese, Fulah (Adlam), Kaingang, Nheengatu, Quechua, Sardinian
- Basic: Bosnian (Cyrillic), Interlingua, Kabuverdianu, Māori, Romansh, Tajik, Tatar, Tongan, Uzbek (Cyrillic), Wolof
The next version of CLDR, version 42, is slated to start General Submission on May 18, 2022.
Unicode CLDR provides key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.
Monday, April 4, 2022
Emoji Are Not Born, They Are Made
Unicode now accepting proposals for Emoji 16.0
It’s hard to believe that just as Emoji 14.0 begins to appear on your device of choice this year, the Unicode Emoji Subcommittee (ESC) has already begun to plan for Emoji 16.0. That’s right: as of today, April 4, 2022, applications to submit ideas for new emoji are open through July 31, 2022!
So, how do you ensure your proposal is the best it can be? Here are some tips to consider as you prepare it.
Check whether the emoji already exists!
✅ First: see if it’s already been approved.
Second: is it being reviewed?
Tip: Don’t skip any of the fields in the form! Incomplete proposals won’t be processed and will be returned. ESC members receive many submissions, and complete proposals help them evaluate each one.
Be sure your proposal meets the criteria for consideration.
We recommend adhering to the criteria for inclusion as closely as possible and consulting the Emoji Subcommittee’s priorities, guidelines, strategies, reports, and audits. Many of the new provisional candidates for Emoji 15.0 are the result of these documents: pink heart, shaking face, rightwards pushing hand. The following are just some of the many considerations for writing a compelling proposal:
- Multiple uses: Does the candidate emoji have significant metaphorical references or symbolism and not merely represent itself?
- Use in sequences: How is the emoji used with other emoji to communicate something new?
- Breaking new ground: Does the emoji represent something that is not already representable?
- Distinctiveness: Explain how and why this emoji represents a distinct, visually iconic entity that is relevant to a global audience.
- Compatibility: Is it needed for compatibility with frequently used emoji in popular existing systems, such as WeChat, Twitter, etc.?
- Frequency of use: Is there a high frequency of use? There should be empirical evidence of high usage in literature, movies, graphic novels, etc. worldwide.
Well, let’s get going! How do I propose an emoji?
Submit a proposal
My proposal wasn’t selected :(
We recognize that it will come as a disappointment if your proposal is not one of the few selected for inclusion. There are loads of reasons why this may have happened.
- ➕ It can already be represented by a sequence: for example, Garbage fire 🗑️🔥 or Can of worms 🥫🪱.
- It’s too specific: We can’t add every type of flower, every breed of dog, every color of drink.
- Very few are selected: Roughly thirty emoji characters are added each year.
- It’s a transient concept: Think less “memes” and more “stable long-standing concepts”. Can you cite how this concept has existed in a communicative manner such as literature, movies, graphic novels, etc.?
- ♾️ It’s open-ended: There is no compelling evidence to add it over others of a similar type.
- ❌ Many other factors for exclusion
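To make the "sequence" point concrete, here is a minimal Python sketch that lists the code points behind the two example sequences. The specific code points (wastebasket plus fire, canned food plus worm) are an assumption about which emoji the examples refer to, and the helper name `describe` is hypothetical. Character names come from Python's bundled Unicode database, which may not know the newest emoji on older Python versions.

```python
import unicodedata

# Assumed code points for the two examples, written as escapes so they are explicit.
GARBAGE_FIRE = "\U0001F5D1\uFE0F\U0001F525"  # wastebasket + variation selector-16 + fire
CAN_OF_WORMS = "\U0001F96B\U0001FAB1"        # canned food + worm


def describe(sequence: str) -> list:
    """Return (code point, character name) pairs for each character in the sequence."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<name not in this Unicode database>"))
        for ch in sequence
    ]


for label, sequence in [("Garbage fire", GARBAGE_FIRE), ("Can of worms", CAN_OF_WORMS)]:
    print(label, describe(sequence))
```

Each concept is expressed entirely with existing characters placed side by side, so no new code point is needed to represent it.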
Why can’t we make EVERYTHING an emoji?
Any emoji additions have to take into consideration usage frequency, trade-offs with other choices, font file size, and the burden on developers (and users!) to make it easier to send and receive emoji. That’s why the Emoji Subcommittee set out to reduce the number of emoji we encode in any given year.
Reconciling the rapid, transient nature of modern communication with the formal, methodical process required by a standards body like the Unicode Consortium is the name of the game these days. Until the sending and receiving of images is standardized in some manner, so that you can send any image in the world alongside your text messages and not just code points, Unicode is here for the world’s emoji character needs.