Friday, October 28, 2022
MessageFormat 2 Technical Preview Available
The MessageFormat Working Group is pleased to announce that it has released a Technical Preview implementation of the current state of the MessageFormat 2 specification in ICU4J in the recent ICU 72 release. The Working Group has been working on a specification for a successor to ICU MessageFormat, which has been an industry staple for internationalized software for more than two decades.
Because MessageFormat is widely used not just as a software API but also, through its syntax, as a de facto serialization format for the localized messages sent to that API, the Working Group has given careful consideration to interchange and interoperability. One goal of the new specification is to promote best practices for internationalization, including compatibility with XLIFF, the format supported across the localization industry. Another goal is to define the data model of the API input separately from the syntax, so that multiple compliant syntaxes can be compatible. Similarly, the specification includes a registry of interfaces for dependent formatting functions, which cleanly separates implementation from specification and allows users to specify custom formatting functions and plug in their own implementations.
MessageFormat 2 builds on top of the experience from using and maintaining ICU MessageFormat and a number of other localization systems and workflows. It improves the placeholder syntax, improves escaping rules inside the translatable content, replaces nested selectors with top-level multiple selectors, and allows the use of custom formatters.
For example:
match {$photoCount :number} {$userGender :equals}
when 1 masculine {{$userName} added a new photo to his album.}
when 1 feminine {{$userName} added a new photo to her album.}
when 1 * {{$userName} added a new photo to their album.}
when * masculine {{$userName} added {$photoCount} photos to his album.}
when * feminine {{$userName} added {$photoCount} photos to her album.}
when * * {{$userName} added {$photoCount} photos to their album.}
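To make the idea of top-level multiple selectors concrete, here is a minimal Java sketch of how a message like the one above could be resolved. It is an illustration only, not the ICU4J Tech Preview API: the class `VariantSelectionSketch` and its methods are hypothetical, it assumes a simplified first-match rule in which `*` matches any value, and it treats the key `1` as an exact match on the count rather than applying real plural rules. The selection rules in the specification draft may differ.

```java
// Hypothetical illustration; does not use or reproduce the ICU4J MessageFormat 2 API.
class VariantSelectionSketch {

    // Each row mirrors one "when" line: key for $photoCount, key for $userGender, pattern.
    static final String[][] VARIANTS = {
        {"1", "masculine", "{$userName} added a new photo to his album."},
        {"1", "feminine",  "{$userName} added a new photo to her album."},
        {"1", "*",         "{$userName} added a new photo to their album."},
        {"*", "masculine", "{$userName} added {$photoCount} photos to his album."},
        {"*", "feminine",  "{$userName} added {$photoCount} photos to her album."},
        {"*", "*",         "{$userName} added {$photoCount} photos to their album."},
    };

    // A key matches if it is the wildcard or equals the selector value.
    static boolean keyMatches(String key, String value) {
        return key.equals("*") || key.equals(value);
    }

    // Assumption: return the first variant whose keys all match, in source order.
    static String select(String countValue, String genderValue) {
        for (String[] variant : VARIANTS) {
            if (keyMatches(variant[0], countValue) && keyMatches(variant[1], genderValue)) {
                return variant[2];
            }
        }
        throw new IllegalStateException("no variant matched");
    }

    // Picks a variant, then substitutes the two placeholders with plain string replacement.
    static String format(String userName, int photoCount, String userGender) {
        String count = String.valueOf(photoCount);
        return select(count, userGender)
            .replace("{$userName}", userName)
            .replace("{$photoCount}", count);
    }

    public static void main(String[] args) {
        System.out.println(format("Maria", 1, "feminine"));
        System.out.println(format("Alex", 5, "other"));
    }
}
```

Because both selectors sit at the top level, every variant is a complete sentence, which is the property that nested selectors make hard to preserve. A real implementation would also apply locale-aware number formatting and plural categories, which this sketch leaves out.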
More examples and the formal definition of the grammar can be found in the specification draft.
We invite you all to try the Tech Preview available now in ICU4J and to give us any and all feedback. Please note that MessageFormat 2 is still a work in progress, so we rely on your questions, suggestions, and issues to inform how we iterate on the specification. Real-world experience will allow us to address potential shortcomings in the ways that MessageFormat 2 will get used in practice.
For information about using the Tech Preview, refer to the API docs, ICU 72 download page, and the ICU4J User Guide.
To leave feedback about MessageFormat 2 (specification, syntax, etc.) or the Tech Preview implementation, please visit the working group’s repository at github.com/unicode-org/message-format-wg, where you can open a new Discussion topic or file a new Issue.
Friday, October 21, 2022
ICU 72 Released
Unicode® ICU 72 has just been released. ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR). ICU 72 updates to Unicode 15, and to CLDR 42 locale data with various additions and corrections.
ICU 72 and CLDR 42 are major releases, including a new version of Unicode and major locale data improvements.
ICU 72 adds two technology preview implementations based on draft Unicode specifications:
- Formatting of people’s names in multiple languages (CLDR background on why this feature is being added and what it does)
- An enhanced version of message formatting
For details, please see https://icu.unicode.org/download/72.
Thursday, October 20, 2022
Unicode CLDR v42 available
Unicode CLDR version 42 is now available and has been integrated into version 72 of ICU. In CLDR 42, the focus was on:
- Locale coverage. The following locales now have higher coverage levels:
- Modern: Igbo (ig), Yoruba (yo)
- Moderate: Chuvash (cv), Xhosa (xh)
- Basic: Bhojpuri (bho), Haryanvi (bgc), Rajasthani (raj), Tigrinya (ti)
- Formatting Person Names. Added data and structure for formatting people’s names. For more information on why this feature is being added and what it does, see Background.
- Emoji 15.0 Support. Added short names, keywords, and sort-order for the new Unicode 15.0 emoji.
- Coverage, Phase 2. Added additional language names and other items to the Modern coverage level for more consistency (and utility) across platforms.
- Unicode 15.0 additions. Made the regular additions and changes for the new release of Unicode, including names for new scripts, collation data for Han characters, etc.
Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.
There are many other changes: to find out more, see the draft CLDR v42 release page, which has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.
In version 42, the following levels were reached:
| Level | Languages | Locales* | Notes |
|---|---|---|---|
| Modern | 95 | 369 | Suitable for full UI internationalization |
| Moderate | 6 | 11 | Suitable for full “document content” internationalization, such as formats in a spreadsheet. |
| Basic | 29 | 43 | Suitable for locale selection, such as choice of language in mobile phone settings. |

Modern: Afrikaans, … Čeština, … Dansk, … Eesti, … Filipino, … Gaeilge, … Hrvatski, Indonesia, … Jawa, Kiswahili, Latviešu, … Magyar, … Nederlands, … O‘zbek, Polski, … Română, Slovenčina, … Tiếng Việt, … Ελληνικά, Беларуская, … ᏣᎳᎩ, Ქართული, Հայերեն, עברית, اردو, … አማርኛ, नेपाली, … অসমীয়া, বাংলা, ਪੰਜਾਬੀ, ગુજરાતી, ଓଡ଼ିଆ, தமிழ், తెలుగు, ಕನ್ನಡ, മലയാളം, සිංහල, ไทย, ລາວ, မြန်မာ, ខ្មែរ, 한국어, … 日本語, …
Moderate: Binisaya, … Èdè Yorùbá, Føroyskt, Igbo, IsiZulu, Kanhgág, Nheẽgatu, Runasimi, Sardu, Shqip, سنڌي, …
Basic: Asturianu, Basa Sunda, Interlingua, Kabuverdianu, Lea Fakatonga, Rumantsch, Te reo Māori, Wolof, Босански (Ћирилица), Татар, Тоҷикӣ, Ўзбекча (Кирил), کٲشُر, कॉशुर (देवनागरी), …, মৈতৈলোন্, ᱥᱟᱱᱛᱟᱲᱤ, 粤语 (简体)
* Locales are variants for different countries or scripts.
Thursday, October 6, 2022
ICU 72 Release Candidate Available
We are pleased to announce the release candidate for Unicode® ICU 72. It updates to Unicode 15, and to CLDR 42 locale data with various additions and corrections.
ICU 72 adds technology preview implementations for person name formatting, as well as for a new version of message formatting based on a proposed draft Unicode specification.
ICU 72 and CLDR 42 are major releases, including a new version of Unicode and major locale data improvements.
ICU 72 updates to time zone data version 2022b (2022-Aug), which is effectively the same as 2022c. Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.
For details, please see https://icu.unicode.org/download/72.
Please test this release candidate on your platforms and report bugs and regressions by Tuesday, 2022-Oct-18, via the icu-support mailing list, and/or please find/submit error reports.
Please do not use this release candidate in production.
The preliminary API reference documents are published on unicode-org.github.io/icu-docs/ – follow the “Dev” links there.