The Unicode Blog

Friday, March 15, 2013

CLDR Version 23 Released

Unicode CLDR 23 has been released, providing an update to the key building blocks for software supporting the world's languages.

Unicode CLDR 23.0 contains data for 215 languages and 227 territories, 654 locales in all. This release focused primarily on improvements to the LDML structure and tools, and on consistency of data. It includes substantially improved support for non-Gregorian calendars (such as the Japanese Imperial calendar used extensively in Japan). The data and structure have also been modified to make it easy to switch between 12-hour and 24-hour formats, and between 2-digit and 4-digit years. The new Unicode character for the Turkish Lira is now used, and information is provided for currencies that round to 5 cents (or other subunits) in cash transactions. For most languages that use non-Latin scripts, characters in the language’s script now collate before those in other scripts (including A-Z). Language-specific letter-casing changes (Lower, Upper, Title) have been added for Azerbaijani, Greek, Lithuanian, and Turkish. Keyboard data has also been updated for Android. Also, as of this release, the LDML specification is split into multiple parts, each focusing on a particular area.
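As a rough illustration of what "rounding to 5 cents in cash transactions" means arithmetically, here is a minimal Python sketch. The function name `round_to_cash_increment` is hypothetical, and the half-up rounding policy is an assumption. Neither is taken from CLDR, and the sketch does not read CLDR data.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_cash_increment(amount: Decimal, increment: Decimal) -> Decimal:
    """Round amount to the nearest multiple of increment (for example 0.05).

    Halves are rounded up here; that policy is an assumption of this
    sketch, not something stated in the release announcement.
    """
    # Count how many whole increments fit into the amount, then scale back.
    steps = (amount / increment).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return steps * increment


if __name__ == "__main__":
    five_cents = Decimal("0.05")
    for price in ("12.37", "12.38", "12.375"):
        rounded = round_to_cash_increment(Decimal(price), five_cents)
        print(price, "->", rounded)
```

Using `Decimal` rather than `float` keeps amounts such as 0.05 exact, which matters when the result is shown to a customer.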

The release had a short cycle so that we could move to the new regular semi-annual schedule. It therefore included only a limited data submission phase, covering four languages: Armenian (hy), Georgian (ka), Mongolian (mn), and Welsh (cy). For those languages, the data more than doubled.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Andhra Pradesh, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium at http://www.unicode.org/contacts.html.

Posted by Unicode, Inc. at 3:06 PM

Labels: calendar, CLDR, CLDR 23, LDML

Tuesday, March 12, 2013

Unicode 6.3 Beta Review

The Unicode® Consortium today announced the start of beta review for the forthcoming Unicode 6.3.0. All beta feedback must be submitted by April 29, 2013.

The main feature of Unicode 6.3 is the update of the Unicode Bidirectional Algorithm and five newly-encoded bidirectional format control characters: U+061C ARABIC LETTER MARK and the isolate span controls U+2066..U+2069. This version also rolls in various minor corrections for errata and other small updates for the Unicode Character Database.
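For readers who want to experiment with the new format controls, the following Python sketch lists the five code points named above and shows one common use of the isolates: wrapping a piece of embedded text so that its direction does not affect the surrounding text. The helper name `isolate` is hypothetical, and the sketch assumes a Python build whose `unicodedata` tables already include Unicode 6.3.

```python
import unicodedata

# The five bidirectional format controls introduced in Unicode 6.3.
ALM = "\u061C"  # ARABIC LETTER MARK
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

NEW_BIDI_CONTROLS = (ALM, LRI, RLI, FSI, PDI)


def isolate(text: str, opener: str = FSI) -> str:
    """Wrap text in an isolate opener and the matching POP DIRECTIONAL ISOLATE."""
    if opener not in (LRI, RLI, FSI):
        raise ValueError("opener must be LRI, RLI, or FSI")
    return opener + text + PDI


if __name__ == "__main__":
    for ch in NEW_BIDI_CONTROLS:
        # The fallback string is used if the local Unicode tables predate 6.3.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<not in this Unicode version>"))
    user_name = "\u05D0\u05D1\u05D2"  # sample Hebrew letters standing in for user input
    print(repr("User " + isolate(user_name) + " logged in."))
```

FIRST STRONG ISOLATE is the default opener here because it lets the embedded text pick its own direction; whether that is the right choice depends on the application.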

Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smart phones; modern web protocols (HTML, XML, ...); and internationalized domain names. Thus it is important to ensure a smooth transition to each new version of the Unicode Standard. Software developers and other experts are strongly encouraged to review the beta data files and documentation for Unicode 6.3.0 carefully, and to provide any feedback regarding errors or other issues to the Unicode Consortium. Software developers can also get an early start in testing their programs with the beta data files so they will be ready for the release of Unicode 6.3.0 in June 2013.

• See http://www.unicode.org/versions/beta-6.3.0.html for information about testing the 6.3.0 beta.
• See http://www.unicode.org/versions/Unicode6.3.0/ for the current draft summary of Unicode 6.3.0.

Posted by Unicode, Inc. at 12:02 PM

Labels: beta 6.3 bidi, UAX #9, Unicode

Wednesday, March 6, 2013

In Memoriam page for Unicode contributors

Unicode is a project that has been built by hundreds of people over many decades. Some people involved in this project are no longer with us, and we wish to remember their contributions: http://www.unicode.org/consortium/memoriam.html

Posted by Unicode, Inc. at 1:23 PM

Tuesday, March 5, 2013

Specifying Optional Conjuncts in Malayalam

The UTC has posted a new Public Review Issue regarding a proposal to specify optional conjuncts in Malayalam.

In Malayalam there are two prevailing orthographies, traditional and reformed. Both are written using the same Malayalam character set. The difference between them is typically manifested only by the font. The traditional orthography accommodates more full conjuncts, while the reformed orthography uses sequences separated by a visible virama (chandrakkala) for many of those full conjuncts.

This proposal specifies the further use of ZWJ and ZWNJ in sequences in the Malayalam script to indicate preferences for optional display of conjuncts. Such sequences are intended to indicate the preferences, both for rendering systems that support the reformed Malayalam orthography and for systems that support the traditional Malayalam orthography.
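To make the idea of such sequences concrete, here is a small Python sketch that assembles two code point sequences for the consonant pair KA + TA: the plain sequence, whose rendering is left to the font, and one with ZWNJ after the virama, which is the established Indic convention for requesting a visible virama instead of a conjunct. The function names are hypothetical. The sketch does not reproduce the sequences defined in the Public Review Issue itself, and the placement of ZWJ that the proposal describes is not modelled here, since the announcement does not spell it out.

```python
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA (chandrakkala)
KA = "\u0D15"      # MALAYALAM LETTER KA
TA = "\u0D24"      # MALAYALAM LETTER TA


def default_sequence(first: str, second: str) -> str:
    """Consonant + virama + consonant: conjunct or visible virama, as the font decides."""
    return first + VIRAMA + second


def explicit_virama_sequence(first: str, second: str) -> str:
    """Insert ZWNJ after the virama to ask for a visible virama rather than a conjunct."""
    return first + VIRAMA + ZWNJ + second


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    print(code_points(default_sequence(KA, TA)))
    print(code_points(explicit_virama_sequence(KA, TA)))
```

How either sequence is actually displayed still depends on the font and rendering system in use.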

The UTC is seeking feedback on this proposal, regarding its advisability and potential impacts on implementations, as well as any suggestions for alternative approaches to the issues raised in the background document.

Posted by Unicode, Inc. at 4:10 PM

Friday, March 1, 2013

New FAQ on Private-use Characters, Noncharacters and Sentinels

A new FAQ page devoted to the topic of private-use characters, noncharacters, and sentinels has been posted on the Unicode web site. This FAQ aims to clear up confusion about whether noncharacters are permitted in Unicode text, and how they differ from ordinary private-use characters. The recently published Corrigendum #9: Clarification About Noncharacters makes it clear that noncharacters are permitted even in interchange, and the new FAQ page addresses some of the fine points about their usage and about differences from other types of Unicode code points. The brief mentions of noncharacters in other FAQ pages have also been updated accordingly.

Are you unclear about what Unicode “noncharacters” even are? The new FAQ page also answers basic questions about noncharacters and private-use characters, and provides a bit of history about how they came to be part of the Unicode Standard.
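The distinction between noncharacters and private-use characters can be expressed purely in terms of code point ranges. The Python sketch below classifies a code point using the ranges defined in the Unicode Standard; the function names are hypothetical and the labels it returns are chosen only for illustration.

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_private_use(cp: int) -> bool:
    """True for the BMP private-use area and the private-use planes 15 and 16."""
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


def classify(cp: int) -> str:
    """Return an illustrative label for a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if is_noncharacter(cp):
        return "noncharacter"
    if is_private_use(cp):
        return "private-use"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    return "other"


if __name__ == "__main__":
    for cp in (0xFDD0, 0xFFFE, 0x10FFFF, 0xE000, 0xF0000, 0x0041):
        print(f"U+{cp:04X}", classify(cp))
    # Counting across the whole code space gives the total number of noncharacters.
    print(sum(is_noncharacter(cp) for cp in range(0x110000)))
```

Note that the last two code points of planes 15 and 16 are noncharacters, not private-use characters, which is why the private-use ranges stop at U+FFFFD and U+10FFFD.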

Posted by Unicode, Inc. at 12:46 PM

Friday, February 22, 2013

Be a Part of IUC 37! Call for Participation

SUBMISSION DEADLINE: Friday, March 29th

Do you have knowledge or experience with creating global software that will benefit others? Join other experts and industry leaders and present your ideas at The Thirty-seventh Internationalization & Unicode® Conference (IUC 37), taking place in Santa Clara, Calif., USA; October 21-23, 2013. This is the premier conference on technologies and practices for the creation and management of global and multilingual software solutions.

The Unicode Consortium hosts this event annually, and the conference is recognized for its excellent technical content, industry-tested recommendations and updates on the latest standards. Topics from previous conferences can be found on the IUC 37 website.

Submit your proposals for presentations or tutorials regarding case studies, best practices, innovative technology, or evolving standards. Suitable topics include, but are not limited to:

Application Areas

• Designing software platforms, operating systems, software as a service (SAAS), or programming environments

• Social networks

• Search engines, SEO, discovery and navigation best practices

• Websites and web services

• Libraries and education

• Mobile applications including iPhone, Android, iPad, Kindle, Windows Mobile, tablets, etc.

• Games, cable boxes, and other platforms

• Publishing and broadcasting for a global audience

• Security concerns and practices

• Voice to text, text to voice

• Machine translation

General Techniques

• Advances in technologies, algorithms or methodologies

• Using internationalization libraries and programming environments

• Handling bidirectional or other complex scripts

• Locales and the Unicode Common Locale Data Repository (CLDR)

• Font development and Typography

Managing Global Software Development and Geographically Distributed Teams

• Project management and methodologies, e.g., Agile

• Best practices in localization process and technology

• Best practices in world-ready development, testing, and deployment

• Improving globalization capabilities within organizations

• Approaches for migrating legacy applications to global markets

Evolving Standards and Related Practices

• Endangered or Unencoded Languages

• Case studies and research on cross-culture communication

• Internationalized Domain Names and other identifiers

• Languages of Africa, Asia, and the Middle East

• ISO language tag topics

• HTML5, CSS3, and modern browser topics

• Dealing with data formats: XML, JSON, HTML5, DITA, and upcoming standards

• Unicode, encodings, scripts, character properties, and algorithms

• Emoji support

Tutorial presenters receive complimentary conference registration and two nights' lodging. Session presenters receive a fifty percent conference discount and two nights' lodging.

To be considered as a presenter for the conference, please submit a brief abstract by the deadline of Friday, March 29th.

The Program Committee will notify authors by Friday, May 3rd. Final presentation materials will be required from selected presenters by Friday, July 20th.

Posted by Unicode, Inc. at 10:31 AM

Labels: IUC, IUC 37, participation

Wednesday, February 20, 2013

Corrigendum #9 clarifies noncharacter usage in Unicode

There has been confusion about whether noncharacters were permitted in Unicode text. The new Corrigendum #9: Clarification About Noncharacters makes it clear that noncharacters are permissible even in open interchange, although their intended semantics may not be interpretable in such contexts. The UTF-8, UTF-16, UTF-32 & BOM FAQ has also been updated for clarity, and other informative text about noncharacters will be revised over time, including the Core Specification.

Background. There are 66 noncharacters permanently reserved for internal use, typically used for some sort of internally-defined control function or sentinel value. They should be supported by APIs, components, and applications that handle (i.e., either process or pass through) all Unicode strings, such as a text editor or string class. Where an application does make internal use of a noncharacter, it should take some measures to sanitize input text from unknown sources. The best practice is to replace that particular noncharacter on input by U+FFFD. (The noncharacter should not be simply deleted, since that can cause security problems. For more information, see Section 3.5 Deletion of Code Points in UTR #36, Unicode Security Guidelines.)
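The recommended input handling can be sketched in a few lines of Python. In this example the application is assumed to use U+FDD0 as its internal sentinel; that choice, and the names `SENTINEL` and `sanitize_input`, are hypothetical. On input, that one noncharacter is replaced by U+FFFD rather than deleted, and all other text, including other noncharacters, passes through unchanged.

```python
REPLACEMENT_CHARACTER = "\uFFFD"
SENTINEL = "\uFDD0"  # hypothetical noncharacter reserved by this application


def is_noncharacter(ch: str) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of every plane."""
    cp = ord(ch)
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def sanitize_input(text: str, sentinel: str = SENTINEL) -> str:
    """Replace the application's internal sentinel with U+FFFD in untrusted text.

    The sentinel is replaced, not deleted, so the character count of the
    input is preserved.
    """
    if len(sentinel) != 1 or not is_noncharacter(sentinel):
        raise ValueError("sentinel must be a single noncharacter")
    return text.replace(sentinel, REPLACEMENT_CHARACTER)


if __name__ == "__main__":
    untrusted = "abc" + SENTINEL + "def"
    cleaned = sanitize_input(untrusted)
    print(repr(cleaned), len(untrusted) == len(cleaned))
```

Only the sentinel that the application itself uses is replaced, which keeps the component transparent to every other Unicode string it handles.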

Posted by Unicode, Inc. at 12:44 PM

Labels: corrigendum, noncharacters

Tuesday, February 12, 2013

IUC 37: Save The Date - Oct 21-23, 2013

The Internationalization and Unicode Conference (IUC) is the premier event covering the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.

Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

This highly rated conference features excellent technical content, industry-tested recommendations and updates on the latest standards and technology. Subject areas include cloud computing, upgrading to HTML5, integrating with social networking software, and implementing mobile apps. This year's conference will also highlight new features in Unicode Version 6.1 and other relevant standards published this year.

Reasons to Attend Include:

• Tutorials and sessions for beginners, to train you and your staff on basic practices and implementation techniques for creating international software
• Learn recommended solutions to difficult problems or sophisticated requirements from industry leaders and experts in attendance
• Find help from tool and product vendors to get you to market quickly and cost-effectively

Posted by Unicode, Inc. at 10:12 AM

Labels: conference, IUC, IUC 37, tutorial
