Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

philuo/icu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

History

501 Commits

Repository files navigation

Name: icu
URL: https://github.com/unicode-org/icu
Version: 74-2
CPEPrefix: cpe:/a:icu-project:international_components_for_unicode:74.2
License: MIT
License File: LICENSE
Security Critical: yes
Shipped: yes
Description:
This directory contains the source code of ICU 74.2 for C/C++.
A. How to update ICU
1. Run "scripts/update.sh <version>" (e.g. 74-2).
 This will download ICU from the upstream git repository.
 It does preserve Chrome-specific build files and
 converter files. (see section C)
 source.gni and icu.gyp* files are automatically updated, too.
2. Review and apply patches/changes in "D. Local Modifications" if
 necessary/applicable. Update patch files in patches/.
3. Follow the instructions in section B on building ICU data files
B. How to build ICU data files
Pre-built data files are generated and checked in with the following steps
1. icu data files for Chrome OS, Linux, Mac and Windows
 a. Make a icu data build directory outside the Chromium source tree
 and cd to that directory (say, $ICUBUILDIR).
 b. Run
 ${CHROME_ICU_TREE_TOP}/scripts/make_data_all.sh
 This script takes the following steps:
 i) Run
 ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout --disable-tests
 ii) Run make
 iii) (cd data && make clean)
 iv) scripts/config_data.sh common
 This configure the build with filer for common.
 v) Run make
 vi) scripts/copy_data.sh common
 This copies the ICU data files for non-Android platforms
 (both Little and Big Endian) to the following locations:
 common/icudtl.dat
 common/icudtb.dat
 vii) Repeat step iii) - vi) for chromeos to produce chromeos/icudtl.dat
 viii) cast/patch_locale.sh
 Modify the file for cast, android, ios and flutter.
 ix) Repeat step iii) - vi) for cast, android and ios to produce
 cast/icudtl.dat
 android/icudtl.dat
 ios/icudtl.dat
 x) flutter/patch_brkitr.sh
 On top of cast/patch_locale.sh.sh (step viii)), further patch
 the code for flutter.
 xi) Repeat step iii) - vi) for flutter to produce
 flutter/icudtl.dat
 xii) scripts/clean_up_data_source.sh
 This reverts the result of cast/patch_locale.sh and flutter/patch_brkitr.sh
 make the tree ready for committing updated ICU data files for
 non-Android and Android platforms.
 c. Whenever data is updated (e.g timezone update), take step b as long
 as the ICU build directory used in a. is kept.
2. Note on the locale data customization
 - filter/chromeos.json
 a. Filter the locale data for ChromeOS's UI langauges :
 locales, lang, region, currency, zone
 b. Filter the locale data for non-UI languages to the bare minimum :
 ExemplarCharacters, LocaleScript, layout, and the name of the
 language for a locale in its native language.
 c. Filter the legacy Chinese character set-based collation
 (big5han/gb2312han) that don't make any sense and nobdoy uses.
 - filter/common.json
 Same as above in filter/chromeos.json, AND
 e. Filter exemplar cities in timezone data (data/zone).
 - filter/android.json and filter/ios.json
 a. Filter the locale data for Android / iOS UI langauges :
 locales, lang, region, currency, zone
 b. Filter the locale data for non-UI languages to the bare minimum :
 ExemplarCharacters, LocaleScript, layout, and the name of the
 language for a locale in its native language.
 c. Filter the legacy Chinese character set-based collation
 d. Filter source/data/{region,lang} to exclude these data
 except the language and script names of zh_Hans and zh_Hant.
 e. Keep only the minimal calendar data in data/locales.
 f. Include currency display names for a smaller subset of currencies.
 g. Minimize the locale data for 9 locales to which Chrome on Android
 is not localized.
C. Chromium-specific data build files and converters
They're preserved in step A.1 above. In general, there's no need to touch
them when updating ICU.
1. source/data/mappings
 - convrtrs.txt : Lists encodings and aliases required by the WHATWG
 Encoding spec plus a few extra (see the file as to why).
 - ucmlocal.txt : to list only converters we need.
 - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP,
 Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
 They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.
 - gb18030.ucm and windows-936.ucm
 gb_table.patch was applied for the following changes. No need
 to apply it again. The patch is kept for the record.
 a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
 the encoding spec (one-way mapping in toUnicode direction).
 b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
 from U+1E3F to \xA8\xBC (windows-936/GBK).
 See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
2. source/data/brkitr
 - dictionaries/khmerdict.txt: Abridged Khmer dictionary. See
 https://unicode-org.atlassian.net/browse/ICU-9451
 - dictionaries/laodict.txt: Abridged Lao dictionary. We keep using the smaller
 old version from ICU69-1.
 - rules/word_ja.txt (used only on Android)
 Added for Japanese-specific word-breaking without the C+J dictionary.
 - rules/{root,zh,zh_Hant}.txt
 a. Use line_normal by default.
 b. Drop local patches we used to have for the following issues. They'll
 be dealt with in the upstream (Unicode/CLDR).
 http://unicode.org/cldr/trac/ticket/6557
 http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)
3. Add {an,ku,tg,wa}.txt to source/data/{locale,lang}
 with the minimal locale data necessary for spellchecker and
 and language menus.
D. Local Modifications
1. Applied locale data patches from Google obtained by diff'ing
 the upstream copy and Google's internal copy for source/data
 - patches/locale_google.patch:
 * Google's internal ICU locale changes
 * Simpler region names for Hong Kong and Macau in all locales
 * Currency signs in ru and uk locales (do not include 'tr' locale changes)
 * AM/PM, midnight, noon formatting for a few Indian locales
 * Timezone name changes in Korean and Chinese locales
 * Default digit for Arabic locale is European digits.
 - patches/locale1.patch: Minor fixes for Korean
 - patches/name_5_langs.patch: add the native names of 5 languages not currently
 supported by CLDR/ICU. When updating the ICU to a new version,
 source/data/lang/{ay,dv,ilo,lus,ts}.txt have to be checked and if they are
 present with their display names populated, this patch has to be adjusted
 or discarded as necessary.
2. Breakiterator patches
 - patches/wordbrk.patch for word.txt, word_POSIX.txt, and word_fi_sv.txt
 a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
 FQDN labels can be split at '.'
 b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
 See http://unicode.org/cldr/trac/ticket/6555
 c. Restore pre-ICU 72 behavior of breaking at '@'. The new upstream behavior
 of not breaking at '@' interacted badly with the local change to break at
 '.' (D.2.a above): although not breaking at '@' is intended to not break
 within e-mail addresses, this is not possible with Chromium's
 break-at-'.' behavior.
 - patches/khmer-dictbe.patch
 Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
 https://unicode-org.atlassian.net/browse/ICU-9451
 - Add several common Chinese words that were dropped previously to
 source/data/cjdict/brkitr/cjdict.txt
 patch: patches/cjdict.patch
 upstream bug: https://unicode-org.atlassian.net/browse/ICU-10888
3. Timezone data update
 Run scripts/update_tz.sh to grab the latest version of the
 following timezone data files and put them in source/data/misc
 metaZones.txt
 timezoneTypes.txt
 windowsZones.txt
 zoneinfo64.txt
 - Delete the line containing `Factory{"Etc/Unknown"}` from
 timezoneTypes.txt before regenerating data
 bug: https://crbug.com/381620359
 As of Mar 18, 2025, the latest version is 2025a
 and the above files are available at the ICU github repos.
4. Build-related changes
 - patches/configure.patch:
 * Remove a section of configure that will cause breakage while
 running runConfigureICU.
 - patches/data_symb.patch :
 Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
 the icu data file or icudt.dll
5. ISO-2022-JP encoding (fromUnicode) change per WHATWG encoding spec.
 - patches/iso2022jp.patch
 - upstream bug:
 https://unicode-org.atlassian.net/browse/ICU-20251
6. Enable tracing of file but not resource, only for Chromium
 to reduce performance impact/risk.
 - patches/restrace.patch
7. Patch Arabic date time pattern back to 67 value to avoid test
 breakage in
 third_party/blink/web_tests/fast/forms/datetimelocal/datetimelocal-appearance-l10n.html
 - patches/ardatepattern.patch
 - https://bugs.chromium.org/p/chromium/issues/detail?id=1139186
8. Remove explicit std::atomic<NumberRangeFormatterImpl*> template
 instantiation
 patches/atomic_template_instantiation.patch
 - The explicit instantiation was added to silence MSVC C4251 warnings:
 https://unicode-org.atlassian.net/browse/ICU-20157
 Small test cases show that it is generally an error to instantiate
 std::atomic<T*> with an incomplete type T with MSVC, clang, and GCC, so this
 instantiation never should have worked:
 https://gcc.godbolt.org/z/34xx8h
 At this time, it's not clear if this particular instantiation with
 NumberRangeFormatterImpl* was ever necessary for MSVC. Further testing with
 MSVC is required to upstream this patch.
 - https://unicode-org.atlassian.net/browse/ICU-21482
9. Patch source/common/uposixdefs.h so it compiles on Fuchsia on Macs.
 patches/fuchsia.patch
 - context bug: https://bugs.chromium.org/p/chromium/issues/detail?id=1184527
10. Patch fix of Etc/Unknown being returned for
 Intl.DateTimeFormat().resolvedOptions().timeZone on macOS 14.
 patches/revert_realpath.patch
 - https://bugs.chromium.org/p/chromium/issues/detail?id=1473422
 - https://unicode-org.atlassian.net/browse/ICU-22541
11. Patch source/common/locid.cpp to resolve crashes in DecimalFormatting.
 patches/locid.patch
 - https://issues.chromium.org/issues/333453962
12. Patch source/common/messagepattern.ccp/.h to resolve stack overflow.
 patches/limit_message_pattern.patch
 - https://issues.chromium.org/343280855
 - https://unicode-org.atlassian.net/browse/ICU-22798
13. Patch source/common/ubidiwrt.cpp to resolve a bidi buffer overflow.
 patches/bidi_buffer_overflow.patch
 - https://issues.chromium.org/u/1/issues/345352978
 - https://unicode-org.atlassian.net/browse/ICU-22768
14. Patch source/tools/pkgdata/pkgdata.cpp to remove a format-overflow warning.
 patches/remove_format_overflow_warning.patch
 - https://unicode-org.atlassian.net/browse/ICU-21589
 - https://issues.chromium.org/u/1/issues/345352978
15. Patch source/i18n/japancal.cpp to fix an extended year int32 overflow.
 patches/ja_extended_year_overflow.patch
 - https://issues.chromium.org/u/1/issues/345352978
 - https://unicode-org.atlassian.net/browse/ICU-22730
16. Patch an icu:calendar integer overflow by propagating error.
 patch/propagate_error_avoid_overflow.patch
 - https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68895
 - https://issues.chromium.org/u/1/issues/345352978
 - https://unicode-org.atlassian.net/browse/ICU-22730
17. Patch UnicodeString to fix heap-buffer-overflow .
 patches/unicode_string.patch
 - https://unicode-org.atlassian.net/browse/ICU-23060
 - https://g-issues.chromium.org/issues/393989622
 - https://patch-diff.githubusercontent.com/raw/unicode-org/icu/pull/3416.diff
 - https://patch-diff.githubusercontent.com/raw/unicode-org/icu/pull/3420.diff
18. Patch unnecessary uses of the virtual keyword.
 patches/unnecessary_virtual_specifier.patch
 - https://issues.chromium.org/issues/403236787
 - https://patch-diff.githubusercontent.com/raw/unicode-org/icu/pull/3489.diff

About

No description, website, or topics provided.

Resources

License

Unknown, Unknown licenses found

Licenses found

Unknown
LICENSE
Unknown
license.html

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

AltStyle によって変換されたページ (->オリジナル) /