perlcn - Perl Guide in Simplified Chinese
Welcome to the world of Perl!
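Starting with version 5.8.0, Perl has comprehensive Unicode support, and with it support for many encodings beyond the Latin family; CJK (Chinese, Japanese, Korean) is one of them. Unicode is an international standard that aims to cover all of the world's characters: the Western world, the Eastern world, and everything in between (Greek, Syriac, Arabic, Hebrew, Indic scripts, Native American scripts, and so on). It also spans multiple operating systems and platforms (such as PC and Macintosh).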
Perl itself operates on Unicode. This means that string data inside Perl can be represented as Unicode, and that Perl's functions and operators (such as regular expression matching) work on Unicode as well. For input and output, to handle data stored in encodings that predate Unicode, Perl provides the Encode module, which makes it easy to read and write data in legacy encodings.
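As a minimal sketch of what reading and writing legacy data through Encode can look like, the snippet below decodes a short EUC-CN byte string into Perl characters and re-encodes it as UTF-8. The sample bytes (the word "骆驼", written as hex escapes) and the variable names are assumptions chosen for illustration; only `decode` and `encode` come from the Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed sample input: the EUC-CN bytes for the two-character word "骆驼",
# written as hex escapes so the sketch does not depend on this file's encoding.
my $euc_bytes = "\xC2\xE6\xCD\xD5";

# decode() turns legacy bytes into Perl's internal Unicode characters.
my $text = decode('euc-cn', $euc_bytes);

# encode() turns characters back into bytes in the requested encoding.
my $utf8_bytes = encode('UTF-8', $text);

printf "%d characters, %d EUC-CN bytes, %d UTF-8 bytes\n",
    length($text), length($euc_bytes), length($utf8_bytes);
# prints "2 characters, 4 EUC-CN bytes, 6 UTF-8 bytes"
```

Keeping text decoded inside the program and encoding only at the boundaries is what lets functions such as `length` count characters rather than bytes.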
The Encode extension module supports the following Simplified Chinese encodings ('gb2312' means 'euc-cn'):
euc-cn       Unix extended character set, commonly known as the GB (Guobiao) code
gb2312-raw   Raw (low-bit) GB2312 character table
gb12345      Raw Traditional Chinese encoding used in China
iso-ir-165   GB2312 + GB6345 + GB8565 + additional characters
cp936        Code page 936; can also be specified as 'GBK' (extended GB code)
hz           7-bit escaped GB2312 encoding
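The alias relationships mentioned above ('gb2312' for 'euc-cn', 'GBK' for 'cp936') can be checked from a program. The following is a small sketch using `Encode::resolve_alias`; the list of names to probe is an arbitrary choice for illustration, and the output depends on which encodings the local Perl installation provides.

```perl
use strict;
use warnings;
use Encode ();

# Names to probe; this list is only an illustrative selection.
my @names = qw(gb2312 euc-cn GBK cp936 hz);

for my $name (@names) {
    # resolve_alias() returns the canonical encoding name, or a false
    # value when the name is unknown to this installation.
    my $canonical = Encode::resolve_alias($name);
    printf "%-8s => %s\n", $name, $canonical || '(not available)';
}
```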
For example, to convert a file encoded in EUC-CN to Unicode, just type the following command:
perl -Mencoding=euc-cn,STDOUT,utf8 -pe1 < file.euc-cn > file.utf8
Perl also ships with "piconv", a character conversion utility written entirely in Perl. It is used as follows:
piconv -f euc-cn -t utf8 < file.euc-cn > file.utf8
piconv -f utf8 -t euc-cn < file.utf8 > file.euc-cn
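The same kind of conversion can also be done inside a program with PerlIO encoding layers. The sketch below is one possible way to do it, not how piconv itself is implemented; `convert_file` is a hypothetical helper name, and the file names are supplied by the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $src to $dst, re-encoding line by line.
# The :encoding() layers decode on input and encode on output.
sub convert_file {
    my ($src, $from, $dst, $to) = @_;
    open my $in,  "<:encoding($from)", $src or die "Cannot read $src: $!";
    open my $out, ">:encoding($to)",   $dst or die "Cannot write $dst: $!";
    print {$out} $_ while <$in>;
    close $out or die "Cannot close $dst: $!";
    close $in;
    return;
}

# Usage, mirroring the first piconv command:
#   perl convert.pl file.euc-cn file.utf8
convert_file($ARGV[0], 'euc-cn', $ARGV[1], 'UTF-8') if @ARGV == 2;
```

Swapping the two encoding arguments gives the reverse direction, as in the second piconv command.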
In addition, the encoding module makes it easy to write code that works in units of characters, as shown here:
#!/usr/bin/env perl
# Enable euc-cn string parsing; standard input and output are set to euc-cn
use encoding 'euc-cn', STDIN => 'euc-cn', STDOUT => 'euc-cn';
print length("骆驼");            #  2 (double quotes mean characters)
print length('骆驼');            #  4 (single quotes mean bytes)
print index("谆谆教诲", "蛔唤"); # -1 (does not contain this substring)
print index('谆谆教诲', '蛔唤'); #  1 (starts at the second byte)
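In the last example, the second byte of the first "谆" combines with the first byte of the second "谆" to form the EUC-CN character "蛔"; the second byte of the second "谆" then combines with the first byte of "教" to form "唤". Character-based matching resolves this problem, which used to be common when comparing EUC-CN strings.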
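The byte-boundary effect described here can be reproduced without depending on the source file's own encoding. The following sketch spells the EUC-CN bytes out as hex escapes (an assumption made so the example is self-contained) and compares a byte-wise search with a character-wise search after an explicit `decode`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# EUC-CN bytes written as hex escapes.
my $haystack_bytes = "\xD7\xBB\xD7\xBB\xBD\xCC\xBB\xE5";  # 谆谆教诲
my $needle_bytes   = "\xBB\xD7\xBB\xBD";                  # 蛔唤

# Byte-wise search: the needle matches across character boundaries.
my $byte_pos = index($haystack_bytes, $needle_bytes);

# Character-wise search: decode both strings first, then search.
my $char_pos = index(decode('euc-cn', $haystack_bytes),
                     decode('euc-cn', $needle_bytes));

print "byte-wise: $byte_pos, character-wise: $char_pos\n";
# prints "byte-wise: 1, character-wise: -1"
```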
If you need more Chinese encodings, you can download the Encode::HanExtra module from CPAN (http://www.cpan.org/). It currently provides the following encoding:
gb18030      Extended GB code, including Traditional Chinese
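In addition, the Encode::HanConvert module provides two encodings for converting between Simplified and Traditional Chinese:
big5-simp    Converts between Big5 Traditional Chinese and Unicode Simplified Chinese
gbk-trad     Converts between GBK Simplified Chinese and Unicode Traditional Chinese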
To convert between GBK and Big5, see the b2g.pl and g2b.pl programs included with that module, or use the following in your program:
use Encode::HanConvert;
$euc_cn = big5_to_gb($big5); # convert from Big5 to GBK
$big5 = gb_to_big5($euc_cn); # convert from GBK to Big5
See the extensive documentation included with Perl (unfortunately all written in English) to learn more about Perl and about using Unicode. That said, external resources are plentiful:
The Perl home page (maintained by O'Reilly)
The Comprehensive Perl Archive Network (CPAN)
List of Perl mailing lists
O'Reilly Perl books in Simplified Chinese
List of Perl user groups in China
The Unicode Consortium (the body that defines the Unicode standard)
UTF-8 and Unicode FAQ for Unix/Linux
Encode, Encode::CN, encoding, perluniintro, perlunicode
Jarkko Hietaniemi <jhi@iki.fi>
Autrijus Tang (唐宗汉) <autrijus@autrijus.org>