Newbie question about text encoding

Marko Rauhamaa marko at pacujo.net
Sat Mar 7 12:14:28 EST 2015


Chris Angelico <rosuav at gmail.com>:
> If you really REALLY can't use the bytes() type to work with something
> that is, yaknow, bytes, then you could use an alternative encoding
> that has a value for every byte. It's still not Unicode text, so it
> doesn't much matter which encoding you use. But it's much better to
> use the bytes type to work with bytes. It is not text, so don't treat
> it as text.

See:
 $ mkdir /tmp/xyz
 $ touch /tmp/xyz/$'\x80'
 $ python3
 Python 3.3.2 (default, Dec 4 2014, 12:49:00) 
 [GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import os
 >>> os.listdir('/tmp/xyz')
 ['\udc80']
 >>> open(os.listdir('/tmp/xyz')[0])
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 FileNotFoundError: [Errno 2] No such file or directory: '\udc80'
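
For readers unfamiliar with the '\udc80' in the listing above: on a UTF-8 system Python 3 decodes file names with the "surrogateescape" error handler (PEP 383), so the undecodable byte 0x80 comes back as the lone surrogate U+DC80 and can be turned back into the original byte. The following is only a minimal sketch of that round trip, not part of the original session. The helper names decode_name, encode_name and reopen_undecodable are made up for illustration, and the on-disk helper assumes a POSIX file system that accepts arbitrary bytes in names. Note also that os.listdir() returns bare names, so the sketch joins each name with its directory before calling open().

```python
import os
import tempfile


def decode_name(raw: bytes) -> str:
    """Decode a raw file name the way Python 3 does under a UTF-8 locale."""
    return raw.decode("utf-8", "surrogateescape")


def encode_name(name: str) -> bytes:
    """Recover the original bytes from a surrogate-escaped name."""
    return name.encode("utf-8", "surrogateescape")


def reopen_undecodable(raw_name: bytes = b"\x80") -> bool:
    """Create a file with a non-UTF-8 name, list it as str, and reopen it.

    Hypothetical helper: assumes the file system allows arbitrary bytes.
    """
    with tempfile.TemporaryDirectory() as directory:
        # Create the file through the bytes API so no decoding is involved.
        with open(os.path.join(os.fsencode(directory), raw_name), "wb"):
            pass
        # os.listdir(str) yields bare str names; join them with the
        # directory, otherwise open() looks in the current directory.
        for name in os.listdir(directory):
            with open(os.path.join(directory, name), "rb"):
                pass
        # The bytes API returns the name exactly as stored on disk.
        return os.listdir(os.fsencode(directory)) == [raw_name]


name = decode_name(b"\x80")
# The escaped name cannot be encoded strictly, only with surrogateescape.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

If the bytes are suspected to be Latin-1, encode_name(name).decode("latin-1") gives a printable guess, but which Latin-X encoding a given name uses cannot be determined from the bytes alone.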

File names encoded in one of the Latin-X (ISO 8859 family) encodings are
quite commonplace even in UTF-8 locales.

Marko


More information about the Python-list mailing list
