This issue tracker has been migrated to GitHub
and is currently read-only.
For more information,
see the GitHub FAQs in Python's Developer Guide.
Created on 2012-06-02 13:18 by javahaxxor, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (12) | |||
|---|---|---|---|
| msg162134 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 13:18 | |
print(listentry) fails on a folder name with Swedish (latin1) characters.
Error:
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/mac_roman.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u030a' in position 33: character maps to <undefined> |
|||
| msg162135 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012-06-02 14:12 | |
A Mac expert can confirm, but I think that just means that the default mac_roman encoding (which is made the default by the OS, if I understand correctly) can't handle that character. I believe it will work if you use UTF-8. And no, I don't know how to do that, not being a Mac person. |
|||
| msg162143 | Author: Hynek Schlawack (hynek) * (Python committer) | Date: 2012-06-02 15:28 | |
'\u030a' can't be latin1, as 0x030a = 778, which is waaay beyond 255. :) It's a Unicode code point (U+030A COMBINING RING ABOVE), and indeed it maps to " ̊".
My best guess is that your LC_CTYPE is set to Mac Roman. You can check it using "import os;os.environ.get('LC_CTYPE')".
Try running Python as "LC_CTYPE=sv_SE.UTF-8 python3" and do a "print('\u030a')" to see if it helps.
Otherwise a more complete (but minimal) example demonstrating the problem would be helpful.
|
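Hynek's arithmetic can be checked directly in Python 3. A minimal sketch: `ord()` gives the code point, Latin-1 can only encode code points 0 to 255, and UTF-8 needs two bytes for this particular character.

```python
import unicodedata

ch = "\u030a"

# The code point is 0x030A = 3 * 256 + 10 = 778, far above Latin-1's 0..255 range.
code_point = ord(ch)
print(code_point, hex(code_point))
print(unicodedata.name(ch))

# Latin-1 (ISO-8859-1) has no slot for it, so encoding raises.
try:
    ch.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 failed:", exc.reason)

# UTF-8 can encode every code point; this one takes two bytes.
print(ch.encode("utf-8"))
```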
|||
| msg162150 | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2012-06-02 17:05 | |
mac_roman is an obsolete encoding from Mac OS 9 days; it is seldom seen on modern OS X systems. But it is often the fallback encoding set in ~/.CFUserTextEncoding if neither LANG nor an LC_* environment variable is set (see, for example, http://superuser.com/questions/82123/mac-whats-cfusertextencoding-for).
If you run a terminal session using Terminal.app, the LANG environment variable is usually set for you to an appropriate modern value, like 'en_US.UTF-8' in the US locale; this is controlled by a Terminal.app preference, and other terminal apps like iTerm2 have something similar. But if you are using xterm with X11, xterm does not inject a LANG environment variable. So something like:
python3.2 -c 'print("\u030a")'
may fail under xterm with UnicodeEncodeError but will print the expected character when run under Terminal.app.
I avoid those kinds of issues by explicitly setting LANG in my shell profile. Let us know if that helps or, if not, how to reproduce your issue. |
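To see which stdout encoding an interpreter picks up from its environment, and to override it, one can start a child interpreter and ask it. A minimal sketch, assuming Python 3.7+ (for `subprocess.run(capture_output=...)`) and using the documented `PYTHONIOENCODING` environment variable, which takes precedence over the locale for stdin/stdout/stderr; the helper name `child_stdout_encoding` is hypothetical. It illustrates the mechanism rather than reproducing the xterm case exactly.

```python
import os
import subprocess
import sys

def child_stdout_encoding(extra_env=None):
    """Start a fresh interpreter and report the encoding its sys.stdout uses."""
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# Whatever the inherited LANG/LC_* variables imply (UTF-8, ascii, mac_roman, ...):
print("inherited:", child_stdout_encoding())
# Forced regardless of LANG/LC_CTYPE:
print("forced:", child_stdout_encoding({"PYTHONIOENCODING": "utf-8"}))
```

Setting `PYTHONIOENCODING` only changes Python's stdio encoding; it does not fix the locale itself, so setting LANG as Ned suggests is the more general fix.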
|||
| msg162156 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 17:41 | |
The char in question: 'å'. It is in the name of a folder. My encoding is UTF-8. Running print("\u030a") gives a blank line.
U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
General Character Properties
In Unicode since: 1.1
Unicode category: Letter, Uppercase
Canonical decomposition: U+0041 LATIN CAPITAL LETTER A + U+030A COMBINING RING ABOVE
Various Useful Representations
UTF-8: 0xC3 0x85
UTF-16: 0x00C5
C octal escaped UTF-8: \303\205
XML decimal entity: &#197;
Annotations and Cross References
See also:
• U+212B ANGSTROM SIGN
Equivalents:
• U+0041 LATIN CAPITAL LETTER A U+030A COMBINING RING ABOVE
The code:
import os

def traverse(targetDir):
    currentDir = targetDir
    dirs = os.listdir(targetDir)
    for entry in dirs:
        # listdir() returns bare names; join them with the directory so the
        # isdir()/isfile() checks don't resolve against the current directory
        path = os.path.join(currentDir, entry)
        if os.path.isdir(path):
            print("Traversing " + path)
            traverse(path)
        else:
            print("Not dir: " + entry)
            if os.path.isfile(path):
                print("Processing " + path)
            else:
                print("Not file: " + entry)
            print("\n")
|
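The recursive `traverse()` above can also be written with `os.walk()`, which yields each directory together with its subdirectory and file names. A minimal sketch that returns the joined paths instead of printing them, so nothing touches `sys.stdout` (and its encoding) until the caller decides how to display them; the function name `walk_entries` is hypothetical.

```python
import os

def walk_entries(target_dir):
    """Return ("dir", path) and ("file", path) tuples for everything under target_dir."""
    entries = []
    for current_dir, dirnames, filenames in os.walk(target_dir):
        dirnames.sort()  # sorting in place makes os.walk descend in a stable order
        for name in dirnames:
            entries.append(("dir", os.path.join(current_dir, name)))
        for name in sorted(filenames):
            entries.append(("file", os.path.join(current_dir, name)))
    return entries

if __name__ == "__main__":
    for kind, path in walk_entries("."):
        print(kind, path)
```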
|||
| msg162158 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 17:52 | |
The last post is the CAPITAL Å. The following is the small letter "å":
U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
General Character Properties
In Unicode since: 1.1
Unicode category: Letter, Lowercase
Canonical decomposition: U+0061 LATIN SMALL LETTER A + U+030A COMBINING RING ABOVE
Various Useful Representations
UTF-8: 0xC3 0xA5
UTF-16: 0x00E5
C octal escaped UTF-8: \303\245
XML decimal entity: &#229;
Annotations and Cross References
Notes:
• Danish, Norwegian, Swedish, Walloon
Equivalents:
• U+0061 LATIN SMALL LETTER A U+030A COMBINING RING ABOVE |
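The canonical decompositions listed above mean that "å" can be stored either as the single code point U+00E5 or as U+0061 followed by U+030A, and Python treats the two as different strings until they are normalized. A minimal sketch using `unicodedata.normalize()`. That `os.listdir()` returned the folder name in the decomposed form is an assumption here, though it is consistent with the '\u030a' in the original traceback (HFS+ volumes on Mac OS X store file names in a decomposed form).

```python
import unicodedata

precomposed = "\u00e5"        # LATIN SMALL LETTER A WITH RING ABOVE
decomposed = "\u0061\u030a"   # 'a' + COMBINING RING ABOVE

# Same glyph on screen, different code point sequences.
print(precomposed == decomposed)              # False
print(len(precomposed), len(decomposed))      # 1 2

# NFC composes, NFD decomposes.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True

# The UTF-8 bytes match the "Various Useful Representations" listings.
print(precomposed.encode("utf-8"))            # b'\xc3\xa5'
print("\u00c5".encode("utf-8"))               # b'\xc3\x85'
```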
|||
| msg162164 | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2012-06-02 18:58 | |
The character in question is not the problem and the code snippet you provide looks fine. The problem is almost certainly that you are running the code in an execution environment where the LANG environment variable is either not set or is set to an encoding that doesn't support higher-order Unicode characters. The fallback 'mac_roman' is such an encoding. The default encodings used by the Python 3 interpreter are influenced by the value of these environment variables. So the questions are: how are you running your code and what are the values of the environment variables that your Python program inherits, and, by any chance, is your program using the 'locale' module, and if so, exactly what functions from it?
Please try adding the following in the environment you are seeing the problem:
import sys
print(sys.stdout)
import os
print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
import locale
print(locale.getlocale())
print('\u00e5')
print('\u0061\u030a')
If I paste the above into a Python 3.2 interactive terminal session using the python.org 64-/32-bit Python 3.2.3, I see the following:
$ python3.2
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[('LANG', 'en_US.UTF-8')]
>>> import locale
>>> print(locale.getlocale())
('en_US', 'UTF-8')
>>> print('\u00e5')
å
>>> print('\u0061\u030a')
å
But, if I explicitly remove the LANG environment variable:
$ unset LANG
$ python3.2
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='US-ASCII'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[]
>>> import locale
>>> print(locale.getlocale())
(None, None)
>>> print('\u00e5')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xe5' in position 0: ordinal not in range(128)
>>> print('\u0061\u030a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u030a' in position 1: ordinal not in range(128)
>>>
|
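The second session fails because, with LANG unset, stdout falls back to ASCII, which covers only code points 0 to 127. A small sketch of the same failure outside `print()`, plus two of the standard codec error handlers, which substitute a placeholder instead of raising. Whether to use them in a real program is a judgment call: they hide the misconfigured locale rather than fix it.

```python
text = "\u0061\u030a"  # 'a' + COMBINING RING ABOVE

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character ASCII cannot represent
    failed_at = exc.start
    print(exc.encoding, exc.start, exc.reason)

# Error handlers trade the exception for a lossy substitute.
print(text.encode("ascii", errors="backslashreplace"))  # b'a\\u030a'
print(text.encode("ascii", errors="replace"))           # b'a?'
```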
|||
| msg162173 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 20:34 | |
Output in console:
Python 3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.stdout)
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> import os
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LC')])
[('LC_CTYPE', 'UTF-8')]
>>> print([(k, os.environ[k]) for k in os.environ if k.startswith('LANG')])
[]
>>> import locale
>>> print(locale.getlocale())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/locale.py", line 524, in getlocale
return _parse_localename(localename)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/locale.py", line 433, in _parse_localename
raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: UTF-8
>>> print('\u00e5')
å
>>> print('\u0061\u030a')
å
**********************
Output from Eclipse:
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='MacRoman'>
[]
[]
(None, None)
å
Traceback (most recent call last):
File "/Users/adyhasch/Documents/PythonWorkspace/PatternRenamer/src/prenamer.py", line 70, in <module>
print('\u0061\u030a')
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/mac_roman.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u030a' in position 1: character maps to <undefined>
************************************
I'm running PyDev.
|
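The Eclipse output fits this picture: Mac Roman has a single-byte slot for the precomposed 'å' but none for the combining ring, so print('\u00e5') succeeds while print('\u0061\u030a') fails. A sketch that checks this directly, assuming that normalizing to NFC before printing is acceptable for the program (it changes the string, not just how it is displayed); the helper name `encodable` is hypothetical.

```python
import unicodedata

def encodable(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

precomposed = "\u00e5"
decomposed = "\u0061\u030a"

print(encodable(precomposed, "mac_roman"))   # True
print(encodable(decomposed, "mac_roman"))    # False
# Composing first turns 'a' + U+030A back into U+00E5, which Mac Roman has.
print(encodable(unicodedata.normalize("NFC", decomposed), "mac_roman"))  # True
```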
|||
| msg162174 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-02 20:42 | |
My code runs fine in a console window, so it's some kind of configuration error. Sorry for wasting your time, guys. It would be nice to know why PyDev is not setting the right environment variables, though.
>>> traverse(".")
Processing ./.DS_Store
Traversing ./2011-10-03--Sebi_o_costi_ny_frisyr
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/.DS_Store
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/.picasa.ini
Traversing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/.DS_Store
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/.picasa.ini
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/DSC_5467.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/DSC_5468.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/2011-10-04--Sebastian_2år/DSC_5472.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/DSC_5440.JPG
Processing ./2011-10-03--Sebi_o_costi_ny_frisyr/DSC_5441.JPG
Processing ./__init__.py
Processing ./DSC_5440.JPG
Processing ./DSC_5453.JPG
Processing ./prenamer.py
|
|||
| msg162177 | Author: Hynek Schlawack (hynek) * (Python committer) | Date: 2012-06-02 21:30 | |
Glad we could help. I suspected it was running under "special circumstances". |
|||
| msg162178 | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2012-06-02 21:31 | |
I'm neither a PyDev nor an Eclipse user but there should be some way to set environment variables in it. Undoubtedly, Eclipse is launched as an app so a shell is not involved and shell profile files are not processed. However, the "Environment" section of this tutorial may help: http://pydev.org/manual_101_interpreter.html Try adding a definition for LANG or LC_CTYPE, as you prefer. And you should use a valid localized definition, like LANG=en_US.UTF-8 for US English UTF-8. The list of definitions is in Lib/locale.py. Good luck! |
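If the IDE's environment cannot be changed, a script can also choose its output encoding itself instead of inheriting one. A minimal sketch, assuming Python 3: wrap a binary stream (in a real script, `sys.stdout.buffer`) in `io.TextIOWrapper` with an explicit encoding. The helper name `utf8_writer` is hypothetical; on Python 3.7+ `sys.stdout.reconfigure(encoding="utf-8")` changes stdout in place instead. The characters only display correctly if the console itself decodes its input as UTF-8.

```python
import io

def utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)

# Demonstrated on an in-memory stream; a script would pass sys.stdout.buffer.
raw = io.BytesIO()
out = utf8_writer(raw)
print("a\u030a", file=out)   # 'a' + COMBINING RING ABOVE, no UnicodeEncodeError
out.flush()
print(raw.getvalue())
```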
|||
| msg162202 | Author: Adrian Bastholm (javahaxxor) | Date: 2012-06-03 09:25 | |
Thanks a lot for the help, guys ! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:31 | admin | set | github: 59191 |
| 2012-06-03 09:25:26 | javahaxxor | set | messages: + msg162202 |
| 2012-06-02 21:31:11 | ned.deily | set | messages: + msg162178 |
| 2012-06-02 21:30:07 | hynek | set | status: open -> closed resolution: not a bug messages: + msg162177 stage: resolved |
| 2012-06-02 20:42:38 | javahaxxor | set | messages: + msg162174 |
| 2012-06-02 20:34:36 | javahaxxor | set | messages: + msg162173 |
| 2012-06-02 18:58:57 | ned.deily | set | messages: + msg162164 |
| 2012-06-02 17:52:15 | javahaxxor | set | messages: + msg162158 |
| 2012-06-02 17:41:42 | javahaxxor | set | messages: + msg162156 |
| 2012-06-02 17:05:46 | ned.deily | set | messages: + msg162150 |
| 2012-06-02 15:28:15 | hynek | set | messages: + msg162143 |
| 2012-06-02 14:12:26 | r.david.murray | set | assignee: ronaldoussoren -> messages: + msg162135 nosy: + hynek, r.david.murray, ned.deily |
| 2012-06-02 13:18:23 | javahaxxor | create | |