Message 153921
Author: ezio.melotti
Recipients: Ramchandra Apte, amaury.forgeotdarc, ezio.melotti, harveyang, mrabarnett
Date: 2012-02-22 02:01:11
Message-id: <1329876072.69.0.349013967937.issue14068@psf.upfronthosting.co.za>
As long as you don't mix str and unicode, everything works.
With byte strings (str):
>>> s = '与清新。阿德莱'
>>> re.split('。', s)
['\xe4\xb8\x8e\xe6\xb8\x85\xe6\x96\xb0', '\xe9\x98\xbf\xe5\xbe\xb7\xe8\x8e\xb1']
>>> s.split('。')
['\xe4\xb8\x8e\xe6\xb8\x85\xe6\x96\xb0', '\xe9\x98\xbf\xe5\xbe\xb7\xe8\x8e\xb1']
With unicode:
>>> u = u'与清新。阿德莱'
>>> re.split(u'。', u)
[u'\u4e0e\u6e05\u65b0', u'\u963f\u5fb7\u83b1']
>>> u.split(u'。')
[u'\u4e0e\u6e05\u65b0', u'\u963f\u5fb7\u83b1']
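For comparison, here is a minimal sketch of the same two consistent cases in Python 3. This is an assumption beyond the Python 2.7 session above: in Python 3, str is the Unicode type and bytes plays the role of the 2.x str. The variable names are illustrative only.

```python
import re

text = '与清新。阿德莱'              # Python 3 str, like the 2.x unicode object u
data = text.encode('utf-8')          # UTF-8 bytes, like the 2.x str object s
sep_text = '。'                      # U+3002 IDEOGRAPHIC FULL STOP
sep_data = sep_text.encode('utf-8')  # the 3-byte UTF-8 sequence b'\xe3\x80\x82'

# Same type on both sides: re.split() and the split() method agree.
parts_text = re.split(sep_text, text)   # ['与清新', '阿德莱']
assert parts_text == text.split(sep_text)

parts_data = re.split(sep_data, data)   # two UTF-8 byte chunks, as in the str session
assert parts_data == data.split(sep_data)
```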
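Mixing str and unicode (re.split() silently finds no match, while str.split() and unicode.split() raise UnicodeDecodeError):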
>>> re.split(u'。', s)
['\xe4\xb8\x8e\xe6\xb8\x85\xe6\x96\xb0\xe3\x80\x82\xe9\x98\xbf\xe5\xbe\xb7\xe8\x8e\xb1']
>>> re.split('。', u)
[u'\u4e0e\u6e05\u65b0\u3002\u963f\u5fb7\u83b1']
>>>
>>> s.split(u'。')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u.split('。')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
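In Python 3 the mixed cases no longer fail silently. The sketch below assumes a Python 3 interpreter rather than the 2.7 one above; `split_or_error` is a hypothetical helper written only for this illustration.

```python
import re

text = '与清新。阿德莱'
data = text.encode('utf-8')
sep_text = '。'
sep_data = sep_text.encode('utf-8')

def split_or_error(pattern, subject):
    """Hypothetical helper: return re.split()'s result, or the exception's type name."""
    try:
        return re.split(pattern, subject)
    except TypeError as exc:
        return type(exc).__name__

# str pattern on bytes, and bytes pattern on str: re rejects both with TypeError.
mixed_re = [split_or_error(sep_text, data), split_or_error(sep_data, text)]

# The split() methods also refuse a separator of the other type.
mixed_methods = []
for subject, sep in [(data, sep_text), (text, sep_data)]:
    try:
        subject.split(sep)
    except TypeError as exc:
        mixed_methods.append(type(exc).__name__)
```

Python 3 reports the mismatch explicitly (for example "cannot use a string pattern on a bytes-like object"), which is the kind of check 2.7's re could not add without breaking code that mixes ASCII-only str and unicode.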
In Python 3, a SyntaxError is raised for non-ASCII characters in bytes literals, but that change can't be backported to 2.7. Raising an error when str and unicode are mixed in re would not be backward compatible, and re does work correctly as long as both the pattern and the string are ASCII-only. I'm therefore closing this as invalid.
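The Python 3 compile-time check on bytes literals can be observed with the built-in compile(). A minimal sketch, assuming a Python 3 interpreter; the source strings are illustrative:

```python
# A bytes literal containing a non-ASCII character is rejected at compile time.
try:
    compile("b'。'", '<example>', 'eval')
    non_ascii_rejected = False
except SyntaxError:
    non_ascii_rejected = True

# The same bytes written as escape sequences are accepted.
escaped = eval(r"b'\xe3\x80\x82'")
```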