On Linux, I opened a terminal, started python2.7, and entered the following code:
>>> s = u'\u0561'
>>> print s
ա
>>> len(s)
1
The length of u'\u0561' is only 1? Why? I learned that every non-alphabetic character takes 2 to 4 bytes in Unicode, so why does it use only 1 byte? I tested other Unicode characters as well and found that the length of almost all of them is 1. Why?
3 Answers
The len function doesn't count the number of bytes. It counts the number of items in a sequence (in this case, the number of characters in the string).
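To make the difference between characters and bytes concrete, here is a minimal sketch. The variable names are illustrative only, and the byte counts depend entirely on which encoding is chosen; the three encodings below are just common examples.

```python
# One Unicode character (ARMENIAN SMALL LETTER AYB, U+0561).
s = u'\u0561'

# len() on a unicode string counts characters, not bytes.
char_count = len(s)

# Bytes only exist once the string is encoded; the count varies by encoding.
utf8_bytes = len(s.encode('utf-8'))       # U+0561 needs 2 bytes in UTF-8
utf16_bytes = len(s.encode('utf-16-le'))  # one 16-bit code unit, no BOM
utf32_bytes = len(s.encode('utf-32-le'))  # always 4 bytes per character, no BOM

print('characters: %d' % char_count)   # characters: 1
print('UTF-8 bytes: %d' % utf8_bytes)   # UTF-8 bytes: 2
print('UTF-16 bytes: %d' % utf16_bytes) # UTF-16 bytes: 2
print('UTF-32 bytes: %d' % utf32_bytes) # UTF-32 bytes: 4
```

So the "2 to 4 bytes" figure describes an encoded form of the character, which is what you get from s.encode(...), not what len(s) reports.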
the length of u'\u0561' is only 1? Why?
Because ա is one character.
In other words, for the same reason that the len() of ['hi mom this is an incredibly long string'] is 1: because 'hi mom this is an incredibly long string' is one list item.
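The list analogy can be checked directly. This is just a small illustration; the name items is made up for the example.

```python
# A list containing exactly one item, which happens to be a long string.
items = ['hi mom this is an incredibly long string']

# len() of the list counts its items.
list_length = len(items)

# len() of the string inside counts that string's characters.
string_length = len(items[0])

print(list_length)    # 1
print(string_length)  # 40
```

In both cases len() counts items at the level of the sequence you pass in, without looking at how much storage each item takes.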
It's giving you the length in characters, not bytes.
\u0561
This is one character, so the length is one.
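One caveat: on a narrow Python 2 build (sys.maxunicode == 0xffff), a character outside the Basic Multilingual Plane is stored as a surrogate pair, so len(u'\U00010000') is 2 rather than 1.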
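A quick way to see how a character outside the Basic Multilingual Plane behaves on your own interpreter is sketched below. The variable names are illustrative, and the result of len() depends on the build: narrow Python 2 builds report 2, while wide builds and Python 3.3+ report 1.

```python
import sys

# U+10000 is the first code point outside the Basic Multilingual Plane.
s = u'\U00010000'

# Narrow builds store text as UTF-16 code units, so sys.maxunicode is 0xffff.
is_narrow = (sys.maxunicode == 0xffff)

# What len() reports on this interpreter (build-dependent).
reported_length = len(s)

# Number of UTF-16 code units, computed the same way on every build:
# encode without a BOM and divide the byte count by 2.
utf16_units = len(s.encode('utf-16-le')) // 2

print('narrow build: %s' % is_narrow)
print('len(s): %d' % reported_length)
print('UTF-16 code units: %d' % utf16_units)  # UTF-16 code units: 2
```

On a narrow build, len(s) matches the UTF-16 code unit count because the string is held internally as a surrogate pair.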