0

On Linux, I opened a terminal, started python2.7, and entered the following code:

>>> s = u'\u0561'
>>> print s
ա
>>> len(s)
1

The length of u'\u0561' is only 1? Why? I learned that every character outside the English alphabet takes 2 to 4 bytes in Unicode, so why does it use only 1 byte? I also tested other Unicode characters and found that almost all of them have a length of 1. Why?

Roman Bodnarchuk
asked Nov 26, 2011 at 7:59
1
  • 1
    Just to confuse you, try this on a narrow build (i.e. sys.maxunicode == 0xffff): len(u'\U00010000'). Commented Nov 26, 2011 at 8:35
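
To see what that comment is getting at, here is a minimal sketch. It assumes that on a narrow build a unicode string is stored as UTF-16 code units, so a code point above 0xffff takes two of them; the variable names are only illustrative. On a wide build, or on Python 3.3 and later, len() reports 1 instead.

```python
import sys

c = u'\U00010000'  # first code point beyond the Basic Multilingual Plane

# Number of UTF-16 code units needed for c.
# 'utf-16-le' writes no byte order mark, and each code unit is 2 bytes.
utf16_units = len(c.encode('utf-16-le')) // 2

print(hex(sys.maxunicode))  # 0xffff on a narrow build, 0x10ffff otherwise
print(len(c))               # 2 on a narrow build, 1 on a wide build or Python 3.3+
print(utf16_units)          # 2
```

So len() is counting stored code units here, which matches the number of characters everywhere except for code points above 0xffff on a narrow build.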

3 Answers

7

The len function doesn't count the number of bytes; it counts the number of items in a sequence (in this case, the number of characters in the string).
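
As a rough illustration of the difference, the sketch below compares the character count with the byte count after encoding. The variable names are only for illustration, and the "-le" codec variants are used so that no byte order mark is added to the byte counts.

```python
s = u'\u0561'  # ARMENIAN SMALL LETTER AYB

char_count = len(s)                        # items (characters) in the unicode string
utf8_bytes = len(s.encode('utf-8'))        # bytes once encoded as UTF-8
utf16_bytes = len(s.encode('utf-16-le'))   # bytes once encoded as UTF-16
utf32_bytes = len(s.encode('utf-32-le'))   # bytes once encoded as UTF-32

print(char_count)   # 1
print(utf8_bytes)   # 2
print(utf16_bytes)  # 2
print(utf32_bytes)  # 4
```

The 2 to 4 bytes only appear once the string is encoded; len() on the unicode object itself never sees them.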

answered Nov 26, 2011 at 8:01

1

the length of u'\u0561' is only 1? Why?

Because ա is one character.

In other words, for the same reason that the len() of ['hi mom this is an incredibly long string'] is 1: because 'hi mom this is an incredibly long string' is one list item.
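
The list analogy can be checked directly; this is just a sketch, and the name items is made up for the example.

```python
items = ['hi mom this is an incredibly long string']

print(len(items))     # 1: the list holds a single item
print(len(items[0]))  # the number of characters in that one string

s = u'\u0561'
print(len(s))         # 1: the string holds a single character
```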

answered Nov 26, 2011 at 8:15

0

It's giving you the length in characters, not bytes.

\u0561

This is one character, so the length is one.

answered Nov 26, 2011 at 8:01
