How to convert a unicode list of tuples into utf-8 with python

Question 1

My function returns a tuple which is then assigned to a variable x and appended to a list.

x = (u'string1', u'string2', u'string3', u'string4')
resultsList.append(x)

The function is called multiple times and the final list consists of 20 tuples.

The strings within the tuples are unicode and I would like to convert them to UTF-8.

Some of the strings also include non-ASCII characters like ö, ä, etc.

Is there a way to convert them all in one step?

Comment

Sorry, that was just a typo...

Comment

Possible duplicate: stackoverflow.com/questions/27714750/…

Answer

Use a nested list comprehension:

encoded = [[s.encode('utf8') for s in t] for t in resultsList]

This produces a list of lists containing byte strings of UTF-8 encoded data.
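If the inner tuples should stay tuples rather than become lists, a small variation is possible. The following is only a sketch: `results_list` and its contents are made-up sample data standing in for the question's `resultsList`, and the non-ASCII characters are written as escapes so the snippet runs unchanged on Python 2 and Python 3 (where `encode` returns `bytes`).

```python
# Hypothetical sample data standing in for the question's resultsList.
results_list = [
    (u'Kaiserstra\xdfe', u'K\xf6ln'),  # \xdf is sharp s, \xf6 is o-umlaut
    (u'string3', u'M\xe4rz'),          # \xe4 is a-umlaut
]

# Same nested comprehension as above, but tuple(...) keeps each row a tuple.
encoded = [tuple(s.encode('utf-8') for s in t) for t in results_list]

print(encoded[0][0] == b'Kaiserstra\xc3\x9fe')  # True
```

Wrapping the inner generator in `tuple(...)` is the only change; the encoding step itself is identical.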

If you print these lists, you'll see Python represent the contents of the byte strings as Python string literals, with quotes and with any bytes that are not printable ASCII characters shown as escape sequences:

>>> l = ['Kaiserstra\xc3\x9fe']
>>> l
['Kaiserstra\xc3\x9fe']
>>> l[0]
'Kaiserstra\xc3\x9fe'
>>> print l[0]
Kaiserstraße

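This is normal; Python presents the data this way for debugging purposes. The \xc3 and \x9f escape sequences represent the two UTF-8 bytes C3 and 9F (hexadecimal) that are used to encode the ß character (LATIN SMALL LETTER SHARP S). Printing the string itself, as in the last line, writes the raw bytes to the terminal, and a UTF-8 terminal displays them as ß.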
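To check that the escapes are only a display format and that no data is lost, the bytes can be inspected and decoded again. This is a minimal sketch; the variable names are illustrative, and it is written to run on both Python 2 and Python 3.

```python
encoded = u'Kaiserstra\xdfe'.encode('utf-8')

# repr() is what the interactive prompt shows for values inside a list:
# non-ASCII bytes appear as \x escapes, but the data itself is unchanged.
shown = repr(encoded)

# The sharp s (U+00DF) occupies two bytes in UTF-8: 0xC3 and 0x9F.
sharp_s_bytes = list(bytearray(u'\xdf'.encode('utf-8')))

# Decoding the bytes recovers the original unicode text.
round_trip = encoded.decode('utf-8')

print(sharp_s_bytes == [0xC3, 0x9F])     # True
print(round_trip == u'Kaiserstra\xdfe')  # True
```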

Comment

Thank you very much, but what about the non-ASCII characters? Kaiserstraße becomes Kaiserstra\xc3\x9fe

Comment

@user2560609: This works on unicode values, any such value, including non-ASCII codepoints. Your output is a Python string literal representation of a UTF-8 encoded bytestring.

Comment

@user2560609: In other words: it works. '\xc3\x9f' is Python's escape format to represent the two hexadecimal bytes C3 and 9F, which are the UTF-8 representation of ß (sharp s).

Martijn Pieters · Accepted Answer · 2013-07-08 12:50:14Z
