My function returns a tuple which is then assigned to a variable x and appended to a list.
x = (u'string1', u'string2', u'string3', u'string4')
resultsList.append(x)
The function is called multiple times and the final list consists of 20 tuples.
The strings within the tuples are unicode objects and I would like to encode them as UTF-8.
Some of the strings also include non-ASCII characters such as ö and ä.
Is there a way to convert them all in one step?
sorry that was just a typo... – user2560609, Jul 8, 2013 at 12:54
possible duplicate stackoverflow.com/questions/27714750/… – hamed, Sep 21, 2016 at 10:43
1 Answer
Use a nested list comprehension:
encoded = [[s.encode('utf8') for s in t] for t in resultsList]
This produces a list of lists containing byte strings of UTF-8 encoded data.
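As a minimal sketch of the comprehension above, here it is applied to made-up sample data (the name results_list and its contents are hypothetical, standing in for the asker's resultsList). The second comprehension is a variant that keeps each row as a tuple; that is an assumption about what might be wanted, not part of the original answer. The u'' literals let the snippet run on Python 2 and on Python 3.3+, where the results are bytes objects.

```python
# Hypothetical sample data standing in for the asker's resultsList.
results_list = [
    (u'Kaiserstra\xdfe', u'K\xf6ln', u'M\xe4rz', u'plain'),
    (u'string1', u'string2', u'string3', u'string4'),
]

# List of lists of UTF-8 encoded byte strings, as in the answer.
encoded = [[s.encode('utf-8') for s in t] for t in results_list]

# Variant (assumption): keep each row as a tuple instead of a list.
encoded_tuples = [tuple(s.encode('utf-8') for s in t) for t in results_list]
```

Both versions build a new list in a single pass and leave the original unicode tuples untouched.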
If you print these lists, Python shows each byte string as a string literal: in quotes, and with any bytes that are not printable ASCII characters written as escape sequences:
>>> l = ['Kaiserstra\xc3\x9fe']
>>> l
['Kaiserstra\xc3\x9fe']
>>> l[0]
'Kaiserstra\xc3\x9fe'
>>> print l[0]
Kaiserstraße
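This is normal; Python presents the data this way for debugging purposes. The \xc3 and \x9f escape sequences represent the two bytes C3 and 9F (hexadecimal) that UTF-8 uses to encode the ß character (the German sharp s, or Ringel-S). Printing the string itself writes those raw bytes to the terminal, which decodes them and displays ß.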
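The relationship between the escape sequences and the character can be checked directly. The following is a small sketch, with variable names chosen here for illustration, that encodes ß, lists its bytes in hexadecimal, and decodes them back. It is written to run on both Python 2 and Python 3.3+.

```python
sharp_s = u'\xdf'  # U+00DF, the character ß

# Encoding to UTF-8 yields a two-byte byte string.
utf8_bytes = sharp_s.encode('utf-8')

# bytearray iterates as integers on both Python 2 and 3.
hex_pairs = ['%02X' % b for b in bytearray(utf8_bytes)]

# Decoding the bytes recovers the original unicode character.
round_trip = utf8_bytes.decode('utf-8')
```

Here hex_pairs holds the two byte values that the interpreter displays as \xc3 and \x9f, and round_trip equals the original character, so no data is lost by encoding.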
3 Comments
'\xc3\x9f' is Python's escape format for the two bytes C3 and 9F (hexadecimal), which are the UTF-8 encoding of ß, the German sharp s (Ringel-S).