I'm reading a UTF-8 text file from a directory, then I insert the text I read into a list, and I get tuples like this:
l = [('mucho','fácil'),...,('yo','hola')]
When I print it on the console I have the following:
print l
('mucho','f\xc3\xa1cil'),...,('yo','hola')
So I tried the following:
fixing_l = [x.encode('utf-8') for x in l]
When I try to print it I get this exception:
AttributeError: 'tuple' object has no attribute 'encode'
How can I encode and fix the strings and get something like this?:
('mucho','fácil'),...,('yo','hola')
2 Answers
I think you mean decode
l = [('mucho','f\xc3\xa1cil'),...,('yo','hola')]
decoded = [[word.decode("utf8") for word in sets] for sets in l]
for words in decoded:
    print u" ".join(words)
print 'f\xc3\xa1cil'.decode("utf8")
If you print it you should see the proper string.
Since you initially have a normal byte string, you need to decode it, which returns a unicode object. In the case above, u"\xe1" is the unicode form of the UTF-8 byte string "\xc3\xa1", and both of them are really just á.
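To make the bytes-versus-text point concrete, here is a minimal sketch. It assumes Python 3 (the answer above is Python 2), where the byte string is a `bytes` object and the decoded result is a `str`. The names `raw` and `text` are only illustrative.

```python
# Python 3 sketch: UTF-8 bytes as stored in the file vs. the decoded text.
raw = b"f\xc3\xa1cil"            # six bytes: the letter á takes two bytes in UTF-8
text = raw.decode("utf-8")       # five characters of text

assert text == "f\xe1cil"        # "\xe1" is just the escape for U+00E1
assert text == "fácil"           # same string, written literally
assert text.encode("utf-8") == raw   # encoding goes back to the original bytes
print(len(raw), len(text))       # 6 5
```

The escaped form and the literal form compare equal because they are the same string; only the way it is displayed differs.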
4 Comments
f\xe1cil instead of fácil. Should I try another encoding? I'm on OS X, and when I check the file's encoding in the terminal it says it is utf8.
f\xe1cil in your txt file? If so, I would guess your file contains text with more than one encoding.
u"f\xe1cil", which is simply the unicode representation of the string. unicode is what Python uses to represent non-ASCII characters. Of course there are several encodings, but that is all just representation.
In Python 3 you can use:
res = [tuple(map(lambda x: x.encode(encoding), tup)) for tup in list_tuples]
Example:
list_tuples = [('mucho','fácil'), ('\u2019', 't')]
res = [tuple(map(lambda x: x.encode('utf-8'), tup)) for tup in list_tuples]
result:
[(b'mucho', b'f\xc3\xa1cil'), (b'\xe2\x80\x99', b't')]
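If what you have is the byte version and you want the text back, the same pattern should work in the other direction. This is only a sketch for Python 3; `decode_tuples` is a hypothetical helper name, not something from the standard library.

```python
def decode_tuples(byte_tuples, encoding="utf-8"):
    """Decode every bytes item in a list of tuples into str."""
    return [tuple(item.decode(encoding) for item in tup) for tup in byte_tuples]

# Input is the encoded result shown above.
encoded = [(b'mucho', b'f\xc3\xa1cil'), (b'\xe2\x80\x99', b't')]
decoded = decode_tuples(encoded)

# Decoding recovers the original list of text tuples.
assert decoded == [('mucho', 'fácil'), ('\u2019', 't')]
```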
Comments
encode, it's not going to work; what you have in your tuples is <type 'str'>
print a container, you're always, inevitably, going to see the repr of the container's items. Therefore, there is no way to make a list of tuples that will display the items' str rather than their repr upon a print. You need a customized container class of your own if you require that!
repr of a string happens to be pretty much what you desire here; in Python 2.7, not so, when the string includes non-ASCII characters. Which is why you need some custom trick... if any (since __repr__ has to return ASCII characters only in Python 2.7).
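As a rough illustration of the "customized container class" idea from the comment above, something like the following might do. It is a sketch for Python 3, and `StrList` is a made-up name. In Python 3 the default repr already shows non-ASCII characters, so `ascii()` is used here to mimic the escaped output that Python 2.7 would print.

```python
class StrList(list):
    """Hypothetical list subclass whose str() shows items via str(), not repr()."""

    def __str__(self):
        # Build each tuple from the str() of its items instead of their repr().
        rows = ("(" + ", ".join(str(item) for item in tup) + ")" for tup in self)
        return "[" + ", ".join(rows) + "]"


pairs = StrList([("mucho", "fácil"), ("yo", "hola")])

rendered = str(pairs)          # what print(pairs) would display
escaped = ascii(list(pairs))   # escaped form, similar to a Python 2.7 repr

assert rendered == "[(mucho, fácil), (yo, hola)]"
assert escaped == "[('mucho', 'f\\xe1cil'), ('yo', 'hola')]"
```

Overriding `__str__` rather than `__repr__` keeps `repr()` unchanged, so the class only affects what `print` and `str()` show.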