The following snippet correctly decodes the bytes into the expected UTF-8 text:
a = b"Tenemos la Soluci\xc3\xb3n"
a.decode('utf8')
'Tenemos la Solución' # correct output
But in my use case the bytes are stored as a string in the database. How do I retrieve the text with the correct UTF-8 representation in that case?
a = "Tenemos la Soluci\xc3\xb3n" # retrieved from Database
b = bytes(a, 'utf8')
b.decode('utf8')
'Tenemos la SoluciÃ³n' # incorrect output
Please suggest how to resolve this.
1 Answer
What you have is mojibake. It occurs when, for example, UTF-8-encoded text is stored in a database configured for ISO-8859-1 or a similar encoding, so each UTF-8 byte is read back as a separate character. latin1 maps the Unicode code points U+0000 to U+00FF 1:1 onto the byte values 0x00 to 0xFF, so as long as the string contains only those code points, encoding it as latin1 recovers the original bytes, which can then be decoded as UTF-8:
>>> a = "Tenemos la Soluci\xc3\xb3n" # retrieved from Database
>>> a.encode('latin1').decode('utf8')
'Tenemos la Solución'
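If some rows in the database may already be correct, applying the round trip blindly can raise an exception. A minimal sketch of a defensive wrapper follows; the name fix_mojibake is hypothetical, and it assumes the damage really came from UTF-8 bytes being read as latin1 (a database using cp1252 or another encoding would need that codec instead):

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as latin1 (hypothetical helper)."""
    try:
        # latin1 turns each code point U+0000..U+00FF back into one byte;
        # those bytes are then decoded as the UTF-8 they originally were.
        return text.encode("latin1").decode("utf8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters above U+00FF, or the recovered
        # bytes are not valid UTF-8. Assume it was never damaged; return as-is.
        return text


broken = "Tenemos la Soluci\xc3\xb3n"  # retrieved from Database
print(fix_mojibake(broken))
print(fix_mojibake("Tenemos la Solución"))  # already correct, left unchanged
```

Pure ASCII strings pass through unchanged, since ASCII is encoded identically in latin1 and UTF-8. The longer-term fix is to store and read the column with a matching encoding so the round trip is not needed at all.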