Post Timeline

added 2 characters in body

edited Nov 10, 2020 at 4:08

anthony sottile

Looks like a classic case of mojibake: the UTF-8 bytes were decoded as latin1 when they should have been decoded as UTF-8. Re-encoding as latin1 recovers the original bytes, which then decode correctly:

>>> "Al BaÅ£á¸©ah".encode('latin1')
b'Al Ba\xc5\xa3\xe1\xb8\xa9ah'
>>> "Al BaÅ£á¸©ah".encode('latin1').decode('UTF-8')
'Al Baţḩah'
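
To see where the garbled text comes from, the round trip can be run in the other direction. This is only a sketch; the variable names are made up for illustration, and the strings are written with escapes so the example does not depend on how this page itself is encoded.

```python
# Correct name, written with escapes: U+0163 is t-cedilla, U+1E29 is h-cedilla.
original = "Al Ba\u0163\u1e29ah"

# The bytes that were actually stored or transmitted.
utf8_bytes = original.encode("utf-8")

# Reading those bytes with the wrong codec produces the mojibake:
# each byte becomes one latin1 character.
garbled = utf8_bytes.decode("latin1")

# Undoing the mistake: back to the same bytes, then decode with the right codec.
repaired = garbled.encode("latin1").decode("utf-8")

print(utf8_bytes)
print(repaired == original)
```

Because latin1 maps every byte 0x00-0xFF to exactly one character, the wrong decode loses no information, which is why it can be reversed.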

For those who want to copy/paste this into a program rather than the interactive prompt:

source_text = "Al BaÅ£á¸©ah"
print("source_text=", source_text)
# Re-encode as latin1 to recover the original bytes
encoded_source_text = source_text.encode('latin1')
# Decode those bytes with the codec they were written in
decoded_text = encoded_source_text.decode('UTF-8')
print("decoded_text=", decoded_text)
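
If only some of the input is garbled, applying the round trip blindly will raise on strings that are already fine. A minimal sketch of a guarded version follows; `fix_mojibake` is a hypothetical helper name, not something from a library, and it assumes the only damage is UTF-8 read as latin1.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8-decoded-as-latin1 garbling; return text unchanged if that fails.

    Hypothetical helper for illustration only.
    """
    try:
        return text.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # UnicodeEncodeError: text has characters outside latin1, so it was
        # not produced by a latin1 decode.
        # UnicodeDecodeError: the latin1 bytes are not valid UTF-8.
        return text


# Garbled input, written with escapes (A-ring, pound, a-acute, cedilla, copyright).
print(fix_mojibake("Al Ba\u00c5\u00a3\u00e1\u00b8\u00a9ah"))
# Already-correct input is returned as-is.
print(fix_mojibake("Al Ba\u0163\u1e29ah"))
```

This is a heuristic: a string that legitimately contains a latin1 sequence that also happens to be valid UTF-8 would be changed too, so it is safest on data known to come from the broken source.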

added source code instead of command line format

edited Nov 10, 2020 at 4:04

NealWalters

answered Nov 10, 2020 at 2:22

anthony sottile
