I am trying to decode the strings in a list of strings, for example 'caf\\xc3\\xab'. What I want is for this to become 'café'.

I tried some things but ran into problems.

When I do:

for i in range(len(words)):
    words[i] = words[i].decode("utf8")

I still need to convert each string to a bytes object first, but how do I do this?

Also, when I do it like this, I need to remove the double backslashes for it to work:

b'caf\\xc3\\xab'.decode("utf8")
asked Mar 26, 2020 at 11:56
  • Python 2's str is bytes; you can just use unicode, or use Python 3 (in Python 3, str is Unicode). Commented Mar 26, 2020 at 13:33
  • I use Python 3, but I read the strings from a file in that specific format. Commented Mar 26, 2020 at 13:56
  • words.decode() is not an in-place operation, you need to capture the return value: word = word.decode("utf8"). (Further note: this will only change the value of the loop variable word, but not the elements in words.) Commented Mar 26, 2020 at 15:32
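
The point made in the last comment can be sketched as follows. This is a minimal illustration, assuming the list already holds bytes objects (which is not the asker's exact situation); the sample values and the name `unchanged` are made up for the example.

```python
# Hypothetical sample data: real UTF-8 bytes, not escaped text.
words = [b"caf\xc3\xab", b"na\xc3\xafve"]

# decode() returns a new object; assigning it to the loop variable
# only rebinds that variable and leaves the list untouched.
for word in words:
    word = word.decode("utf8")

unchanged = list(words)  # still a list of bytes objects

# Building a new list (or assigning by index) actually stores the results.
words = [word.decode("utf8") for word in words]
```

After the loop, `unchanged` still contains bytes, while the list comprehension replaces every element with its decoded string.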

1 Answer

Suppose you have a string as follows:

bef = 'caf\\xc3\\xab'

To turn the escaped byte sequences into the text they encode, you can do the following:

aft = bef.encode().decode('unicode-escape').encode('latin1').decode('utf-8')

Then print(aft) shows 'cafë'. Note that \xc3\xab is the UTF-8 encoding of 'ë'; for 'café' the input would have to contain \xc3\xa9.
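
Since the question is about a list of such strings, the same chain can be wrapped in a small helper and applied to every element. The following is only a sketch under the assumption that each string contains literal \xNN escapes spelling out UTF-8 bytes; the function name `decode_escaped_utf8` and the sample list are made up for illustration.

```python
def decode_escaped_utf8(text: str) -> str:
    """Interpret literal \\xNN escapes in text as UTF-8 encoded bytes."""
    # 1. str -> bytes, so the unicode-escape codec can process it.
    raw = text.encode()
    # 2. Resolve the escapes: each \xNN becomes the character U+00NN.
    unescaped = raw.decode("unicode-escape")
    # 3. latin1 maps U+0000..U+00FF one-to-one back to bytes 0x00..0xFF.
    utf8_bytes = unescaped.encode("latin1")
    # 4. Finally decode those bytes as the UTF-8 they really are.
    return utf8_bytes.decode("utf-8")


# Hypothetical input, as it might arrive after reading from a file.
words = ["caf\\xc3\\xab", "na\\xc3\\xafve", "plain"]
decoded = [decode_escaped_utf8(w) for w in words]
print(decoded)
```

Strings without escapes pass through unchanged. If a string contains escapes that do not form valid UTF-8, the last step raises UnicodeDecodeError, so you may want to catch that for untrusted input.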

Nikos Hidalgo
answered Mar 26, 2020 at 17:04