I have a Unicode string in a document. All I want is to remove the non-ASCII characters or replace them with a space (" "). Example:

doc = "Hello my name is Ruth \u2026! I really like swimming and dancing \ud83c"

How do I convert it to the following?

doc = "Hello my name is Ruth! I really like swimming and dancing"

I already tried the approach from https://stackoverflow.com/a/20078869/5505608, but nothing changed. I'm using Python 3.

asked May 16, 2017 at 20:10
  • If the answer you linked didn't work, there's something you're not telling us. Commented May 16, 2017 at 21:32
  • I already tried re.sub(r'[^\x00-\x7F]+',' ', text). The code runs, but nothing changed. @MarkRansom Commented May 17, 2017 at 5:38
  • That's because strings don't update in-place, they're immutable. You need to take the return value of re.sub and assign it back to text. Commented May 17, 2017 at 14:00
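
The point about immutability can be sketched as follows. This is a minimal illustration that reuses the string from the question and the pattern from the comment above; the variable name `text` is taken from that comment, and nothing here is known about the asker's actual script.

```python
import re

text = "Hello my name is Ruth \u2026! I really like swimming and dancing \ud83c"

# re.sub returns a new string; the original object is left untouched.
re.sub(r'[^\x00-\x7F]+', ' ', text)
assert '\u2026' in text  # still present, because the result was discarded

# Assign the return value back to the name to keep the cleaned string.
text = re.sub(r'[^\x00-\x7F]+', ' ', text)
assert '\u2026' not in text
print(text)
```

The same applies to str.encode, str.replace, and every other string method: each returns a new object instead of modifying the string in place.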

1 Answer

You can encode to ASCII and ignore errors (i.e. code points that cannot be converted to an ASCII character).

>>> doc = "Hello my name is Ruth \u2026! I really like swimming and dancing \ud83c"
>>> doc.encode('ascii', errors='ignore')
b'Hello my name is Ruth ! I really like swimming and dancing '

If the trailing whitespace bothers you, strip it off. encode returns a bytes object, so depending on your use case you may want to decode the result back to a str with ASCII. Chaining everything would look like this:

>>> doc.encode('ascii', errors='ignore').strip().decode('ascii')
'Hello my name is Ruth ! I really like swimming and dancing'
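
The chained call still leaves a space before the "!", whereas the question asks for "Ruth!". One possible way to close that gap is sketched below. The helper name `strip_non_ascii` and the punctuation set in the second regular expression are assumptions made for this illustration, not part of the original answer.

```python
import re

def strip_non_ascii(text):
    """Drop every non-ASCII code point, then tidy the leftover whitespace."""
    ascii_only = text.encode('ascii', errors='ignore').decode('ascii')
    # Collapse runs of whitespace left behind by the removed characters.
    collapsed = re.sub(r'\s+', ' ', ascii_only).strip()
    # Remove whitespace that ends up directly before punctuation.
    return re.sub(r'\s+([!?.,;:])', r'\1', collapsed)

doc = "Hello my name is Ruth \u2026! I really like swimming and dancing \ud83c"
# Strings are immutable, so the result has to be assigned back to doc.
doc = strip_non_ascii(doc)
print(doc)
```

Note that the punctuation step also removes spaces that were deliberately placed before punctuation in the input, so it may not suit every text.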
answered May 16, 2017 at 20:29

5 Comments

I've already tried encoding. The code runs, but still nothing changes. Thanks for your reply.
My goal is to strip the Unicode characters from the tweets I've streamed. I ran the code on my tweet.txt, which contains 10 tweets.
Which one? @timgeb
The one in the answer.
The Unicode characters still appear after using tweet.encode('ascii', errors='ignore').
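
The thread does not say why the characters survive. Besides discarding the return value, one possible explanation (an assumption, since the contents of tweet.txt are not shown) is that the file holds literal escape sequences: a backslash, the letter u, and four hex digits stored as plain ASCII text. In that case there is nothing for encode to drop. A sketch under that assumption, with the hypothetical helper `remove_literal_escapes`:

```python
import re

def remove_literal_escapes(line):
    r"""Remove literal \uXXXX sequences (backslash, 'u', four hex digits)."""
    return re.sub(r'\\u[0-9a-fA-F]{4}', '', line)

# A raw string stands in for a line read from the file: it contains the
# six characters \u2026, not the ellipsis code point itself.
raw_line = r"Hello my name is Ruth \u2026! I really like swimming \ud83c"

# Encoding cannot help here, because every character is already ASCII.
assert raw_line.encode('ascii', errors='ignore').decode('ascii') == raw_line

cleaned = remove_literal_escapes(raw_line)
print(cleaned)
```

Printing repr(line) for one line of the file would show which case applies: literal escapes appear with a doubled backslash, for example '\\u2026'.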