I have a string that contains Unicode characters, and I want to convert it to UTF-8 in Python.
s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'
I want to convert s to UTF-8.
Possible duplicate of How to convert a string to utf-8 in Python – GadaaDhaariGeek, Jul 2, 2019 at 12:26
2 Answers
Add the u prefix to the string s, then encode it as UTF-8.
Your code will look like this:
s = u'\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'
s_encoded = s.encode('utf-8')
print(s_encoded)
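As a minimal sketch of what the encode step produces, assuming Python 3 (where the u prefix is optional): the result is a bytes object, and each of these seven characters takes two bytes in UTF-8. The name round_trip is only illustrative.

```python
# Assumes Python 3: string literals are already Unicode (str).
s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'

# encode() turns the str into UTF-8 bytes.
s_encoded = s.encode('utf-8')
print(type(s_encoded).__name__)   # bytes
print(len(s), len(s_encoded))     # 7 14 (two bytes per character here)

# decode() reverses the step and gives back the original str.
round_trip = s_encoded.decode('utf-8')
print(round_trip == s)            # True
```

Printing s_encoded shows the raw bytes as escape sequences; print s itself if you want to see the readable text.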
I hope this helps.
answered Jul 2, 2019 at 12:25
GadaaDhaariGeek
1 Comment
lenz
If the OP is using Python 3 (it seems so), then the u prefix isn't necessary. But the .encode('utf8') is definitely right.
Add the line below at the top of your .py file.
# -*- coding: utf-8 -*-
It allows you to encode strings directly in your Python script, like this:
# -*- coding: utf-8 -*-
s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'
print(s)
Output:
بیسکویت
answered Jul 2, 2019 at 12:19
Usman
2 Comments
lenz
The source encoding declaration doesn't really apply here, because the string is entered with ASCII-only characters. It would be different if the string literal was actually composed of Arabic letters (not escape sequences).
Mark Tolonen
A coding line declares the encoding of the source file only. If you have only ASCII characters in the source (as above) it does nothing. In fact, in Python 3, UTF-8 is the default source encoding if undeclared.
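To illustrate the point made in the two comments above, here is a small sketch, assuming Python 3. The variable source_text is a hypothetical stand-in for the line as it is stored in the .py file; it is held in a raw string so the backslashes stay literal.

```python
# The assignment as it appears in the source file (raw string keeps the backslashes).
source_text = r"s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'"

# Every character of the source line is ASCII, so the declared
# source encoding has nothing to act on.
source_is_ascii = all(ord(ch) < 128 for ch in source_text)
print(source_is_ascii)   # True

# The string the interpreter builds from those escapes is not ASCII.
s = '\u0628\u06cc\u0633\u06a9\u0648\u06cc\u062a'
value_is_ascii = all(ord(ch) < 128 for ch in s)
print(value_is_ascii)    # False
```

The escape sequences are resolved by the interpreter when it parses the literal, which is why the output is the same with or without the coding line.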