I want to concatenate some strings (Persian strings) in Python:
for t in lstres:
    with conn:
        c = conn.cursor()
        q="SELECT fa FROM words WHERE en ='"+t+"'"
        c.execute(q)
        lst=c.fetchall()
        if lst:
            W.append(lst)
        else:
            W.append(t)
cnum=1
for can in W:
    cnum=cnum*len(W)
candida=Set()
for ii in range(1,min(20,cnum)):
    candid=""
    for w in W:
        candid+=str(" "+random.choice (w)[0]).encode('utf-8')
    candida.add(candid)
But it says:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 1: ordinal not in range(128)
What is the problem?
3 Answers
Somewhere along the line Python is implicitly converting between a byte string and a unicode string using the ASCII codec, and that fails on the non-ASCII Persian bytes. Where this is happening is difficult to tell from what you've posted, but it's better to just make sure that you always use unicode anyway. To do this in Python 2, add a u in front of all your string literals, like so: u"A unicode string", and always use unicode() instead of str().
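The failure can be reproduced explicitly. This is only a minimal sketch, assuming the Persian text reaches the code as UTF-8 bytes; the sample word and the variable names are hypothetical, not taken from the question's database.

```python
# Hypothetical sample: a space followed by a Persian word, as UTF-8 bytes.
raw = b" \xd8\xb3\xd9\x84\xd8\xa7\xd9\x85"

# Decoding with the ASCII codec fails on the first non-ASCII byte (0xd8).
try:
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode("utf-8")
```

Python 2 performs the failing ASCII decode silently whenever a byte string is mixed with a unicode string, or when .encode() is called on a byte string, which is why the traceback can appear far from the real cause.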
Unicode is often overlooked by English language programmers and tutorials because in English you can get away with just using ASCII encoded characters. Unfortunately the rest of the world suffers for this because most languages use characters not supported by ASCII. It might be useful to look over the Python Unicode HOWTO to get some guidance on good programming practice in Unicode.
I also found this article very useful.
The problem is here:
for ii in range(1,min(20,cnum)):
    candid=""
    for w in W:
        candid+=str(" "+random.choice (w)[0]).encode('utf-8')
    candida.add(candid)
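It should be the following, keeping everything as unicode and dropping the str() and .encode() calls (this assumes the values in W are unicode strings):
for ii in range(1,min(20,cnum)):
    candid=u""
    for w in W:
        candid+=u" "+random.choice(w)[0]
    candida.add(candid)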
However, that is still not idiomatic Python. You should write:
for ii in range(1,min(20,cnum)):
    candida.add(u" ".join(random.choice (w)[0] for w in W))
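A self-contained sketch of the join approach is shown here. The contents of W are hypothetical sample data shaped like the result of fetchall() (a list of 1-tuples per word), and the names candidates and W are only illustrative.

```python
import random

# Hypothetical stand-in for W: one list of 1-tuples per source word,
# in the shape that cursor.fetchall() returns.
W = [
    [(u"\u0633\u0644\u0627\u0645",), (u"\u062f\u0631\u0648\u062f",)],  # "salam", "dorud"
    [(u"\u062f\u0646\u06cc\u0627",), (u"\u062c\u0647\u0627\u0646",)],  # "donya", "jahan"
]

candidates = set()
for _ in range(20):
    # Pick one translation per word and join them with single spaces.
    candidates.add(u" ".join(random.choice(w)[0] for w in W))
```

Because every piece is already a unicode string, no encode or decode call is needed until the result is written out.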
Moreover, there is a potential SQL injection in your script:
q="SELECT fa FROM words WHERE en ='"+t+"'"
c.execute(q)
You should use a parameterized query instead:
q="SELECT fa FROM words WHERE en =?"
c.execute(q, (t,))
Here (t,) is a tuple with only one element.
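The parameterized form can be tried against an in-memory database. This is a sketch under assumptions: the table and column names follow the question, while the helper translations() and the sample row are hypothetical.

```python
import sqlite3

# In-memory database with the same table shape as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (en TEXT, fa TEXT)")
conn.execute("INSERT INTO words VALUES (?, ?)",
             (u"hello", u"\u0633\u0644\u0627\u0645"))  # hypothetical sample row

def translations(conn, word):
    """Return all Persian translations stored for an English word."""
    # The ? placeholder lets sqlite3 bind the value, so quotes in `word`
    # are treated as data and never as SQL.
    cur = conn.execute("SELECT fa FROM words WHERE en = ?", (word,))
    return [row[0] for row in cur.fetchall()]

found = translations(conn, u"hello")
injected = translations(conn, u"x' OR '1'='1")
```

With string concatenation the second call would match every row; with the placeholder it simply looks for a word with that literal spelling and finds nothing.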
You need to declare your strings as Unicode:
u'Your string here...'