python - problems with regular expression and unicode
Hi I have a problem in python. I try to explain my problem with an example.
I have this string:
>>> string = ×ばつØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿÀÁÂÃ'
>>> print string×ばつØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿÀÁÂÃ
and i want, for example, replace charachters different from Ñ,Ã,ï with ""
i have tried:
>>> rePat = re.compile('[^ÑÃï]',re.UNICODE)
>>> print rePat.sub("",string)
�Ñ�����������������������������ï�������������������Ã
I obtained this �. I think that it's happen because this type of characters in python are represented by two position in the vector: for example \xc3\x91 = Ñ. For this, when i make the regolar expression, all the \xc3 are not substitued. How I can do this type of sub?????
Thanks Franco
Franco
lang-py