python - problems with regular expression and unicode

Hi, I have a problem in Python that I will try to explain with an example.

I have this string:

    string = 'ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿÀÁÂÃ'
    print string
    ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿÀÁÂÃ

I want, for example, to replace every character other than Ñ, Ã and ï with "".

I have tried:

    import re
    rePat = re.compile('[^ÑÃï]',re.UNICODE)
    print rePat.sub("",string)
    �Ñ�����������������������������ï�������������������Ã

The output is full of � characters. I think this happens because Python represents each of these characters as two bytes in the string, for example \xc3\x91 = Ñ. As a result, the character class in my regular expression matches individual bytes rather than whole characters, so none of the \xc3 bytes are removed. How can I do this kind of substitution?
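
To make the byte-versus-character point concrete, here is a minimal sketch. It assumes Python 3 (the code in this post is Python 2, where the equivalent would be `u'...'` literals or `string.decode('utf-8')`) and uses a shortened sample string; the names `text`, `raw`, `byte_pat` and `text_pat` are made up for the illustration.

```python
# -*- coding: utf-8 -*-
import re

# Shortened sample (an assumption, not the full string from the post).
text = u'ÐÑÒïðÃ'
raw = text.encode('utf-8')   # every character here becomes two bytes: \xc3 + one more

# Byte-level pattern: the class is built from the single bytes
# \xc3, \x91, \x83 and \xaf, so every \xc3 lead byte is left in place.
byte_pat = re.compile(u'[^ÑÃï]'.encode('utf-8'))
leftover = byte_pat.sub(b'', raw)

# Text-level pattern: the class contains three whole characters,
# so everything else is removed cleanly.
text_pat = re.compile(u'[^ÑÃï]', re.UNICODE)
kept = text_pat.sub(u'', text)   # kept is u'ÑïÃ'

print(len(raw), len(leftover), len(kept))
```

The stray `\xc3` bytes in `leftover` are not valid UTF-8 on their own, which would explain why a terminal shows them as �. Working on text instead of bytes, and encoding only when writing the result out, seems to avoid the problem.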

Thanks, Franco

Franco