I read a file with the code below and then want to find the words in it using the re library. The file contains Turkish characters, so I decode it as UTF-8. The re library doesn't seem to recognize Turkish characters, and the code below isn't working.

 text = unicodedata.normalize("NFKD", codecs.open(os.path.abspath("texts/kopru1.txt"), "rb").read().decode("utf-8"))
 text = text.replace("\r\n", " ").lower()
 aa = re.findall(ur"[a-zçşıöü]+", text, re.UNICODE)

Although "ayşe" is a single word, it comes back split into "ays" and "e".

Junuxx
asked Jun 11, 2013 at 16:55
  • Could you give some example data and tell us what you want to do? Commented Jun 11, 2013 at 17:00
  • An example string is "ayşe kulin köprü". I want to find the words in this string. Commented Jun 11, 2013 at 17:03
  • If you want to split by word, why not use text.split(" ")? Commented Jun 11, 2013 at 17:05

1 Answer

Use the escape sequence \w, which with the re.UNICODE flag matches any Unicode letter or digit, plus the underscore. Taking an example sentence from Wikipedia:

>>> text = u'Türkî-i çin (güzel güneş) terkiplerinde de gördüğümüz'
>>> re.findall(r'\w+', text, re.UNICODE)
['Türkî', 'i', 'çin', 'güzel', 'güneş', 'terkiplerinde', 'de', 'gördüğümüz']
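
One likely reason "ayşe" still comes back split in the question's code is the NFKD normalization step rather than the pattern itself. NFKD decomposes "ş" into a plain "s" followed by the combining cedilla U+0327, and a combining mark is not a word character, so both \w+ and an explicit character class stop there. The following is a minimal sketch of that effect, assuming Python 3; the variable names are illustrative and not taken from the original post.

```python
import re
import unicodedata

sample = u"ayşe kulin köprü"

# NFKD splits each accented letter into a base letter plus a combining mark,
# e.g. "ş" becomes "s" + U+0327. The mark is not matched by \w.
decomposed = unicodedata.normalize("NFKD", sample)
broken = re.findall(r"\w+", decomposed, re.UNICODE)
print(broken)  # ['ays', 'e', 'kulin', 'ko', 'pru']

# NFC recomposes each letter into a single code point, so \w+ sees whole words.
composed = unicodedata.normalize("NFC", decomposed)
words = re.findall(r"\w+", composed, re.UNICODE)
print(words)  # ['ayşe', 'kulin', 'köprü']
```

Applied to the question's code, this would mean passing "NFC" instead of "NFKD" to unicodedata.normalize. Note also that the explicit class [a-zçşıöü] omits "ğ", so \w+ is the safer choice even once the text is in composed form.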
answered Jun 11, 2013 at 17:03

7 Comments

I had tried that before, and I tried it again, but the code still isn't working.
@hinzir what does your text variable look like before you try to match on it?
@hinzir Oh, right. I've updated my reply with additional hints. See if it helps.
@kgr You are wrong about the replace method. See docs.python.org/2/library/string.html#string.replace: "Return a copy of string s with all occurrences of substring old replaced by new."
@hinzir That's really, really weird. When I tested yesterday I could swear it wouldn't return a new string, but now that I try it does.