Python encoding for Turkish characters

Asked 12 years, 7 months ago

Viewed 1k times

I need to read PDF books that contain Turkish stories. I found a library called pyPdf, but my test function below does not encode the text correctly. I think I need a Turkish codec package. Am I wrong? If I am wrong, how can I solve this problem? Otherwise, where can I find this Turkish codec package?

from StringIO import StringIO
import os

import pyPdf


def getPDFContent(path):
    content = ""
    num_pages = 10  # only the first 10 pages are read
    p = file(path, "rb")
    pdf = pyPdf.PdfFileReader(p)
    for i in range(0, num_pages):
        content += pdf.getPage(i).extractText() + "\n"
    # Replace non-breaking spaces and collapse all whitespace runs.
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content


if __name__ == '__main__':
    pdfContent = StringIO(getPDFContent(os.path.abspath("adiaylin-aysekulin.pdf")).encode("utf-8", "ignore"))
    for line in pdfContent:
        print line.strip()
    # raw_input, not input: Python 2's input() evaluates whatever is typed.
    raw_input("Press Enter to continue...")

edited May 22, 2013 at 19:18

asked May 22, 2013 at 16:22

hinzir

What did you mean? Can you explain?

– hinzir
Commented May 27, 2013 at 10:27

1 Answer

What kind of error / unexpected output are you getting specifically?

According to the pyPdf homepage, pyPdf is no longer maintained. But there is a fork called PyPDF2 (GitHub) that promises to "handle a wider range of input PDF instances".

Upgrading to PyPDF2 may solve your problem, so I suggest you try that first.
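For illustration, here is a rough sketch of how the function from the question might look with PyPDF2. It assumes a PyPDF2 1.x release, which keeps pyPdf's method names (`PdfFileReader`, `getNumPages`, `getPage`, `extractText`); later releases renamed these. The names `normalize_whitespace`, `get_pdf_content` and `max_pages` are made up for this example. Whether the Turkish characters come out correctly still depends on how the PDF embeds its fonts, so treat this as something to try rather than a guaranteed fix.

```python
def normalize_whitespace(text):
    """Collapse whitespace runs (including U+00A0) into single spaces."""
    return u" ".join(text.replace(u"\xa0", u" ").split())


def get_pdf_content(path, max_pages=10):
    """Return the text of up to max_pages pages as one unicode string."""
    # Imported lazily so normalize_whitespace is usable without PyPDF2.
    # Assumption: PyPDF2 1.x, whose API mirrors the old pyPdf names.
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as pdf_file:
        reader = PdfFileReader(pdf_file)
        # Never ask for more pages than the document actually has.
        page_count = min(max_pages, reader.getNumPages())
        pages = [reader.getPage(i).extractText() for i in range(page_count)]
    return normalize_whitespace(u"\n".join(pages))
```

The function returns unicode text, so encode it (for example with `.encode("utf-8")`) only at the point where you write it to a file or a byte stream.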

answered May 28, 2013 at 12:22

dbader

1 Comment

I solved the problem in a different way: I first converted the PDF to text and then read the text file.

– hinzir
Commented May 28, 2013 at 12:33
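The comment does not say which converter or encoding was used, so the following is only a minimal sketch of the reading step, assuming the converted file is saved as UTF-8. The helper name `read_text_file` is hypothetical. No extra codec package should be needed for Turkish: Python's standard library already ships `utf-8`, `cp1254` (Windows Turkish) and `iso8859_9` (Latin-5), and `io.open` accepts any of them on both Python 2 and 3.

```python
import io


def read_text_file(path, encoding="utf-8"):
    """Read a whole text file and decode it with the given encoding."""
    # io.open decodes while reading, so the result is unicode text
    # rather than raw bytes.
    with io.open(path, "r", encoding=encoding) as text_file:
        return text_file.read()
```

If the output shows garbled characters, the file was probably written in a different encoding; passing `encoding="cp1254"` or `encoding="iso8859_9"` is worth trying for Turkish text.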
