import re
## EDIT: the line filename = "rr.txt" was copied by mistake
## Opens the file, which was saved as a Unicode text file
buffer = open('r.txt','r').read()
quotes = re.findall(ur'"[^"^\u201c]*["\u201d].*', buffer)
for quote in quotes:
    print ''
    print quote
## Prints the quotes found
## Problem: the printed output has rectangular blocks between each character

Why?

How do you return output without the rectangular blocks messing everything up?

asked May 11, 2012 at 15:01
  • The file I used was a basic save as a Unicode text file, with the text copied from a PDF. Commented May 11, 2012 at 15:40
  • parisis.files.wordpress.com/2011/01/noam-chomsky.pdf Commented May 11, 2012 at 15:41
  • How do you know the text file is Unicode? What OS are you running Acrobat in? In Windows it saves as a code page where the quotes are 0x93 and 0x94. Commented May 11, 2012 at 16:03
  • When I save the text file, it gives options for the encoding: ANSI, Unicode, Unicode big endian, and UTF-8. I used Unicode. I'm running Windows. Commented May 11, 2012 at 16:14

2 Answers

You're opening it incorrectly: a plain open() in Python 2 returns the raw bytes without decoding them. Also, what Windows calls "Unicode" is actually UTF-16LE.

import codecs
buffer = codecs.open('r.txt', 'r', encoding='utf-16le').read()
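
To illustrate where the blocks come from, here is a minimal sketch rather than the asker's actual file. It writes a sample string as UTF-16 to a temporary file, counts the NUL bytes that sit next to every ASCII character in the raw data (many consoles draw those as rectangles), and then reads the text back decoded. The sample sentence, the helper names read_utf16 and find_quotes, and the simplified regex are all assumptions made for the example. It uses io.open with the 'utf-16' codec, assuming the file starts with a byte order mark, as Notepad's "Unicode" option normally writes; with 'utf-16le' that mark would stay in the text as a leading U+FEFF character.

```python
import io
import os
import re
import tempfile


def read_utf16(path):
    # 'utf-16' uses the byte order mark to pick the byte order and strips it
    with io.open(path, 'r', encoding='utf-16') as f:
        return f.read()


def find_quotes(text):
    # Hypothetical, simplified pattern: a straight or curly opening quote,
    # any run of non-quote characters, then a straight or curly closing quote
    pattern = u'["\u201c][^"\u201c\u201d]*["\u201d]'
    return re.findall(pattern, text)


# Assumed sample text standing in for the contents of r.txt
sample = u'He said \u201clanguage is a process\u201d and then "stopped".'

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(path, 'w', encoding='utf-16') as f:
    f.write(sample)

# Raw bytes, which is what an undecoded read hands back
with io.open(path, 'rb') as f:
    raw = f.read()
nul_count = raw.count(b'\x00')

# Decoded text, which is what the regex should be run against
text = read_utf16(path)
quotes = find_quotes(text)
os.remove(path)

print('NUL bytes in the raw data: %d' % nul_count)
print('quotes found after decoding: %d' % len(quotes))
```

The regex is applied to the decoded text, not to the raw bytes, so the curly-quote escapes in the pattern compare against real characters.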
answered May 11, 2012 at 15:10

3 Comments

I wonder how that regular expression was finding anything if the read was messed up?
@Mark: An interesting question. I suspect that my answer isn't completely correct, but it is about 90% of the way there (e.g. the file is in the system encoding instead of UTF-16LE).
Thanks for the help. The regex does need some work, which I can do now. Cheers.

This isn't related to Python. Your console window renders Python's output, and that rendering is what breaks.

Use a font in your console window that supports the necessary Unicode characters.
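
Before changing fonts, it can help to check what the file actually contains. A rough sketch, assuming the file begins with a byte order mark (BOM): the helper guess_encoding below is hypothetical, not part of any library, and it returns None when no BOM is present, in which case the encoding has to be determined some other way.

```python
import codecs


def guess_encoding(raw):
    """Guess a codec name from a leading byte order mark, or return None."""
    # UTF-32 is checked first because its little-endian BOM begins with
    # the same two bytes as the UTF-16 little-endian BOM
    if raw.startswith(codecs.BOM_UTF32_LE) or raw.startswith(codecs.BOM_UTF32_BE):
        return 'utf-32'
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16'
    return None


# Small self-check on in-memory data instead of a real file
print(guess_encoding(u'hi'.encode('utf-16')))
print(guess_encoding(b'plain bytes'))
```

If this reports a UTF-16 file, the blocks most likely come from undecoded bytes and not from the console font.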

answered May 11, 2012 at 15:04

1 Comment

The above isn't really helpful. It seems to me the problem came from how Python was used, and it seems to have been fixed in Python.
