Reading the bytes of a PDF

\$\begingroup\$

I'm fairly new to Python and I want to speed up this function, since it takes a very long time, especially when the input file is several megabytes in size. I also couldn't figure out how to use Cython in the loop. I use this function together with other functions to compare files byte by byte. Any recommendations?

# this function returns a file bytes in a list
filename1 = 'doc1.pdf'
def byte_target(filename1):
    f = open(filename1, "rb")
    try:
        b = f.read(1)
        tlist = []
        while True:
            # get file bytes
            t = ' '.join(format(ord(x), 'b') for x in b)
            b = f.read(1)
            if not b:
                break
            #add this byte to the list
            tlist.append(t)
            #print b
    finally:
        f.close()
    return tlist

edited Jun 4, 2015 at 20:04 by 200_success

asked Jun 4, 2015 at 18:34 by amsr

\$\endgroup\$

1 Answer

\$\begingroup\$

It's not surprising that this is too slow: you're reading data byte-by-byte. For faster performance you would need to read larger buffers at a time.
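As a rough illustration of reading larger buffers, here is a minimal sketch, assuming Python 3 (the code in the question looks like Python 2). The helper names iter_file_chunks and iter_file_bytes and the 64 KiB default chunk size are hypothetical choices for this example, not something from the original post.

```python
def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the file's contents as successive bytes objects of up to chunk_size bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                return
            yield chunk


def iter_file_bytes(path, chunk_size=64 * 1024):
    """Yield every byte of the file as an int (0-255), reading in large chunks."""
    for chunk in iter_file_chunks(path, chunk_size):
        # In Python 3, iterating over a bytes object yields ints.
        yield from chunk
```

Callers can still loop over individual bytes with iter_file_bytes, but the file is only touched once per chunk instead of once per byte.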

If you want to compare files by content, use the filecmp module from the standard library.
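A minimal sketch of what that could look like, assuming Python 3. The wrapper name same_content is hypothetical; filecmp.cmp and its shallow parameter are part of the standard library.

```python
import filecmp


def same_content(path_a, path_b):
    """Return True if the two files have identical contents."""
    # shallow=False makes filecmp compare the file contents instead of
    # treating matching os.stat() signatures (type, size, mtime) as equal.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For example, same_content('doc1.pdf', 'doc2.pdf') would report whether the two PDFs are byte-for-byte identical (doc2.pdf is a made-up second file name here).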

There are also some glaring problems with this code. For example, instead of opening a file, doing the work in a try block, and closing the file handle manually, you should use a with statement, which closes the file automatically when the block exits:

    with open(filename1, "rb") as f:
        b = f.read(1)
        # ...

Finally, the function name (byte_target) and the variable names (b, t, tlist) are uninformative: they don't tell readers what each value is for or what the code is trying to do.
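To make the naming point concrete, here is one possible rewrite of the function in the question, assuming Python 3 and a file small enough to read into memory in one go. The names file_bytes_as_binary_strings, pdf_file, contents and byte_value are only suggestions for this sketch.

```python
def file_bytes_as_binary_strings(path):
    """Return a list with one binary-digit string per byte in the file.

    For example, the byte b'A' (65) becomes '1000001'.
    """
    with open(path, "rb") as pdf_file:
        contents = pdf_file.read()  # read the whole file in a single call
    # In Python 3, iterating over bytes yields ints, so ord() is not needed.
    return [format(byte_value, "b") for byte_value in contents]
```

As far as I can tell, the loop in the question breaks before appending the last byte it read, so its result is one element shorter than the list returned here.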

answered Jun 4, 2015 at 18:41 by janos

\$\endgroup\$

Thank you so much janos, I've changed the code as per your recommendation and it is a bit faster now :)
– amsr, commented Jun 5, 2015 at 10:40
