I'm quite new to Python and I want to speed up this function, since it takes a very long time, especially when the input file is several megabytes in size. Also, I couldn't figure out how to use Cython in the for loop. I'm using this function together with other functions to compare files byte by byte. Any recommendations?
# this function returns a file's bytes in a list
filename1 = 'doc1.pdf'
def byte_target(filename1):
    f = open(filename1, "rb")
    try:
        b = f.read(1)
        tlist = []
        while True:
            # get file bytes
            t = ' '.join(format(ord(x), 'b') for x in b)
            b = f.read(1)
            if not b:
                break
            # add this byte to the list
            tlist.append(t)
            # print b
    finally:
        f.close()
    return tlist
1 Answer
It's not surprising that this is too slow: you're reading data byte-by-byte. For faster performance you would need to read larger buffers at a time.
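As a minimal sketch of reading in larger buffers, assuming Python 3 (the question's code uses `ord()` on the read result, which suggests Python 2): the function name `read_byte_bit_strings` and the 64 KiB default for `chunk_size` are illustrative choices, not something from the question. Note that the loop in the question breaks before appending the last byte it read, so it silently drops the final byte of the file; this sketch includes it.

```python
def read_byte_bit_strings(path, chunk_size=64 * 1024):
    """Return every byte of the file as a binary string such as '1000001'."""
    bit_strings = []
    with open(path, "rb") as f:
        while True:
            # One large read replaces thousands of single-byte reads.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # In Python 3, iterating over a bytes object yields integers,
            # so no ord() call is needed.
            bit_strings.extend(format(byte, "b") for byte in chunk)
    return bit_strings
```

The result is the same kind of list as in the question, one binary string per byte, but the number of `read` calls drops from one per byte to one per chunk.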
If you want to compare files by content, use the filecmp module from the standard library.
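A short sketch of how that could look: `filecmp.cmp` is the standard-library call, while the wrapper name `files_have_same_content` is only an illustrative choice. By default `filecmp.cmp` does a shallow comparison based on `os.stat()` signatures, so `shallow=False` is passed here to force a comparison of the actual contents.

```python
import filecmp


def files_have_same_content(path_a, path_b):
    # shallow=False makes filecmp compare the file contents instead of
    # trusting matching os.stat() signatures (type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For example, `files_have_same_content('doc1.pdf', 'doc2.pdf')` would return `True` only if both files contain exactly the same bytes (`doc2.pdf` is a hypothetical second file).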
There are also some glaring problems with this code.
For example, instead of opening a file, doing something in a try block, and closing the file handle manually, you should use the recommended with statement, which closes the file for you even if an exception is raised:
with open(filename1, "rb") as f:
    b = f.read(1)
    # ...
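To illustrate what the `with` statement buys you, here is a small sketch (the helper `read_first_byte` is hypothetical and only for demonstration) showing that the file is already closed once the block ends, without any explicit `close()` or `finally`:

```python
def read_first_byte(path):
    with open(path, "rb") as f:
        first_byte = f.read(1)
    # The with block has ended, so the file object is closed at this point.
    return first_byte, f.closed
```

The second element of the returned tuple is `True`, confirming that the handle was released automatically.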
Finally, the function name and all variable names are very poor, and don't help the readers understand their purpose and what you're trying to do.
Thank you so much janos, I've changed the code as per your recommendation and it is a bit faster now :) – amsr, Jun 5, 2015 at 10:40