I am writing a program that compresses an image using Huffman encoding. I need to write the tree data structure and the compressed bytes to files, and then read them back and decode them. My reading and writing code is very slow.
In writetree(), I need to write the width, the height, the number of compressed bits, and the tree data structure. I am not sure whether these can be combined into a single file.
I also think I use too many for loops, which makes the code very slow on a very long string.
from PIL import Image
import numpy as np
import json
import sys, string
trim = ('0', ('127', '255'))
width = 4
height = 4
longstring = "1100100111001101011110010011010110101111001001101011010111100100110101101011"
def decode (tree, str) :
    output = ""
    list = []
    p = tree
    count = 0
    for bit in str :
        if bit == '0' : p = p[0] # Head up the left branch
        else : p = p[1] # or up the right branch
        if type(p) == type("") :
            output += p # found a character. Add to output
            list.append(int(p))
            p = tree # and restart for next character
    return list
def writetree(tree,height, width,compress):
    with open('structure.txt', 'w') as outfile:
        json.dump(trim, outfile)
    outfile.close()
    f = open("info.txt", "w")
    f.write(str(height)+"\n")
    f.write(str(width)+"\n")
    f.write(str(compress)+"\n")
    f.close()
def readtree():
    with open('structure.txt') as json_file:
        data = json.load(json_file)
    k = open("info.txt", "r")
    heightread = k.readline().strip("\n")
    widthread = k.readline().strip("\n")
    compressread = k.readline().strip("\n")
    json_file.close()
    k.close()
    return tuple(data), int(heightread), int(widthread), int(compressread)
def writefile():
    print("Write file")
    with open('file', 'wb') as f:
        bit_strings = [longstring[i:i + 8] for i in range(0, len(longstring), 8)]
        byte_list = [int(b, 2) for b in bit_strings]
        print(byte_list)
        realsize = len(bytearray(byte_list))
        print('Compress number of bits: ', len(longstring))
        writetree(trim,height,width,len(longstring))
        f.write(bytearray(byte_list))
        f.close()
def readfile():
    print("Read file")
    byte_list = []
    longbin = ""
    with open('file', 'rb') as f:
        value = f.read(1)
        while value != b'':
            byte_list.append(ord(value))
            value = f.read(1)
    print(byte_list)
    for a in byte_list[:-1]:
        longbin = longbin + '{0:08b}'.format(a)
    trim_read, height_read, width_read , compress_read = readtree()
    sodu = compress_read%8
    '''
    Because the string is split 8 bits at a time and the current compress_read is 76,
    sodu is 4. I have to convert the last entry of byte_list into 4 bits, not 8 bits.
    '''
    if sodu == 0:
        longbin = longbin + '{0:08b}'.format(byte_list[-1])
    elif sodu == 1:
        longbin = longbin + '{0:01b}'.format(byte_list[-1])
    elif sodu == 2:
        longbin = longbin + '{0:02b}'.format(byte_list[-1])
    elif sodu == 3:
        longbin = longbin + '{0:03b}'.format(byte_list[-1])
    elif sodu == 4:
        longbin = longbin + '{0:04b}'.format(byte_list[-1])
    elif sodu == 5:
        longbin = longbin + '{0:05b}'.format(byte_list[-1])
    elif sodu == 6:
        longbin = longbin + '{0:06b}'.format(byte_list[-1])
    elif sodu == 7:
        longbin = longbin + '{0:07b}'.format(byte_list[-1])
    print(longbin)
    print("Decode/ show image:")
    pixels = decode(trim_read, longbin)
    it = iter(pixels)
    pixels = list(zip(it,it,it))
    #print(pixels)
    image_out = Image.new("RGB", (width_read, height_read))
    image_out.putdata(pixels)
    #image_out.show()
writefile()
readfile()
Comments
Hello! I think you have a great first post. Could you maybe explain quickly how your algorithm works? I think your question might then draw more attention, but anyway, I think you've done a good job. – IEatBagels, Nov 23, 2019 at 0:59
I am using Huffman encoding. It is quite long to explain here. – fastmen111, Nov 23, 2019 at 5:50
For now, I need to write the file and save the data structure (trim), and also save the height, width, and number of compressed bits of the image. I think I loop over the data again and again, so it is much slower on a longer string. – fastmen111, Nov 23, 2019 at 5:52
You can see in the readfile function that I read the file and append to the list, then loop over it again to convert with '{0:08b}'.format(a). – fastmen111, Nov 23, 2019 at 5:55
Are you still looking for answers to this? :) – AMC, Dec 16, 2019 at 2:51
1 Answer
Unused
ruff identifies some unused code.
These lines can be deleted:
import numpy as np
import sys, string
count = 0
realsize = len(bytearray(byte_list))
The tree input to this function is unused:
def writetree(tree,height, width,compress):
It can be removed:
def writetree(height, width,compress):
It must then be removed from calls to writetree as well.
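Alternatively, since the question asks whether the tree and the image information can be combined, the tree parameter could be put to use and everything stored in one JSON file. The following is only a sketch under assumptions: the names write_header, read_header, and to_tuple, the dictionary keys, and the header.json file name are hypothetical and not part of the original code.

```python
import json
import os
import tempfile


def write_header(path, tree, height, width, n_bits):
    """Store the tree and the image metadata together in one JSON file."""
    header = {"tree": tree, "height": height, "width": width, "bits": n_bits}
    with open(path, "w") as outfile:  # the with block closes the file
        json.dump(header, outfile)


def to_tuple(node):
    """JSON stores tuples as lists; convert nested lists back to tuples."""
    if isinstance(node, list):
        return tuple(to_tuple(child) for child in node)
    return node


def read_header(path):
    """Return (tree, height, width, n_bits) as written by write_header."""
    with open(path) as infile:
        header = json.load(infile)
    return to_tuple(header["tree"]), header["height"], header["width"], header["bits"]


# Round trip with the values from the question, in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "header.json")
write_header(path, ("0", ("127", "255")), 4, 4, 76)
header_back = read_header(path)
print(header_back)
```

Because the numbers are stored as JSON numbers, no int() conversion is needed when reading them back.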
Comments
Commented-out code should be deleted to remove clutter:
#print(pixels)
#image_out.show()
Naming
The variables named list and str have the same names as Python built-ins, which they shadow.
This can be confusing. To eliminate the confusion, rename the variables
as something like pixels and string, respectively. The first clue is that they have special
coloring (syntax highlighting) in the question, as they do when I copy the code
into my editor.
The PEP 8 style guide recommends snake_case for function and variable names.
For example, writetree would be write_tree.
Consider more meaningful names for some of the variables, such as p and sodu.
Documentation
The PEP 8 style guide recommends
adding docstrings for functions. For example, with the decode function,
describe the input and return types and what is being decoded.
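As a rough illustration of the naming and documentation suggestions, here is one way the decode function might look. The names bits and node are my own choices, and the docstring wording is an assumption about what the function is meant to do, based on how the question uses it.

```python
def decode(tree, bits):
    """Decode a Huffman-encoded bit string into a list of pixel values.

    tree: nested pairs (left, right); a leaf is a str holding a pixel value.
    bits: str containing only the characters '0' and '1'.
    Returns a list of int.
    """
    pixels = []
    node = tree
    for bit in bits:
        # '0' follows the left branch, anything else the right branch.
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # reached a leaf
            pixels.append(int(node))
            node = tree  # restart at the root for the next value
    return pixels


decoded = decode(("0", ("127", "255")), "1100100")
print(decoded)
```

This version also drops the unused output and count variables and uses isinstance instead of comparing types.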
Layout
The black program can be used to automatically format the code with consistent use of whitespace around operators and space between functions. This will also split the following line:
if bit == '0' : p = p[0] # Head up the left branch
into 2, which is a good practice:
if bit == "0":
    p = p[0] # Head up the left branch
DRY
This expression is repeated several times in the writefile function:
len(longstring)
You can set it to a variable, then len will only be executed once.
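For instance, the packing step of writefile could be pulled into a small helper that computes the length once. This is a minimal sketch: pack_bits and n_bits are hypothetical names, and the helper keeps the original behavior of storing a short final chunk as the integer value of its bits.

```python
def pack_bits(bit_string):
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte.

    Returns (packed_bytes, n_bits). The final chunk may be shorter than
    8 bits; it is stored as the integer value of those bits.
    """
    n_bits = len(bit_string)  # computed once and reused
    chunks = (bit_string[i:i + 8] for i in range(0, n_bits, 8))
    packed = bytes(int(chunk, 2) for chunk in chunks)
    return packed, n_bits


packed, n_bits = pack_bits("110010011")
print(list(packed), n_bits)
```

The returned bytes object can be passed directly to f.write, so the intermediate bytearray is not needed.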
The following:
longbin = longbin + '{0:08b}'.format(a)
can be simplified using the augmented assignment operator:
longbin += '{0:08b}'.format(a)
It can be further simplified with an f-string:
longbin += f'{a:08b}'
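Going one step further, the long if/elif chain on sodu in readfile repeats the same line with only the width changed. A nested format width can replace it, and collecting the pieces in a list and joining them once avoids building the string through repeated concatenation, which may help on long inputs. The sketch below assumes the data was read in one call, for example with open('file', 'rb').read(); unpack_bits and last_width are hypothetical names.

```python
def unpack_bits(data, n_bits):
    """Rebuild the bit string from packed bytes.

    data: bytes, for example the result of open(path, 'rb').read().
    n_bits: total number of meaningful bits (the "compress" value).
    """
    if not data:
        return ""
    last_width = n_bits % 8 or 8  # a remainder of 0 means a full last byte
    parts = [f"{byte:08b}" for byte in data[:-1]]  # iterating bytes yields ints
    parts.append(f"{data[-1]:0{last_width}b}")
    return "".join(parts)


bits = unpack_bits(bytes([201, 1]), 9)
print(bits)
```

Reading the whole file with a single read() call also removes the byte-at-a-time loop and the separate byte_list.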