1
\$\begingroup\$

I need to search some data encoded by a Huffman tree, but not by simply walking the tree: I need test combinations of the data for a property and continue searching based on whether the test is positive or not. To this end, I've also returned the intermediate steps of the Huffman algorithm>

Here is the code I use to generate the extended tree:

import heapq
def encode(symbfreq):
 tree = [[wt, [sym, ""]] for sym, wt in symbfreq]
 heapq.heapify(tree)
 while len(tree)>1:
 lo, hi = sorted([heapq.heappop(tree), heapq.heappop(tree)], key=len)
 for pair in lo[1:]:
 pair[1] = '0' + pair[1]
 for pair in hi[1:]:
 pair[1] = '1' + pair[1]
 heapq.heappush(tree, [lo[0] + hi[0]] + lo[1:] + hi[1:])
 return sorted(heapq.heappop(tree)[1:], key=lambda p: (len(p[-1]), p))
def next_power_of_two(n):
 return int(2**( ceil(log(n,2))))
def full_encode(tree):
 huffman_tree = encode(tree)
 complete_tree = huffman_tree
 get_intermediate_node = lambda val, arr : ''.join( [ char for char,binary in itertools.ifilter( lambda node : node[1].startswith(val),arr)] ) 
 for val in range( next_power_of_two( len(huffman_tree) ) ):
 bvalue = bin(val)[2:] 
 node = [ get_intermediate_node( bvalue , huffman_tree) , bvalue ] 
 if node not in complete_tree:
 complete_tree.append( node)
 complete_tree =[y for y in complete_tree if y[0]!='']
 complete_tree = sorted( complete_tree , key=lambda p: (len(p[-1]), p) )
 return complete_tree

So for example this input:

tree = [('0',0.25),('0',0.25),('0',0.25),('0',0.125),('1',0.125)]

produces this output:

tree = [['00', '0'], ['0', '00'], ['0', '01'], ['0', '10'], ['0', '110'], ['1', '111']]

Once I've done that, I need to search the tree. Since I've got access to all the intermediate stages, I start by checking whether the data in the first leaf contains a '1' (this leaf is encoded by '0'): if this is true then I check the next leaf which has a '0' in the second position of it's code. If false, I check the leaf whose begins with '10'. I keep on doing this until I've found the leaf (and the encoding) which has only a '1' in the data. The code is below:

#searching an extended huffman list 0=>left branch 1=>right branch
def search_huff_list(complete_tree, max_depth):
 defective = 0
 loops = 0
 stage = 0
 code = ['0']
 while defective == 0:
 loops += 1
 current = complete_tree[stage]
 print(stage)
 if current[0] == '1':
 defective = complete_tree[stage]
 return defective,loops
 if len(current[1]) == max_depth:
 if current[0]=='1':
 defective = complete_tree[stage]
 return defective, loops
 else: 
 defective = complete_tree[stage+1]
 return defective, loops
 if not '1' in current[0]:
 code[-1] = '1' 
 code.append('0')
 partial_code = ''.join(code)
 print(partial_code)
 stage = complete_tree.index(next(x for x in complete_tree if x[1].startswith(partial_code) ) )
 else:
 code.append('0')
 partial_code = ''.join(code)
 stage = complete_tree.index(next(x for x in complete_tree if x[1].startswith(partial_code) ) )
 return 0

For the input above, this algorithm finds the '1' (it's easier to debug, if the labels are 'a','b','c','d','e'). I'm building up a partial code and searching the original tree (the extended list) for the first code that begins with that series of bits.

The main problem I have is that this is that this algorithm is complicated enough already: yet I've got to get it to find a 1 in a random tree next. There's already enough corner-case if-else catching going on (for example if I get right down to the end, and the test on the left branch comes back negative, then I know the '1' is in the right branch and I don't need to do another test).

What could I do to make the code more readable/easier to debug? I guess what I'm actually doing isn't that efficient, but I'm struggling to think of another way to do it.

asked Nov 4, 2013 at 18:44
\$\endgroup\$
3
  • \$\begingroup\$ I don't understand what you are trying to achieve here. Can you explain the problem you are trying to solve? \$\endgroup\$ Commented Nov 6, 2013 at 11:31
  • \$\begingroup\$ @GarethRees Hi Gareth, I've modified the data structure so that the groups are encoded in the Huffman tree. The new code can be found here: stackoverflow.com/questions/19817863/sorting-huffman-leaves \$\endgroup\$ Commented Nov 6, 2013 at 21:53
  • \$\begingroup\$ @GarethRees I'm trying to solve a variant of the 12-coin problem, but where you believe some coins are more likely to be counterfeit than others - I'm encoding that info in a Huffman tree, and then looking for a sorting/searching algorithm to find the counterfeit. en.wikipedia.org/wiki/Counterfeit_coin_problem \$\endgroup\$ Commented Nov 6, 2013 at 21:55

1 Answer 1

1
\$\begingroup\$

I've change my data structure, so that the Huffman encoding procedure defines the combinations I need to test:

class Node(object):
 left = None
 right = None
 weight = None
 data = None
 code = ''
 length = len(code)
 def __init__(self, d, w, c):
 self.data = d
 self.weight = w
 self.code = c
 def set_children(self, ln, rn):
 self.left = ln
 self.right = rn
 def __repr__(self):
 return "[%s,%s,(%s),(%s)]" %(self.data,self.code,self.left,self.right)
 def __cmp__(self, a):
 return cmp(self.code, a.code)
 def __getitem__(self):
 return self.code
def encode(symbfreq):
 tree = [Node(sym,wt,'') for sym, wt in symbfreq]
 heapify(tree)
 while len(tree)>1:
 lo, hi = sorted([heappop(tree), heappop(tree)])
 lo.code = '0'+lo.code
 hi.code = '1'+hi.code
 n = Node(lo.data+hi.data,lo.weight+hi.weight,lo.code+hi.code)
 n.set_children(lo, hi)
 heappush(tree, n)
 return tree[0]

Then I search the groups using a function like this:

def search(tree):
current = tree.right
loops = 0
#defective = 0
while current is not None:
 #print(current)
 loops+=1
 previous = current
 if current.data == 1:
 current = current.left
 else:
 current = current.right
return loops,previous

There are still some changes I need to make: the data field needs to be a set (not the sum that's there currently), and I need to change the encoding strategy so that the left branch of each node always has a '0' appended to its code regardless of permutations of the labels of the input distribution.

answered Nov 7, 2013 at 10:15
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.