Combinatorial searching of Huffman trees

Question 1

I need to search some data encoded by a Huffman tree, but not by simply walking the tree: I need test combinations of the data for a property and continue searching based on whether the test is positive or not. To this end, I've also returned the intermediate steps of the Huffman algorithm>

Here is the code I use to generate the extended tree:

import heapq
def encode(symbfreq):
 tree = [[wt, [sym, ""]] for sym, wt in symbfreq]
 heapq.heapify(tree)
 while len(tree)>1:
 lo, hi = sorted([heapq.heappop(tree), heapq.heappop(tree)], key=len)
 for pair in lo[1:]:
 pair[1] = '0' + pair[1]
 for pair in hi[1:]:
 pair[1] = '1' + pair[1]
 heapq.heappush(tree, [lo[0] + hi[0]] + lo[1:] + hi[1:])
 return sorted(heapq.heappop(tree)[1:], key=lambda p: (len(p[-1]), p))
def next_power_of_two(n):
 return int(2**( ceil(log(n,2))))
def full_encode(tree):
 huffman_tree = encode(tree)
 complete_tree = huffman_tree
 get_intermediate_node = lambda val, arr : ''.join( [ char for char,binary in itertools.ifilter( lambda node : node[1].startswith(val),arr)] ) 
 for val in range( next_power_of_two( len(huffman_tree) ) ):
 bvalue = bin(val)[2:] 
 node = [ get_intermediate_node( bvalue , huffman_tree) , bvalue ] 
 if node not in complete_tree:
 complete_tree.append( node)
 complete_tree =[y for y in complete_tree if y[0]!='']
 complete_tree = sorted( complete_tree , key=lambda p: (len(p[-1]), p) )
 return complete_tree

So for example this input:

tree = [('0',0.25),('0',0.25),('0',0.25),('0',0.125),('1',0.125)]

produces this output:

tree = [['00', '0'], ['0', '00'], ['0', '01'], ['0', '10'], ['0', '110'], ['1', '111']]

Once I've done that, I need to search the tree. Since I've got access to all the intermediate stages, I start by checking whether the data in the first leaf contains a '1' (this leaf is encoded by '0'): if this is true then I check the next leaf which has a '0' in the second position of it's code. If false, I check the leaf whose begins with '10'. I keep on doing this until I've found the leaf (and the encoding) which has only a '1' in the data. The code is below:

#searching an extended huffman list 0=>left branch 1=>right branch
def search_huff_list(complete_tree, max_depth):
 defective = 0
 loops = 0
 stage = 0
 code = ['0']
 while defective == 0:
 loops += 1
 current = complete_tree[stage]
 print(stage)
 if current[0] == '1':
 defective = complete_tree[stage]
 return defective,loops
 if len(current[1]) == max_depth:
 if current[0]=='1':
 defective = complete_tree[stage]
 return defective, loops
 else: 
 defective = complete_tree[stage+1]
 return defective, loops
 if not '1' in current[0]:
 code[-1] = '1' 
 code.append('0')
 partial_code = ''.join(code)
 print(partial_code)
 stage = complete_tree.index(next(x for x in complete_tree if x[1].startswith(partial_code) ) )
 else:
 code.append('0')
 partial_code = ''.join(code)
 stage = complete_tree.index(next(x for x in complete_tree if x[1].startswith(partial_code) ) )
 return 0

For the input above, this algorithm finds the '1' (it's easier to debug, if the labels are 'a','b','c','d','e'). I'm building up a partial code and searching the original tree (the extended list) for the first code that begins with that series of bits.

The main problem I have is that this is that this algorithm is complicated enough already: yet I've got to get it to find a 1 in a random tree next. There's already enough corner-case if-else catching going on (for example if I get right down to the end, and the test on the left branch comes back negative, then I know the '1' is in the right branch and I don't need to do another test).

What could I do to make the code more readable/easier to debug? I guess what I'm actually doing isn't that efficient, but I'm struggling to think of another way to do it.

Question 2

I don't understand what you are trying to achieve here. Can you explain the problem you are trying to solve?

Question 3

@GarethRees Hi Gareth, I've modified the data structure so that the groups are encoded in the Huffman tree. The new code can be found here: stackoverflow.com/questions/19817863/sorting-huffman-leaves

Question 4

@GarethRees I'm trying to solve a variant of the 12-coin problem, but where you believe some coins are more likely to be counterfeit than others - I'm encoding that info in a Huffman tree, and then looking for a sorting/searching algorithm to find the counterfeit. en.wikipedia.org/wiki/Counterfeit_coin_problem

Question 5

I've change my data structure, so that the Huffman encoding procedure defines the combinations I need to test:

class Node(object):
 left = None
 right = None
 weight = None
 data = None
 code = ''
 length = len(code)
 def __init__(self, d, w, c):
 self.data = d
 self.weight = w
 self.code = c
 def set_children(self, ln, rn):
 self.left = ln
 self.right = rn
 def __repr__(self):
 return "[%s,%s,(%s),(%s)]" %(self.data,self.code,self.left,self.right)
 def __cmp__(self, a):
 return cmp(self.code, a.code)
 def __getitem__(self):
 return self.code
def encode(symbfreq):
 tree = [Node(sym,wt,'') for sym, wt in symbfreq]
 heapify(tree)
 while len(tree)>1:
 lo, hi = sorted([heappop(tree), heappop(tree)])
 lo.code = '0'+lo.code
 hi.code = '1'+hi.code
 n = Node(lo.data+hi.data,lo.weight+hi.weight,lo.code+hi.code)
 n.set_children(lo, hi)
 heappush(tree, n)
 return tree[0]

Then I search the groups using a function like this:

def search(tree):
current = tree.right
loops = 0
#defective = 0
while current is not None:
 #print(current)
 loops+=1
 previous = current
 if current.data == 1:
 current = current.left
 else:
 current = current.right
return loops,previous

There are still some changes I need to make: the data field needs to be a set (not the sum that's there currently), and I need to change the encoding strategy so that the left branch of each node always has a '0' appended to its code regardless of permutations of the labels of the input distribution.

Tom Kealy Tom Kealy 5052 gold badges5 silver badges14 bronze badges · Answer 1 · 2013-11-07 10:15:37Z

I've change my data structure, so that the Huffman encoding procedure defines the combinations I need to test:

class Node(object):
 left = None
 right = None
 weight = None
 data = None
 code = ''
 length = len(code)
 def __init__(self, d, w, c):
 self.data = d
 self.weight = w
 self.code = c
 def set_children(self, ln, rn):
 self.left = ln
 self.right = rn
 def __repr__(self):
 return "[%s,%s,(%s),(%s)]" %(self.data,self.code,self.left,self.right)
 def __cmp__(self, a):
 return cmp(self.code, a.code)
 def __getitem__(self):
 return self.code
def encode(symbfreq):
 tree = [Node(sym,wt,'') for sym, wt in symbfreq]
 heapify(tree)
 while len(tree)>1:
 lo, hi = sorted([heappop(tree), heappop(tree)])
 lo.code = '0'+lo.code
 hi.code = '1'+hi.code
 n = Node(lo.data+hi.data,lo.weight+hi.weight,lo.code+hi.code)
 n.set_children(lo, hi)
 heappush(tree, n)
 return tree[0]

Then I search the groups using a function like this:

def search(tree):
current = tree.right
loops = 0
#defective = 0
while current is not None:
 #print(current)
 loops+=1
 previous = current
 if current.data == 1:
 current = current.left
 else:
 current = current.right
return loops,previous

There are still some changes I need to make: the data field needs to be a set (not the sum that's there currently), and I need to change the encoding strategy so that the left branch of each node always has a '0' appended to its code regardless of permutations of the labels of the input distribution.

Stack Exchange Network

Combinatorial searching of Huffman trees

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Combinatorial searching of Huffman trees

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions