3
\$\begingroup\$

Suppose each node of a tree has maximum n children. I want to serialize and de-serialize for a tree. My current solution is doing pre-order traverse, and using # to represent None child.

I'm wondering if there are any better implementation method. My current concern for my code is if the tree has too many None node (kinds of sparse tree), there will be a lot # in the serialized string output. But I have not figured out a way to represent serialized tree better, since I think for None node, we still need some kinds of information to represent there is an empty node there.

I'm looking forward to any smart ideas on existing code and smart ideas to represent serialized tree in a better (more compact and more efficient -- for both serialize and de-serialize).

My print tree method is just for the purpose of verification. deserialize is correctly re-constructing the tree, so we can skip code review for this part.

number_of_children = 3
index = 0
class Node:
 def __init__(self, value):
 self.value = value
 self.children = [] # a list of Node
 def serialize(self, result): # serialzie a tree into a string
 result.append(str(self.value))
 for child in self.children:
 if child:
 child.serialize(result)
 else:
 result.append('#')
 def printTree(self): # print tree layer by layer
 current_level_buffer=[]
 current_level_buffer.append(self)
 current_level_count = 1
 next_level_count=0
 next_level_buffer=[]
 print_buffer=[]
 next_level_valid = 1
 while True:
 node = current_level_buffer.pop(0)
 if not node:
 print_buffer.append('#')
 current_level_count -= 1
 else:
 print_buffer.append(str(node.value))
 current_level_count -= 1
 for child in node.children:
 next_level_buffer.append(child)
 next_level_count += 1
 if child:
 next_level_valid += 1
 if current_level_count == 0:
 print ' '.join(print_buffer)
 current_level_count = next_level_count
 current_level_buffer = next_level_buffer
 next_level_count = 0
 next_level_buffer=[]
 print_buffer=[]
 if next_level_valid == 0:
 return
 next_level_valid = 0
def deserialize(source): # deserialize a string into a tree
 global index
 current_node = None
 if source[index] != '#':
 current_node = Node(source[index])
 for i in range(number_of_children):
 index += 1
 current_node.children.append(deserialize(source))
 return current_node
if __name__ == "__main__":
 root = Node(0)
 child11 = Node(1)
 child12 = Node(2)
 root.children.append(child11)
 root.children.append(child12)
 root.children.append(None)
 child21 = Node(3)
 child22 = Node(4)
 child23 = Node(5)
 child11.children.append(child21)
 child11.children.append(child22)
 child11.children.append(child23)
 child24=Node(6)
 child25=Node(7)
 child26=Node(8)
 child12.children.append(child24)
 child12.children.append(child25)
 child12.children.append(child26)
 leafnodes = [child21, child22, child23, child24, child25, child26]
 for leaf in leafnodes:
 leaf.children.append(None)
 leaf.children.append(None)
 leaf.children.append(None)
 root.printTree()
 result = []
 root.serialize(result)
 print result
 print ''.join(result)
 new_root = deserialize(''.join(result))
 new_root.printTree()
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Nov 7, 2016 at 8:22
\$\endgroup\$
0

1 Answer 1

5
\$\begingroup\$

Your printTree function is unnecessarily long.

Also, I really don't like the output of your serialize. I was not able to parse (as a human) this back into the structure of the tree. For me it would be much better if the structure was apparent from the serialization.

Therefore I rewrote your code in the following way:

class Node:
 def __init__(self, value='#', children=None):
 self.value = value
 self.children = children if children else []
 def __nonzero__(self):
 return self.value != '#'
 def __eq__(self, other):
 return self.value == other.value and self.children == other.children
 def __str__(self):
 buf, out = [self], []
 while buf:
 # current level
 out.append(", ".join(str(node.value) for node in buf))
 if any(node for node in buf):
 # add children
 buf = [subnode if subnode else Node()
 for node in buf
 for subnode in node.children]
 else:
 break
 return "\n".join(out)
 def __repr__(self):
 if not self:
 return "Node()"
 children_repr = ",".join(repr(child) for child in self.children)
 return "Node({.value},[{}])".format(self, children_repr)
 def serialize(self):
 return repr(self).replace("Node", "").replace("()", "#")
 @staticmethod
 def deserialize(source):
 return eval(source.replace("#", "()").replace("(", "Node("))
if __name__ == "__main__":
 root = Node(1, [Node(2, [Node(4)]), Node(3, [Node(5), Node(6)])])
 print repr(root)
 print root
 source = root.serialize()
 root2 = Node.deserialize(source)
 print repr(root2)
 print root2
 assert root == root2

This class has the nice feature, that eval(repr(root)) is a copy of root, so repr(root) is a serialization of root.

The output of repr(root) is a bit verbose, though:

Node(1,[Node(2,[Node(4,[])]),Node(3,[Node(5,[]),Node(6,[])])])

Therefore I defined a serialize function, which just gets rid of all the Nodes. The deserialize function just puts them back in and does an eval.

Note that I used a different tree here than you. Here would be your structure:

root = Node(0, [Node(1, [Node(3, [Node(), Node(), Node()]),
 Node(4, [Node(), Node(), Node()]),
 Node(5, [Node(), Node(), Node()])]),
 Node(2, [Node(6, [Node(), Node(), Node()]),
 Node(7, [Node(), Node(), Node()]),
 Node(8, [Node(), Node(), Node()])]),
 Node()])

Or, even better to visualize:

root = Node(0, [Node(1, [Node(3, [Node(),
 Node(),
 Node()]),
 Node(4, [Node(),
 Node(),
 Node()]),
 Node(5, [Node(),
 Node(),
 Node()])]),
 Node(2, [Node(6, [Node(),
 Node(),
 Node()]),
 Node(7, [Node(),
 Node(),
 Node()]),
 Node(8, [Node(),
 Node(),
 Node()])]),
 Node()])

Example:

>>> root.serialize()
'(0,[(1,[(3,[#,#,#]),(4,[#,#,#]),(5,[#,#,#])]),(2,[(6,[#,#,#]),(7,[#,#,#]),(8,[#,#,#])]),#])'

This could be cut down even further, but at this point it becomes a balance between the processing time needed to make this string smaller and the gain in memory from that.

With this function it could become:

def serialize2(self):
 replacements = [("Node", ""), ("()", "#"),
 ("(", ""), (")", ""), (",[", "["), (",", "")]
 out = repr(self)
 for replacement in replacements:
 out = out.replace(*replacement)
 return out
>>> root.serialize2()
'0[1[3[###]4[###]5[###]]2[6[###]7[###]8[###]]#]'

But this becomes hard to parse back into the tree and would break the setup of using __repr__ and eval. It would be possible using some fancy regular expressions, though.

answered Nov 7, 2016 at 13:22
\$\endgroup\$
18
  • 1
    \$\begingroup\$ @LinMa Yeah, I was thinking of that. It would be sufficient for my code to have () represent a completely empty node. It makes the repr function slightly more complicated, but it might be worth it. \$\endgroup\$ Commented Nov 8, 2016 at 7:04
  • 1
    \$\begingroup\$ @LinMa In my opinion repr is a lot more general than a specialized serialize. Which is why the latter is just a special case of the former in my code. \$\endgroup\$ Commented Nov 8, 2016 at 7:10
  • 1
    \$\begingroup\$ @LinMa It is for representation (i.e.representing the total structure of the object). Ideally it is represented in such a way that eval(repr(object)) works, which it is here. \$\endgroup\$ Commented Nov 8, 2016 at 7:35
  • 1
    \$\begingroup\$ @LinMa In this version of my code, the empty node is still Node('#', []) in repr, while Node() would be enough. Working on a streamlining of this, where the serialized string is less than half the length of this one. \$\endgroup\$ Commented Nov 8, 2016 at 7:37
  • 1
    \$\begingroup\$ @LinMa str.format can do a limited number of operations, namely attribute access and indexing: x = [1, 2, 3]; "{0[1]}".format(x); "{[1]}".format(x); "{x[1]}".format(x=x) all return 2. Similarly my code prints self.value with "{.value}".format(self). \$\endgroup\$ Commented Nov 9, 2016 at 8:43

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.