Suppose each node of a tree has maximum n
children. I want to serialize and de-serialize for a tree. My current solution is doing pre-order traverse, and using #
to represent None
child.
I'm wondering if there are any better implementation method. My current concern for my code is if the tree has too many None
node (kinds of sparse tree), there will be a lot #
in the serialized string output. But I have not figured out a way to represent serialized tree better, since I think for None
node, we still need some kinds of information to represent there is an empty node there.
I'm looking forward to any smart ideas on existing code and smart ideas to represent serialized tree in a better (more compact and more efficient -- for both serialize and de-serialize).
My print tree method is just for the purpose of verification. deserialize
is correctly re-constructing the tree, so we can skip code review for this part.
number_of_children = 3
index = 0
class Node:
def __init__(self, value):
self.value = value
self.children = [] # a list of Node
def serialize(self, result): # serialzie a tree into a string
result.append(str(self.value))
for child in self.children:
if child:
child.serialize(result)
else:
result.append('#')
def printTree(self): # print tree layer by layer
current_level_buffer=[]
current_level_buffer.append(self)
current_level_count = 1
next_level_count=0
next_level_buffer=[]
print_buffer=[]
next_level_valid = 1
while True:
node = current_level_buffer.pop(0)
if not node:
print_buffer.append('#')
current_level_count -= 1
else:
print_buffer.append(str(node.value))
current_level_count -= 1
for child in node.children:
next_level_buffer.append(child)
next_level_count += 1
if child:
next_level_valid += 1
if current_level_count == 0:
print ' '.join(print_buffer)
current_level_count = next_level_count
current_level_buffer = next_level_buffer
next_level_count = 0
next_level_buffer=[]
print_buffer=[]
if next_level_valid == 0:
return
next_level_valid = 0
def deserialize(source): # deserialize a string into a tree
global index
current_node = None
if source[index] != '#':
current_node = Node(source[index])
for i in range(number_of_children):
index += 1
current_node.children.append(deserialize(source))
return current_node
if __name__ == "__main__":
root = Node(0)
child11 = Node(1)
child12 = Node(2)
root.children.append(child11)
root.children.append(child12)
root.children.append(None)
child21 = Node(3)
child22 = Node(4)
child23 = Node(5)
child11.children.append(child21)
child11.children.append(child22)
child11.children.append(child23)
child24=Node(6)
child25=Node(7)
child26=Node(8)
child12.children.append(child24)
child12.children.append(child25)
child12.children.append(child26)
leafnodes = [child21, child22, child23, child24, child25, child26]
for leaf in leafnodes:
leaf.children.append(None)
leaf.children.append(None)
leaf.children.append(None)
root.printTree()
result = []
root.serialize(result)
print result
print ''.join(result)
new_root = deserialize(''.join(result))
new_root.printTree()
1 Answer 1
Your printTree
function is unnecessarily long.
Also, I really don't like the output of your serialize
. I was not able to parse (as a human) this back into the structure of the tree. For me it would be much better if the structure was apparent from the serialization.
Therefore I rewrote your code in the following way:
class Node:
def __init__(self, value='#', children=None):
self.value = value
self.children = children if children else []
def __nonzero__(self):
return self.value != '#'
def __eq__(self, other):
return self.value == other.value and self.children == other.children
def __str__(self):
buf, out = [self], []
while buf:
# current level
out.append(", ".join(str(node.value) for node in buf))
if any(node for node in buf):
# add children
buf = [subnode if subnode else Node()
for node in buf
for subnode in node.children]
else:
break
return "\n".join(out)
def __repr__(self):
if not self:
return "Node()"
children_repr = ",".join(repr(child) for child in self.children)
return "Node({.value},[{}])".format(self, children_repr)
def serialize(self):
return repr(self).replace("Node", "").replace("()", "#")
@staticmethod
def deserialize(source):
return eval(source.replace("#", "()").replace("(", "Node("))
if __name__ == "__main__":
root = Node(1, [Node(2, [Node(4)]), Node(3, [Node(5), Node(6)])])
print repr(root)
print root
source = root.serialize()
root2 = Node.deserialize(source)
print repr(root2)
print root2
assert root == root2
This class has the nice feature, that eval(repr(root))
is a copy of root
, so repr(root)
is a serialization of root
.
The output of repr(root)
is a bit verbose, though:
Node(1,[Node(2,[Node(4,[])]),Node(3,[Node(5,[]),Node(6,[])])])
Therefore I defined a serialize
function, which just gets rid of all the Node
s. The deserialize function just puts them back in and does an eval
.
Note that I used a different tree here than you. Here would be your structure:
root = Node(0, [Node(1, [Node(3, [Node(), Node(), Node()]),
Node(4, [Node(), Node(), Node()]),
Node(5, [Node(), Node(), Node()])]),
Node(2, [Node(6, [Node(), Node(), Node()]),
Node(7, [Node(), Node(), Node()]),
Node(8, [Node(), Node(), Node()])]),
Node()])
Or, even better to visualize:
root = Node(0, [Node(1, [Node(3, [Node(),
Node(),
Node()]),
Node(4, [Node(),
Node(),
Node()]),
Node(5, [Node(),
Node(),
Node()])]),
Node(2, [Node(6, [Node(),
Node(),
Node()]),
Node(7, [Node(),
Node(),
Node()]),
Node(8, [Node(),
Node(),
Node()])]),
Node()])
Example:
>>> root.serialize()
'(0,[(1,[(3,[#,#,#]),(4,[#,#,#]),(5,[#,#,#])]),(2,[(6,[#,#,#]),(7,[#,#,#]),(8,[#,#,#])]),#])'
This could be cut down even further, but at this point it becomes a balance between the processing time needed to make this string smaller and the gain in memory from that.
With this function it could become:
def serialize2(self):
replacements = [("Node", ""), ("()", "#"),
("(", ""), (")", ""), (",[", "["), (",", "")]
out = repr(self)
for replacement in replacements:
out = out.replace(*replacement)
return out
>>> root.serialize2()
'0[1[3[###]4[###]5[###]]2[6[###]7[###]8[###]]#]'
But this becomes hard to parse back into the tree and would break the setup of using __repr__
and eval
. It would be possible using some fancy regular expressions, though.
-
1\$\begingroup\$ @LinMa Yeah, I was thinking of that. It would be sufficient for my code to have
()
represent a completely empty node. It makes the repr function slightly more complicated, but it might be worth it. \$\endgroup\$Graipher– Graipher2016年11月08日 07:04:29 +00:00Commented Nov 8, 2016 at 7:04 -
1\$\begingroup\$ @LinMa In my opinion repr is a lot more general than a specialized serialize. Which is why the latter is just a special case of the former in my code. \$\endgroup\$Graipher– Graipher2016年11月08日 07:10:00 +00:00Commented Nov 8, 2016 at 7:10
-
1\$\begingroup\$ @LinMa It is for representation (i.e.representing the total structure of the object). Ideally it is represented in such a way that
eval(repr(object))
works, which it is here. \$\endgroup\$Graipher– Graipher2016年11月08日 07:35:35 +00:00Commented Nov 8, 2016 at 7:35 -
1\$\begingroup\$ @LinMa In this version of my code, the empty node is still
Node('#', [])
inrepr
, whileNode()
would be enough. Working on a streamlining of this, where the serialized string is less than half the length of this one. \$\endgroup\$Graipher– Graipher2016年11月08日 07:37:01 +00:00Commented Nov 8, 2016 at 7:37 -
1\$\begingroup\$ @LinMa
str.format
can do a limited number of operations, namely attribute access and indexing:x = [1, 2, 3]; "{0[1]}".format(x); "{[1]}".format(x); "{x[1]}".format(x=x)
all return2
. Similarly my code printsself.value
with"{.value}".format(self)
. \$\endgroup\$Graipher– Graipher2016年11月09日 08:43:42 +00:00Commented Nov 9, 2016 at 8:43
Explore related questions
See similar questions with these tags.