3
\$\begingroup\$

Previously I asked a question on how to group dictionaries in a list in a hierarchical structure.

Initially I wanted to group a list of dictionaries that looks like the following, using multiple keys:

[{'dept':1, 'age':10, 'name':'Sam'},
{'dept':1, 'age':12, 'name':'John'},
.
.
.
{'dept':2,'age':20, 'name':'Mary'},
{'dept':2,'age':11, 'name':'Mark'},
{'dept':2,'age':11, 'name':'Tom'}]

And the output would be:

[
 {
 2: {
 20: [
 {
 'age': 20,
 'dept': 2,
 'name': 'Mary'
 }
 ]
 },
 {
 11: [
 {
 'age': 11,
 'dept': 2,
 'name': 'Mark'
 },
 {
 'age': 11,
 'dept': 2,
 'name': 'Tom'
 }
 ]
 }
 },
 {
 1: {
 10: [
 {
 'age': 10,
 'dept': 1,
 'name': 'Sam'
 }
 ]
 },
 {
 12: [
 {
 'age': 12,
 'dept': 1,
 'name': 'John'
 }
 ]
 }
 }
]

Using this code:

import itertools, operator
l = [{'dept':1, 'age':10, 'name':'Sam'},
 {'dept':1, 'age':12, 'name':'John'},
 {'dept':2,'age':20, 'name':'Mary'},
 {'dept':2,'age':11, 'name':'Mark'},
 {'dept':2,'age':11, 'name':'Tom'}]
groups = ['dept', 'age', 'name'] 
groups.reverse()
def hierachical_data(data, groups):
 g = groups[-1]
 g_list = []
 for key, items in itertools.groupby(data, operator.itemgetter(g)):
 g_list.append({key:list(items)})
 groups = groups[0:-1]
 if(len(groups) != 0):
 for e in g_list:
 for k,v in e.items():
 e[k] = hierachical_data(v, groups)
 return g_list
print hierachical_data(l, groups)

This method works fine, but now I need to optimize. The dictionaries has a big memory overhead and this grouping gets pretty slow when we are dealing with huge number of records.

I was wondering if there is any algorithm I could use to reduce the time needed to do the grouping.

P.S: I wouldn't mind changing the data-structure as long as it gives me the same hierarchical format, or any better suggestion of course.

Thanks.

asked Oct 16, 2012 at 15:21
\$\endgroup\$
16
  • 1
    \$\begingroup\$ Welcome to Code Review. If you read the FAQ, you will find that you need to post your code into your question or we'll have to close it. Please edit your post and paste in your code so that we can review it. For a really great question (worthy of upvoting) please include everything needed (including data) so that others can execute your code easily. \$\endgroup\$ Commented Oct 16, 2012 at 20:36
  • 1
    \$\begingroup\$ @Leonid Actually that's currently the solution I am using. To make it clear these records come from a database and then they are serialized into a dictionary. I use grouping and aggregation functions on the database side to reduce the number of records returned, but I was wondering if I can make my python code (the part that does the rest of the processing) even better. \$\endgroup\$ Commented Oct 17, 2012 at 2:28
  • 1
    \$\begingroup\$ It would be useful to have an actual sample of large data so we could run that. Otherwise, we are optimizing in the dark. \$\endgroup\$ Commented Oct 17, 2012 at 12:14
  • 1
    \$\begingroup\$ it seams weird that your output data structure is a list of dictionaries with only one key in each. IMHO you need a list only at the last nesting level (where you store you originals record). \$\endgroup\$ Commented Oct 17, 2012 at 12:27
  • 1
    \$\begingroup\$ The perhaps you should use the power of the database partly - select dept, age, name from something order by dept, age; This is O(n), but if this is too slow, then yes - hack away. \$\endgroup\$ Commented Oct 17, 2012 at 21:37

1 Answer 1

2
\$\begingroup\$

Try doing a recursive sequence that iterates through each group, by passing a group index to each call, and find and build the tree as it receives inputs:

def generate_hierarchical_hash(entries, groups):
 """Returns hierarchical hash of entries, grouped at each level by the groups"""
 hierarchy = {}
 for to_insert in entries:
 insert_into_hierarchy(to_insert, hierarchy, groups, 0)
 return hierarchy
def insert_into_hierarchy(to_insert, hierarchy, groupings, current_grouping_level):
 if current_grouping_level == len(groupings):
 # if at final level, do not do anything - protection case
 pass
 else:
 grouping_at_level = groupings[current_grouping_level]
 grouping_key = to_insert[grouping_at_level]
 is_final_group = current_grouping_level == len(groupings) - 1
 if is_final_group:
 # if at last grouping, nodes will contain lists, so create or retrieve list
 # and append value to it
 if grouping_key in hierarchy:
 list_for_group = hierarchy[grouping_key]
 else:
 # create new list and insert it into hierarchy
 list_for_group = list()
 hierarchy[grouping_key] = list_for_group
 # insert to insert into end of list
 list_for_group.append(to_insert)
 else:
 # otherwise, find or create hash for this grouping,
 if grouping_key in hierarchy:
 hierarchy_for_group = hierarchy[grouping_key]
 else:
 # create hash and insert into parent hash
 hierarchy_for_group = {}
 hierarchy[grouping_key] = hierarchy_for_group
 # recursive call, go down a level and group
 insert_into_hierarchy(to_insert, hierarchy_for_group, groupings, current_grouping_level + 1)

This could be called like:

input =[{'dept':1, 'age':10, 'name':'Sam'},
 {'dept':1, 'age':12, 'name':'John'},
 {'dept':2,'age':20, 'name':'Mary'},
 {'dept':2,'age':11, 'name':'Mark'},
 {'dept':2,'age':11, 'name':'Tom'}]
groups = ['dept', 'age', 'name']
hierarchical_hash = generate_hierarchical_hash(input, groups)
answered Oct 20, 2012 at 14:22
\$\endgroup\$
3
  • \$\begingroup\$ Thanks a lot for your answer. Please allow me sometime to test it before I mark the question as answered. \$\endgroup\$ Commented Oct 20, 2012 at 18:08
  • \$\begingroup\$ Any luck with this? \$\endgroup\$ Commented Nov 15, 2012 at 15:44
  • \$\begingroup\$ I am sorry I haven't checked it yet things have been crazy at my end. \$\endgroup\$ Commented Nov 18, 2012 at 13:28

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.