Turning results from SQL query into compact Python form

Question 1

I have a database schema in Postgres that looks like this (in pseudo code):

users (table):
 pk (field, unique)
 name (field)
permissions (table):
 pk (field, unique)
 permission (field, unique)
addresses (table):
 pk (field, unique)
 address (field, unique)
association1 (table):
 user_pk (field, foreign_key)
 permission_pk (field, foreign_key)
association2 (table):
 user_pk (field, foreign_key)
 address_pk (field, foreign_key)

Hopefully this makes intuitive sense. It's a users table that has a many-to-many relationship with a permissions table as well as a many-to-many relationship with an addresses table.

In Python, when I perform the correct SQLAlchemy query incantations, I get back results that look something like this (after converting them to a list of dictionaries in Python):

results = [
 {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'},
 {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'},
 {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'},
 {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'},
 {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'},
]

So in this contrived example, Joe is both a user and an admin. John is only a user. Both Joe's home and work addresses exist in the database. Only John's home address exists.

So the question is, does anybody know the best way to go from these SQL query 'results' to the more compact 'desired_results' below?

desired_results = [
    {
        'pk': 1,
        'name': 'Joe',
        'permissions': ['user', 'admin'],
        'addresses': ['home', 'work']
    },
    {
        'pk': 2,
        'name': 'John',
        'permissions': ['user'],
        'addresses': ['home']
    },
]

Additional information required: a small list of dictionaries describing the 'labels' I would like to use in desired_results for each of the fields that have many-to-many relationships.

relationships = [
 {'label': 'permissions', 'back_populates': 'permission'},
 {'label': 'addresses', 'back_populates': 'address'},
]

Final consideration: I've put together a concrete example for the purposes of this question, but I'm trying to solve the general problem of querying SQL databases with an arbitrary number of relationships. SQLAlchemy ORM solves this problem well, but I'm limited to using SQLAlchemy Core, so I am trying to build my own solution.

Update

Here's an answer, but I'm not sure it's the best / most efficient solution. Can anyone come up with something better?

# step 1: generate set of keys that will be replaced by new keys in desired_result
back_populates = set(rel['back_populates'] for rel in relationships)
# step 2: delete from results keys generated in step 1
intermediate_results = [
    {k: v for k, v in res.items() if k not in back_populates}
    for res in results]
# step 3: eliminate duplicates
intermediate_results = [
    dict(t)
    for t in set([tuple(ires.items())
                  for ires in intermediate_results])]
# step 4: add back information from deleted fields but in desired form
for ires in intermediate_results:
    for rel in relationships:
        ires[rel['label']] = set([
            res[rel['back_populates']]
            for res in results
            if res['pk'] == ires['pk']])
# done
desired_results = intermediate_results

Question 2

What is the use of the 'relationships' list in getting the desired result?

Question 3

Do you have any code or any attempts at all to show us? What can we assume about the relationships and results lists, and can we assume they are totally regular, that all fields are present as shown?

Question 4

They are totally regular. All fields are present as shown.

Question 5

'label' in the relationships list gives the name of the key in the desired result. 'back_populates' gives the key in results whose values end up in the corresponding list in the desired result.

Question 6

There are too many assumptions someone would need to make about your data to post a working solution. Are all of the results grouped by name? What happens if one of the fields is missing, and is that allowed? What are the rules for keys that are not defined in relationships?

Question 7

Iterating over the groups of partial entries looks like a job for itertools.groupby.

But first let's put relationships into a format that is easier to use, perhaps a back_populates:label dictionary?

conversions = {d["back_populates"]:d['label'] for d in relationships}

Next, because we will be using itertools.groupby, we need a keyfunc to distinguish between the different groups of entries. Given one entry from the initial results, this function returns a dictionary with only the pairs that will not be condensed/converted:

def grouper(entry):
    # each group is identified by all key:value pairs that are not named in conversions
    return {k: v for k, v in entry.items() if k not in conversions}
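
One caveat worth checking before relying on this key function: itertools.groupby only merges rows that sit next to each other, so results that are not already ordered by user would be split into several groups. A minimal sketch, assuming the rows can be ordered by 'pk' (the names rows, condensed and base_key are made up for this illustration; an ORDER BY in the query would serve the same purpose):

```python
import itertools

# Hypothetical, deliberately unordered rows: Joe's two rows are not adjacent.
rows = [
    {'pk': 1, 'name': 'Joe', 'permission': 'user'},
    {'pk': 2, 'name': 'John', 'permission': 'user'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin'},
]
condensed = {'permission'}  # fields that will be collapsed into lists


def base_key(row):
    # Same idea as grouper: keep only the fields that identify the user.
    return {k: v for k, v in row.items() if k not in condensed}


# Without ordering, Joe shows up as two separate groups.
unsorted_groups = [key for key, _ in itertools.groupby(rows, base_key)]

# Ordering by the primary key first makes each user's rows contiguous.
ordered = sorted(rows, key=lambda row: row['pk'])
sorted_groups = [key for key, _ in itertools.groupby(ordered, base_key)]

print(len(unsorted_groups), len(sorted_groups))
```

Dictionaries work as groupby keys because groupby only compares consecutive keys for equality, but they cannot be used directly as a sort key, which is why the sketch sorts on 'pk' instead.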

Now we will be able to traverse the results in groups something like this:

for base_info, group in itertools.groupby(old_results, grouper):
    # base_info is a dict with the info shared by all entries in the group
    for partial in group:
        # partial is one entry from results that will contribute to the final result
        # but wait, what do we add it to?
        pass

The only issue is that base_info is the same dictionary groupby uses as the key when comparing later rows, so building our entry by modifying it in place would confuse groupby. Instead we make a separate entry to work with:

entry = {new_field:set() for new_field in conversions.values()}
entry.update(base_info)

Note that I am using sets here because they are the natural container when all contents are unique. However, because a set is not JSON-compatible, we will need to change them into lists at the end.
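
To illustrate the JSON point, here is a small sketch using the standard json module. The entry dictionary is a made-up sample, and sorted() is used instead of list() only so that the result is the same on every run; that choice is an assumption of this sketch, not part of the approach described here.

```python
import json

# Hypothetical partially built entry that still holds a set.
entry = {'pk': 1, 'name': 'Joe', 'permissions': {'user', 'admin'}}

try:
    json.dumps(entry)
    set_is_serializable = True
except TypeError:
    # json.dumps has no encoding for set objects.
    set_is_serializable = False

# Converting the set to a list makes the entry serializable.
entry['permissions'] = sorted(entry['permissions'])
encoded = json.dumps(entry)
print(set_is_serializable, encoded)
```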

Now that we have an entry to build, we can iterate through the group and add each value from the original fields to the matching new field:

for partial in group:
    for original, new in conversions.items():
        entry[new].add(partial[original])

Then, once the final entry is constructed, all that is left is to convert the sets back into lists:

for new in conversions.values():
    entry[new] = list(entry[new])

That entry is now done. We could append it to a list called new_results, but since this process is essentially generating results, it makes more sense to put it into a generator. The final code looks something like this:

import itertools

results = [
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'},
    {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'},
]
relationships = [
    {'label': 'permissions', 'back_populates': 'permission'},
    {'label': 'addresses', 'back_populates': 'address'},
]

# first we put the "relationships" in a format that is much easier to use
conversions = {d['back_populates']: d['label'] for d in relationships}

def grouper(entry):
    # each group is identified by all key:value pairs that are not named in conversions
    return {k: v for k, v in entry.items() if k not in conversions}

def parse_results(old_results, conversions=conversions):
    # groupby only merges adjacent rows, so the rows for one user must be contiguous
    for base_info, group in itertools.groupby(old_results, grouper):
        entry = {new_field: set() for new_field in conversions.values()}
        entry.update(base_info)
        for partial in group:  # for each entry in the original results set
            for original, new in conversions.items():  # for each field that will be condensed
                entry[new].add(partial[original])
        # convert sets back to lists so the entry can be put back into JSON
        for new in conversions.values():
            entry[new] = list(entry[new])
        yield entry

Then new_results can be obtained like this (the order of the items within each list may differ between runs, because sets are unordered):

>>> new_results = list(parse_results(results))
>>> from pprint import pprint #for demo purpose
>>> pprint(new_results,width=50)
[{'addresses': ['home', 'work'],
 'name': 'Joe',
 'permissions': ['admin', 'user'],
 'pk': 1},
 {'addresses': ['home'],
 'name': 'John',
 'permissions': ['user'],
 'pk': 2}]

Tadhg McDonald-Jensen · Accepted Answer · 2016-08-31 17:37:44Z
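
As a possible variation on the accepted answer (not something the answer itself proposes), the sketch below swaps groupby and sets for a plain dictionary keyed by the non-condensed fields, with lists that keep first-seen order. Under the assumptions that every row has all fields and that the field values are hashable, it does not need the rows to be contiguous and it reproduces the ordering shown in desired_results. The function name condense is hypothetical.

```python
def condense(rows, relationships):
    # Map each condensed field to the label it gets in the output.
    conversions = {rel['back_populates']: rel['label'] for rel in relationships}
    merged = {}  # dicts keep insertion order in Python 3.7+
    for row in rows:
        base = {k: v for k, v in row.items() if k not in conversions}
        key = tuple(sorted(base.items()))  # hashable identity of the user
        entry = merged.get(key)
        if entry is None:
            entry = dict(base)
            for label in conversions.values():
                entry[label] = []
            merged[key] = entry
        for original, label in conversions.items():
            value = row[original]
            if value not in entry[label]:  # keep first-seen order, no duplicates
                entry[label].append(value)
    return list(merged.values())


results = [
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'},
    {'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'},
    {'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'},
]
relationships = [
    {'label': 'permissions', 'back_populates': 'permission'},
    {'label': 'addresses', 'back_populates': 'address'},
]
compact = condense(results, relationships)
print(compact)
```

The membership test on a list is linear, so for users with very many permissions or addresses a set kept alongside each list would likely be a better fit.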

