I have a database schema in Postgres that looks like this (in pseudo code):
users (table):
pk (field, unique)
name (field)
permissions (table):
pk (field, unique)
permission (field, unique)
addresses (table):
pk (field, unique)
address (field, unique)
association1 (table):
user_pk (field, foreign_key)
permission_pk (field, foreign_key)
association2 (table):
user_pk (field, foreign_key)
address_pk (field, foreign_key)
Hopefully this makes intuitive sense. It's a users table that has a many-to-many relationship with a permissions table as well as a many-to-many relationship with an addresses table.
In Python, when I perform the correct SQLAlchemy query incantations, I get back results that look something like this (after converting them to a list of dictionaries in Python):
results = [
{'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'},
{'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'},
{'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'},
{'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'},
{'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'},
]
So in this contrived example, Joe is both a user and and an admin. John is only a user. Both Joe's home and work addresses exist in the database. Only John's home address exists.
So the question is, does anybody know the best way to go from these SQL query 'results' to the more compact 'desired_results' below?
desired_results = [
{
'pk': 1,
'name': 'Joe',
'permissions': ['user', 'admin'],
'addresses': ['home', 'work']
},
{
'pk': 2,
'name': 'John',
'permissions': ['user'],
'addresses': ['home']
},
]
Additional information required: Small list of dictionaries describing the 'labels' I would like to use in the desired_results for each of the fields that have many-to-many relationships.
relationships = [
{'label': 'permissions', 'back_populates': 'permission'},
{'label': 'addresses', 'back_populates': 'address'},
]
Final consideration, I've put together a concrete example for the purposes of this question, but in general I'm trying to solve the problem of querying SQL databases in general, assuming an arbitrary amount of relationships. SQLAlchemy ORM solves this problem well, but I'm limited to using SQLAlchemy Core; so am trying to build my own solution.
Update
Here's an answer, but I'm not sure it's the best / most efficient solution. Can anyone come up with something better?
# step 1: generate set of keys that will be replaced by new keys in desired_result
back_populates = set(rel['back_populates'] for rel in relationships)
# step 2: delete from results keys generated in step 1
intermediate_results = [
{k: v for k, v in res.items() if k not in back_populates}
for res in results]
# step 3: eliminate duplicates
intermediate_results = [
dict(t)
for t in set([tuple(ires.items())
for ires in intermediate_results])]
# step 4: add back information from deleted fields but in desired form
for ires in intermediate_results:
for rel in relationships:
ires[rel['label']] = set([
res[rel['back_populates']]
for res in results
if res['pk'] == ires['pk']])
# done
desired_results = intermediate_results
1 Answer 1
Iterating over the groups of partial entries looks like a job for itertools.groupby.
But first lets put relationships into a format that is easier to use, prehaps a back_populates:label dictionary?
conversions = {d["back_populates"]:d['label'] for d in relationships}
Next because we will be using itertools.groupby it will need a keyfunc to distinguish between the different groups of entries.
So given one entry from the initial results, this function will return a dictionary with only the pairs that will not be condensed/converted
def grouper(entry):
#each group is identified by all key:values that are not identified in conversions
return {k:v for k,v in entry.items() if k not in conversions}
Now we will be able to traverse the results in groups something like this:
for base_info, group in itertools.groupby(old_results, grouper):
#base_info is dict with info unique to all entries in group
for partial in group:
#partial is one entry from results that will contribute to the final result
#but wait, what do we add it too?
The only issue is that if we build our entry from base_info it will confuse groupby so we need to make an entry to work with:
entry = {new_field:set() for new_field in conversions.values()}
entry.update(base_info)
Note that I am using sets here because they are the natural container when all contence are unique,
however because it is not json-compatible we will need to change them into lists at the end.
Now that we have an entry to build we can just iterate through the group to add to each new field from the original
for partial in group:
for original, new in conversions.items():
entry[new].add(partial[original])
then once the final entry is constructed all that is left is to convert the sets back into lists
for new in conversions.values():
entry[new] = list(entry[new])
And that entry is done, now we can either append it to a list called new_results but since this process is essentially generating results it would make more sense to put it into a generator
making the final code look something like this:
import itertools
results = [
{'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'home'},
{'pk': 1, 'name': 'Joe', 'permission': 'user', 'address': 'work'},
{'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'home'},
{'pk': 1, 'name': 'Joe', 'permission': 'admin', 'address': 'work'},
{'pk': 2, 'name': 'John', 'permission': 'user', 'address': 'home'},
]
relationships = [
{'label': 'permissions', 'back_populates': 'permission'},
{'label': 'addresses', 'back_populates': 'address'},
]
#first we put the "relationships" in a format that is much easier to use.
conversions = {d["back_populates"]:d['label'] for d in relationships}
def grouper(entry):
#each group is identified by all key:values that are not identified in conversions
return {k:v for k,v in entry.items() if k not in conversions}
def parse_results(old_results, conversions=conversions):
for base_info, group in itertools.groupby(old_results, grouper):
entry = {new_field:set() for new_field in conversions.values()}
entry.update(base_info)
for partial in group: #for each entry in the original results set
for original, new in conversions.items(): #for each field that will be condensed
entry[new].add(partial[original])
#convert sets back to lists so it can be put back into json
for new in conversions.values():
entry[new] = list(entry[new])
yield entry
Then the new_results can be gotten like this:
>>> new_results = list(parse_results(results))
>>> from pprint import pprint #for demo purpose
>>> pprint(new_results,width=50)
[{'addresses': ['home', 'work'],
'name': 'Joe',
'permissions': ['admin', 'user'],
'pk': 1},
{'addresses': ['home'],
'name': 'John',
'permissions': ['user'],
'pk': 2}]
relationshipsandresultslists, and can we assume they are totally regular, that all fields are present as shown?relationships?