Let's say I have a data collection, like a list, containing a set of objects with particular properties. Let's go with animals:
[ cow, sheep, orangutan ]
An animal has a property animal_data that contains taxonomic information, like kingdom, class, family and species. This establishes a hierarchy in which each property is contained in the previous one, sort of like a linear many-to-one tree.
Now we want to rearrange the former collection into a data structure that groups each animal into their own species, family, class and kingdom. We'd end up with something like this:
{
  "kingdoms": [
    {
      "name": "Animalia",
      "classes": [
        {
          "name": "Mammalia",
          "families": [
            {
              "name": "Bovidae",
              "species": [
                {
                  "name": "Bos taurus"
                },
                {
                  "name": "Bovis aries"
                }
              ]
            },
            {
              "name": "Hominidae",
              "species": [
                {
                  "name": "Pongo pygmaeus"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
This would be our final data structure, and this is strictly how it has to look. I know it could be rearranged to look better, but that is not an option here.
Now, being relatively new to Python, or at least to its functional potential, I tried using list comprehensions, map, groupby and lambdas to achieve that result. However, I couldn't get past the first level of nesting, because each group is dependent on the one at the level above it.
So this is my solution instead:
from collections import defaultdict
from itertools import groupby

# group by kingdom (groupby only groups consecutive items, so `animals`
# must already be ordered by kingdom, class and family)
animals_dict = {kingdom: list(animals_by_kingdom) for kingdom, animals_by_kingdom in
                groupby(animals, lambda a: a.animal_data.kingdom)}
grouped_animals = defaultdict(list)
for kingdom, animals_by_kingdom in animals_dict.items():
    # group by class
    classes_dict = {animal_class: list(animals_by_class) for animal_class, animals_by_class in
                    groupby(animals_by_kingdom, lambda a: a.animal_data.animal_class)}
    classes = []
    for animal_class, animals_by_class in classes_dict.items():
        # group by family
        families_dict = {family: list(animals_by_family) for family, animals_by_family in
                         groupby(animals_by_class, lambda a: a.animal_data.family)}
        families = []
        for family, animals_by_family in families_dict.items():
            families.append(
                {"name": family, "species": [{"name": animal.animal_data.species} for animal in animals_by_family]})
        classes.append({"name": animal_class, "families": families})
    grouped_animals["kingdoms"].append({"name": kingdom, "classes": classes})
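One caveat about the code above: itertools.groupby only groups consecutive items, so it produces complete groups only when the input is already ordered by the grouping key (the demo list further down happens to be). A minimal sketch with made-up, unsorted family names illustrates the difference; the variable names here are hypothetical and not part of the original code:

```python
from itertools import groupby

# Hypothetical sample data: family names that are NOT sorted.
families = ["Bovidae", "Hominidae", "Bovidae"]

# Without sorting, the two "Bovidae" entries are not adjacent, so groupby
# yields "Bovidae" twice and the dict keeps only the last of those groups.
unsorted_groups = {key: list(group) for key, group in groupby(families)}

# Sorting by the same key first makes equal items adjacent.
sorted_groups = {key: list(group) for key, group in groupby(sorted(families))}

print(unsorted_groups)
print(sorted_groups)
```

With real Animal objects the same idea would apply by sorting on the same key function that is passed to groupby.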
This is the best I could do, but something tells me Python would let me do this more elegantly, compactly and clearly.
I'd really appreciate any tips on how to improve my code and how to use Python's tools to do this more properly and clearly (if it can indeed be done better).
Disclaimers:
- I cannot modify the initial data structure. If the animal_data property seems weird (instead of having the kingdom, class, etc. attached to the animal directly), that's just how it is.
- If you're wondering why I would rearrange the list in such a way, it is so it can be easily consumed by an endpoint that works better with that format.
In case you need the Animal and AnimalData classes to fiddle with this, here is a demo:
class AnimalData:
    def __init__(self, kingdom, animal_class, family, species):
        super().__init__()
        self.kingdom = kingdom
        self.animal_class = animal_class
        self.family = family
        self.species = species

    def __str__(self, *args, **kwargs):
        return "Kingdom=%s, Class=%s, Family=%s, Species=%s" % (
            self.kingdom, self.animal_class, self.family, self.species)


class Animal:
    def __init__(self, kingdom, animal_class, family, species):
        super().__init__()
        self.animal_data = AnimalData(kingdom, animal_class, family, species)

    def __str__(self, *args, **kwargs):
        return str(self.animal_data)


cow = Animal("Animalia", "Mammalia", "Bovidae", "Bos taurus")
sheep = Animal("Animalia", "Mammalia", "Bovidae", "Bovis aries")
orangutan = Animal("Animalia", "Mammalia", "Hominidae", "Pongo pygmaeus")
animals = [cow, sheep, orangutan]
Note: The code is meant to work in both Python 2 and Python 3, although the argument-less super() calls in the demo classes only run on Python 3.
1 Answer
You should change Animal to be a child of AnimalData, or just not exist.
This allows you to remove all the animal_data boilerplate from, say, animal.animal_data.kingdom.
After this, let's say that you can output the following data structure:
{
  "Animalia": {
    "Mammalia": {
      "Bovidae": [
        "Bos taurus"
      ]
    }
  }
}
I know you can't. But if you could, all your code would simplify. For each animal it'd simply become:
kingdoms = {}
kingdom = kingdoms.setdefault(animal.kingdom, {})
animal_class = kingdom.setdefault(animal.animal_class, {})
family = animal_class.setdefault(animal.family, [])
family.append(animal.species)
Or, a less ugly way:
kingdoms = {}
(kingdoms
    .setdefault(animal.kingdom, {})
    .setdefault(animal.animal_class, {})
    .setdefault(animal.family, [])
    .append(animal.species))
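To make the chained lookup concrete, here is a minimal runnable sketch. It relies on dict.setdefault, which returns the existing value for a key or stores and returns the given default when the key is missing. The namedtuple is a hypothetical stand-in for an animal whose taxonomy attributes sit directly on the object:

```python
from collections import namedtuple

# Hypothetical flat record; not the Animal class from the question.
Animal = namedtuple("Animal", "kingdom animal_class family species")

animals = [
    Animal("Animalia", "Mammalia", "Bovidae", "Bos taurus"),
    Animal("Animalia", "Mammalia", "Hominidae", "Pongo pygmaeus"),
]

kingdoms = {}
for animal in animals:
    # Each setdefault call walks one level down, creating it if needed.
    (kingdoms
     .setdefault(animal.kingdom, {})
     .setdefault(animal.animal_class, {})
     .setdefault(animal.family, [])
     .append(animal.species))

print(kingdoms)
```

Both animals end up under the same "Animalia" and "Mammalia" entries because the second pass finds the dicts created by the first.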
So I'd suggest that you either build the structure shown above and then convert it to the one you need,
or write your own version of that dict method, one that lets you do roughly the same directly on the structure you need.
For the latter, you want to search the list for the first, and hopefully only, item with that key.
If it doesn't exist, you want to create it yourself, following the structure you need.
A 'one to one' translation into a get_default function can lead to:
def get_default(list, key, value):
    v = next((i for i in list if i['name'] == key), None)
    if v is None:
        v = {
            'name': key,
            'value': value
        }
        list.append(v)
    return v['value']
However, you don't really need to pass in the default value, since you can just set it to []. And you need to change the 'value' key to something you can pass in.
Making these changes to the above gives:
def get_default(list, key, value):
    v = next((i for i in list if i['name'] == key), None)
    if v is None:
        v = {
            'name': key,
            value: []
        }
        list.append(v)
    return v[value]
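A small sketch of how this behaves when the same key is requested twice may help. The function is repeated so the snippet runs on its own; the first parameter is called items here only to avoid shadowing the built-in list, and the sample values are made up:

```python
def get_default(items, key, value):
    # Find the first dict whose 'name' equals key, or None if absent.
    v = next((i for i in items if i['name'] == key), None)
    if v is None:
        # Not found: create the entry with an empty child list.
        v = {'name': key, value: []}
        items.append(v)
    return v[value]


kingdoms = []
first = get_default(kingdoms, 'Animalia', 'classes')
second = get_default(kingdoms, 'Animalia', 'classes')

# Both calls return the very same list object, so appending through
# either name updates the single 'Animalia' entry.
first.append({'name': 'Mammalia', 'families': []})
print(kingdoms)
```

Because the lookup scans the list each time, this is a linear search per call, which should be fine for small inputs like the ones in the question.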
And then you'd call this function once per level:
kingdoms = []
kingdom = get_default(kingdoms, animal.kingdom, 'classes')
animal_class = get_default(kingdom, animal.animal_class, 'families')
family = get_default(animal_class, animal.family, 'species')
family.append({'name': animal.species})
But this isn't exactly what you want yet, so we'd need to loop through each animal and wrap the output in the correct top-level structure.
def structure_animals(animals):
    kingdoms = []
    for animal in animals:
        kingdom = get_default(kingdoms, animal.kingdom, 'classes')
        animal_class = get_default(kingdom, animal.animal_class, 'families')
        family = get_default(animal_class, animal.family, 'species')
        family.append({'name': animal.species})
    return {'kingdoms': kingdoms}
The entire code, with the changes I made:
import json


class AnimalData:
    def __init__(self, kingdom, animal_class, family, species):
        super().__init__()
        self.kingdom = kingdom
        self.animal_class = animal_class
        self.family = family
        self.species = species

    def __str__(self, *args, **kwargs):
        return "Kingdom=%s, Class=%s, Family=%s, Species=%s" % (
            self.kingdom, self.animal_class, self.family, self.species)


class Animal(AnimalData):
    pass


def get_default(list, key, value):
    v = next((i for i in list if i['name'] == key), None)
    if v is None:
        v = {
            'name': key,
            value: []
        }
        list.append(v)
    return v[value]


def structure_animals(animals):
    kingdoms = []
    for animal in animals:
        kingdom = get_default(kingdoms, animal.kingdom, 'classes')
        animal_class = get_default(kingdom, animal.animal_class, 'families')
        family = get_default(animal_class, animal.family, 'species')
        family.append({'name': animal.species})
    return {'kingdoms': kingdoms}


cow = Animal("Animalia", "Mammalia", "Bovidae", "Bos taurus")
sheep = Animal("Animalia", "Mammalia", "Bovidae", "Bovis aries")
orangutan = Animal("Animalia", "Mammalia", "Hominidae", "Pongo pygmaeus")
animals = [cow, sheep, orangutan]
print(json.dumps(structure_animals(animals), sort_keys=True, indent=2))
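The other route mentioned earlier, building the simple nested dict first and converting it afterwards, could look roughly like this. It is only a sketch: to_named_lists is a hypothetical helper name, and the input literal is the simplified structure shown earlier rather than output from the real classes:

```python
def to_named_lists(simple):
    """Convert {kingdom: {class: {family: [species]}}} to the required layout."""
    return {
        "kingdoms": [
            {"name": kingdom, "classes": [
                {"name": animal_class, "families": [
                    {"name": family,
                     "species": [{"name": s} for s in species]}
                    for family, species in families.items()
                ]}
                for animal_class, families in classes.items()
            ]}
            for kingdom, classes in simple.items()
        ]
    }


simple = {"Animalia": {"Mammalia": {"Bovidae": ["Bos taurus"]}}}
result = to_named_lists(simple)
print(result)
```

The nesting of the comprehension mirrors the nesting of the output, which keeps the conversion in one place but also ties it to exactly four levels.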
- As I said in my disclaimer, I could NOT change the Animal or AnimalData classes and their relationships. Also, the resulting data structure I showed as an example was strictly how I needed it. Placing the taxonomy element name in a "name" value instead of as the key was done on purpose, basically because there will be more properties, not just the name, and I'd want them on the same level instead of one as a key and the others as values. Also, going "family": [ "species1", "species2" ] doesn't work for the same reasons. I'd need dicts, not plain strings. (dabadaba, Sep 19, 2016 at 11:31)
- @dabadaba Ok, ignore that one point. Also you did read my answer right? It outputs the same as yours. (Sep 19, 2016 at 11:35)
- Yes of course I read it. I was just commenting on your first point. Just one slight change... you took advantage of the fact that all the values are "name", but what if we wanted a different key? Or more than just one value? For example, for the class you'd want "title" instead of "name" and an extra "description" attribute. (dabadaba, Sep 19, 2016 at 11:38)
- @dabadaba How did you handle those things in your question? You didn't. Also this is starting to seem very much like example code. But to answer your question, you'd add a variable to get_default to change the name. And to add a description, you can return both the dict and the list, or do it later. (Sep 19, 2016 at 11:42)
- Right. I guess I'll post a new, more accurate question. (dabadaba, Sep 19, 2016 at 11:45)
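The variation described in the last reply (a configurable name key, and returning both the dict and the list) might be sketched as follows. The name_key parameter, the tuple return value and the placeholder description are all assumptions made for illustration, not code from the answer:

```python
def get_default(items, key, value, name_key='name'):
    # Same get-or-create lookup, but the label key is configurable.
    v = next((i for i in items if i.get(name_key) == key), None)
    if v is None:
        v = {name_key: key, value: []}
        items.append(v)
    # Return the dict too, so the caller can attach extra attributes.
    return v, v[value]


kingdoms = []
_, classes = get_default(kingdoms, 'Animalia', 'classes')
class_entry, families = get_default(classes, 'Mammalia', 'families',
                                    name_key='title')
class_entry['description'] = 'placeholder description'
print(kingdoms)
```

Returning the entry alongside its child list keeps the call sites short while still letting each level carry attributes of its own.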