Simplify the restructuring of json data

Question 1

I am trying to add nesting to some flat data.

Basically, this code works the following way:

"taglevel": 1 tags should become the top-level keys
"taglevel": 2 or higher tags should be nested in an array under their level-1 tag, with no duplicates in that array
If no "taglevel": 1 tag exists, add the tags to a generic "NoLevel_1" array

The code is still clunky and I feel there is a much cleaner way to achieve this.

import json

generic = []
result = []
for i in json_data:
    if any(d['taglevel'] == 1 for d in i['tag']):
        tag_data = {}
        tag_child = []
        for tag in i['tag']:
            if tag['taglevel'] == 1:
                tag_data['name'] = tag['name']
                tag_data['taglevel'] = 1
            else:
                tag_child.append(tag)
        # Deduplicate the children by name
        filtered = {tuple((k, d[k]) for k in sorted(d) if k in ['name']): d for d in tag_child}
        tag_data['tag_child'] = list(filtered.values())
        if any(d['name'] == tag_data['name'] for d in result):
            # Parent already seen: merge the children and deduplicate again
            for t in result:
                if t['name'] == tag_data['name']:
                    t['tag_child'] = t['tag_child'] + tag_child
                    filtered = {tuple((k, d[k]) for k in sorted(d) if k in ['name']): d for d in t['tag_child']}
                    t['tag_child'] = list(filtered.values())
        else:
            result.append(tag_data)
    else:
        # No level-1 tag: collect the tags under the generic parent
        for tag in i['tag']:
            generic.append(tag)

# Deduplicate the generic children too, so "Food" appears only once
filtered = {tuple((k, d[k]) for k in sorted(d) if k in ['name']): d for d in generic}
tag_data = {}
tag_data['name'] = 'NoLevel_1'
tag_data['taglevel'] = 1
tag_data['tag_child'] = list(filtered.values())
result.append(tag_data)
print(json.dumps(result, indent=4, sort_keys=True))
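Side note on the `filtered` comprehension: because it keeps only the `name` field, its key is always the one-element tuple `(('name', ...),)`, so it deduplicates by name alone. A minimal Python 3 sketch of an equivalent shorter form, assuming every tag has a `name` key; `dedupe_by_name` is a hypothetical helper name:

```python
def dedupe_by_name(tags):
    """Keep one dict per 'name' (hypothetical helper).

    Like the original comprehension, a later duplicate replaces the
    earlier value, while the position of the first occurrence is kept
    (dicts preserve insertion order from Python 3.7).
    """
    return list({tag['name']: tag for tag in tags}.values())


tags = [
    {'name': 'Food', 'taglevel': 2},
    {'name': 'Food', 'taglevel': 2},
    {'name': 'Apple', 'taglevel': 2},
]
print(dedupe_by_name(tags))
# [{'name': 'Food', 'taglevel': 2}, {'name': 'Apple', 'taglevel': 2}]
```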

The data:

json_data = [
    {
        "title": "Random",
        "tag": [
            {
                "name": "Fruit",
                "taglevel": 1
            },
            {
                "name": "Apple",
                "taglevel": 2
            }
        ]
    },
    {
        "title": "Other",
        "tag": [
            {
                "name": "Fruit",
                "taglevel": 1
            },
            {
                "name": "Apple",
                "taglevel": 2
            }
        ]
    },
    {
        "title": "Words",
        "tag": [
            {
                "name": "Food",
                "taglevel": 2
            }
        ]
    },
    {
        "title": "That",
        "tag": [
            {
                "name": "Food",
                "taglevel": 2
            },
            {
                "name": "Apple",
                "taglevel": 2
            }
        ]
    }
]

Desired result

[
    {
        "name": "Fruit",
        "tag_child": [
            {
                "name": "Apple",
                "taglevel": 2
            }
        ],
        "taglevel": 1
    },
    {
        "name": "NoLevel_1",
        "tag_child": [
            {
                "name": "Food",
                "taglevel": 2
            },
            {
                "name": "Apple",
                "taglevel": 2
            }
        ],
        "taglevel": 1
    }
]

Question 2

Do you need to keep all that redundant information in your output or is it flexible and you can change it?

Question 3

Which parts are considered redundant? The desired result is basically how I want it. As a bonus, it would be great if I could also sort within tag_child (e.g. alphabetically, or by date if I had a creation date).
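Sorting inside tag_child can be done after the result is built. A minimal sketch, assuming the result has the list-of-dicts shape shown in the desired result; `sort_children` is a hypothetical helper name:

```python
from operator import itemgetter


def sort_children(result, key='name', reverse=False):
    """Sort each parent's tag_child list in place by one field (hypothetical helper)."""
    for parent in result:
        parent['tag_child'].sort(key=itemgetter(key), reverse=reverse)
    return result


result = [
    {'name': 'NoLevel_1', 'taglevel': 1,
     'tag_child': [{'name': 'Food', 'taglevel': 2},
                   {'name': 'Apple', 'taglevel': 2}]},
]
sort_children(result)
print([child['name'] for child in result[0]['tag_child']])  # ['Apple', 'Food']
```

To sort by date instead, pass `key='created'`, assuming a hypothetical `created` field (not in the sample data) that holds ISO-8601 strings or `datetime` objects, both of which compare chronologically. The parents themselves can be ordered the same way with `result.sort(key=itemgetter('name'))`.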

Question 4

I feel like keeping "taglevel": 1 for the first level of dictionaries and "taglevel": 2 for the second level, for instance, is highly redundant. Also, you handle levels 1 and 2, but could there be more of them?
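If the repeated level numbers are considered redundant, one possible leaner shape maps each level-1 name directly to its children. A minimal sketch, assuming the position of a key is enough to imply level 1; child levels are kept because deeper levels may exist. Children are deduplicated by name, as in the question's filtered. `compact` is a hypothetical name:

```python
from collections import defaultdict


def compact(json_data, default_parent='NoLevel_1'):
    """Map each level-1 tag name to {child name: child taglevel}.

    The parent's taglevel is implied by its position, so it is dropped.
    """
    tree = defaultdict(dict)
    for item in json_data:
        # First level-1 tag name, or the generic parent if there is none
        parent = next(
            (tag['name'] for tag in item['tag'] if tag['taglevel'] == 1),
            default_parent,
        )
        children = tree[parent]  # accessing the key creates the entry
        for tag in item['tag']:
            if tag['taglevel'] != 1:
                children[tag['name']] = tag['taglevel']
    return dict(tree)


sample = [
    {"title": "Random", "tag": [{"name": "Fruit", "taglevel": 1},
                                {"name": "Apple", "taglevel": 2}]},
    {"title": "That", "tag": [{"name": "Food", "taglevel": 2},
                              {"name": "Apple", "taglevel": 2}]},
]
print(compact(sample))
# {'Fruit': {'Apple': 2}, 'NoLevel_1': {'Food': 2, 'Apple': 2}}
```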

Question 5

No, taglevel 1 will always need to be the first level. If anything, it may be good to have an easy way to exclude items below taglevel 3, for example.
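A minimal sketch of such a cutoff, assuming "below taglevel 3" means tags nested deeper than a chosen level (i.e. with a larger taglevel number). `restructure` and its `max_level` parameter are hypothetical, and the grouping follows the set-of-tuples approach from the accepted answer:

```python
from collections import defaultdict


def restructure(json_data, max_level=None):
    """Group child tags under their level-1 parent.

    Tags whose taglevel exceeds max_level are skipped; None keeps every
    level. max_level is a hypothetical parameter, not from the question.
    """
    grouped = defaultdict(set)
    for item in json_data:
        parent = 'NoLevel_1'
        children = []
        for tag in item['tag']:
            if tag['taglevel'] == 1:
                parent = tag['name']
            elif max_level is None or tag['taglevel'] <= max_level:
                children.append((tag['taglevel'], tag['name']))
        grouped[parent].update(children)
    return [
        {'name': parent,
         'taglevel': 1,
         # sorted() gives a stable order: by level, then by name
         'tag_child': [{'name': name, 'taglevel': level}
                       for level, name in sorted(tags)]}
        for parent, tags in grouped.items()
    ]


# Hypothetical sample with a level-3 tag
data = [{"title": "Example", "tag": [{"name": "Fruit", "taglevel": 1},
                                     {"name": "Apple", "taglevel": 2},
                                     {"name": "Gala", "taglevel": 3}]}]
print(restructure(data, max_level=2))
```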

Question 6

Well, you may or may not consider this a simplification, but this is how I would approach it.

You could use a set to handle the duplication part.

You can't store a dict in a set, though, so we need to create a tuple from the values. (It looks like you're doing something similar with filtered.)
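A small illustration of that point: dicts are unhashable, but a tuple of the relevant values is hashable, so duplicates collapse in a set and the dicts can be rebuilt afterwards. The sample tags are taken from the question's data:

```python
tags = [
    {'name': 'Food', 'taglevel': 2},
    {'name': 'Food', 'taglevel': 2},
    {'name': 'Apple', 'taglevel': 2},
]

# dicts are unhashable, so putting them in a set raises TypeError
try:
    set(tags)
except TypeError as exc:
    print('TypeError:', exc)

# (taglevel, name) tuples are hashable, so duplicates collapse
unique = {(tag['taglevel'], tag['name']) for tag in tags}
print(sorted(unique))  # [(2, 'Apple'), (2, 'Food')]

# Rebuild dicts from the tuples when the final structure is needed
rebuilt = [{'name': name, 'taglevel': level} for level, name in sorted(unique)]
```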

We then reformat result to get the desired final structure.

from collections import defaultdict

result = defaultdict(set)
for item in json_data:
    parent = {'name': 'NoLevel_1'}  # fallback when no level-1 tag exists
    children = []
    for tag in item['tag']:
        if tag['taglevel'] == 1:
            parent = tag
        else:
            # dicts are unhashable, so store (taglevel, name) tuples in the set
            children.append((tag['taglevel'], tag['name']))
    result[parent['name']].update(children)

# Rebuild the list-of-dicts structure from the grouped tuples
result = [
    {
        'name': parent,
        'tag_child': [
            {'name': name, 'taglevel': taglevel} for taglevel, name in tags
        ],
        'taglevel': 1,
    } for parent, tags in result.items()
]
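One caveat with a set: its iteration order is not defined, and because string hashing is randomized per process, the order of tag_child can differ between runs. If children should appear in first-seen order, as in the desired result, one option is to swap the set for a dict used as an insertion-ordered set (Python 3.7+). A self-contained sketch of that variant, with `grouped` as a hypothetical variable name:

```python
import json
from collections import defaultdict

json_data = [
    {"title": "Random", "tag": [{"name": "Fruit", "taglevel": 1},
                                {"name": "Apple", "taglevel": 2}]},
    {"title": "Other", "tag": [{"name": "Fruit", "taglevel": 1},
                               {"name": "Apple", "taglevel": 2}]},
    {"title": "Words", "tag": [{"name": "Food", "taglevel": 2}]},
    {"title": "That", "tag": [{"name": "Food", "taglevel": 2},
                              {"name": "Apple", "taglevel": 2}]},
]

grouped = defaultdict(dict)  # dict keys act as an ordered set
for item in json_data:
    parent = 'NoLevel_1'
    children = []
    for tag in item['tag']:
        if tag['taglevel'] == 1:
            parent = tag['name']
        else:
            children.append((tag['taglevel'], tag['name']))
    # Existing keys keep their original position, so repeats are ignored
    grouped[parent].update(dict.fromkeys(children))

result = [
    {
        'name': parent,
        'tag_child': [{'name': name, 'taglevel': level} for level, name in tags],
        'taglevel': 1,
    }
    for parent, tags in grouped.items()
]
print(json.dumps(result, indent=4, sort_keys=True))
```

With the question's data this reproduces the desired result, including the Food-before-Apple order under NoLevel_1.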

You could use next() and a list comprehension for the parent and child creation; however, that iterates over the tags twice and may be less readable:

parent = next((tag for tag in item['tag'] if tag['taglevel'] == 1), {'name': 'NoLevel_1'})
children = [(tag['taglevel'], tag['name']) for tag in item['tag'] if tag['taglevel'] != 1]
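A quick check of those two lines on the "That" item from the question's data, which has no level-1 tag, so next() falls back to its default:

```python
item = {"title": "That", "tag": [{"name": "Food", "taglevel": 2},
                                 {"name": "Apple", "taglevel": 2}]}

# next() returns the first matching tag, or the default when none matches
parent = next((tag for tag in item['tag'] if tag['taglevel'] == 1),
              {'name': 'NoLevel_1'})
children = [(tag['taglevel'], tag['name'])
            for tag in item['tag'] if tag['taglevel'] != 1]
print(parent['name'], children)  # NoLevel_1 [(2, 'Food'), (2, 'Apple')]
```

One behavioural difference from the loop: if an item ever carried two level-1 tags, next() would pick the first one, while the loop keeps the last.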

Question 7

You could use a frozenset, which can be used as a dictionary key.
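A minimal sketch of that idea applied to the question's filtered pattern, assuming every value in a tag dict is itself hashable: `frozenset(tag.items())` ignores key order, so whole dicts can be deduplicated without choosing which fields go into a tuple. Note that this compares all fields, not just the name:

```python
tags = [
    {'name': 'Food', 'taglevel': 2},
    {'taglevel': 2, 'name': 'Food'},  # same content, different key order
    {'name': 'Apple', 'taglevel': 2},
]

# Equal dicts produce equal frozensets, so they share one dictionary key
filtered = {frozenset(tag.items()): tag for tag in tags}
unique = list(filtered.values())
print([tag['name'] for tag in unique])  # ['Food', 'Apple']
```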

Question 8

Wow, thanks. A much more efficient and clean way to do it.

Accepted answer (Question 6) by user136655, 2017-04-18 10:35:29Z

Comment by Graipher, Apr 18, 2017 at 11:36: You could use a frozenset, which can be used as a dictionary key.

Comment by Ycon, Apr 18, 2017 at 13:36: Wow, thanks. A much more efficient and clean way to do it.
