
I'm looking for a method to map the content of a structured text file to a nested dictionary (dictionary tree). The text file consists of (nested) sections with each section starting with the pattern Begin $KEYWORD and ending with the pattern End $KEYWORD. An example could look like this:

Begin Section1
test1
End Section1
Begin Section2
test3
Begin Section3
test1
test2
End Section3
End Section2

I want to access the text lines corresponding to a specific section by reading the value of key "text" from a (nested) dictionary. E.g., in the example above, print(sect["Section2"]["Section3"]["text"]) should produce the output ['test1', 'test2'], where sect denotes the nested dictionary. My naive coding attempt produced this:

testtxt = """
Begin Section1
test1
End Section1
Begin Section2
test3
Begin Section3
test1
test2
End Section3
End Section2
"""
testtxt = list(filter(None, testtxt.split('\n')))
# root node
sect = dict()
sect["text"] = []
# save all nodes from current node all the way up
# to the root node in a list
stack = []
for line in testtxt:
    if line.startswith("Begin "):
        # section begins with line "Begin <KEYWORD>"
        key_word = line.split("Begin ")[1].rstrip()
        sect[key_word] = dict()
        sect[key_word]["text"] = []
        # save parent node to stack in order to be able to back up to parent node
        stack.append(sect)
        # walk down to child node
        sect = sect[key_word]
    elif line.startswith("End "):
        # section ends with line "End <KEYWORD>"
        # back up to parent node
        sect = stack[-1]
        stack.pop(-1)
    else:
        # assumption: string "text" is not used as keyword
        sect["text"].append(line)

which does what I want, but it looks kind of "unpythonic". The step from parent to child node is simply sect = sect[key_word]; for the return path from the child up to the parent node, however, I had to resort to the list stack, which contains all nodes from the root down to the current node's parent. When an End KEYWORD line is found, the current node is set to the last entry of stack and that entry is removed from the list. I'd be grateful for suggestions on how to access the parent from the child node in a more elegant way (without using function recursion).
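For comparison, here is the same stack-based approach wrapped in a function, as a sketch. The only change in logic is that list.pop() both removes and returns the last entry, so sect = stack[-1] followed by stack.pop(-1) collapses into a single sect = stack.pop(). The function name parse_sections is just an illustrative choice.

```python
def parse_sections(lines):
    sect = {"text": []}  # root node
    stack = []           # ancestors of the current node, root first
    for line in lines:
        if line.startswith("Begin "):
            key_word = line[len("Begin "):].rstrip()
            child = {"text": []}
            sect[key_word] = child
            stack.append(sect)  # remember the parent before walking down
            sect = child
        elif line.startswith("End "):
            # list.pop() removes the last entry *and* returns it,
            # so stepping back up to the parent is a single statement
            sect = stack.pop()
        else:
            # assumption: string "text" is not used as keyword
            sect["text"].append(line)
    # if every section was closed the stack is empty and sect is the root;
    # otherwise the root is still the first entry on the stack
    return stack[0] if stack else sect


lines = ["Begin Section1", "test1", "End Section1",
         "Begin Section2", "test3",
         "Begin Section3", "test1", "test2", "End Section3",
         "End Section2"]
sect = parse_sections(lines)
print(sect["Section2"]["Section3"]["text"])  # ['test1', 'test2']
```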

asked May 9 at 22:00
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. Commented May 11 at 4:19

1 Answer


Honestly, I think it's plenty Pythonic to use a stack like that. Maybe use a more descriptive name than stack, for example section_stack or parent_sections.

An alternative would be to store the parent in each section. For example:


if line.startswith('Begin '):
    # ...
    sect[key_word]['parent'] = sect
    # ...
elif line.startswith('End '):
    # section ends with line "End <KEYWORD>"
    # back up to parent node
    sect = sect.pop('parent')
    # using sect.pop('parent') instead of sect['parent'] removes the entry

You could use a special marker key that doesn't conflict with a potential section name, for example a singleton like None.
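Putting those pieces together, a complete version of the parent-link variant could look like the sketch below. It assumes None as the marker key (section names parsed from the file are always strings, so None cannot collide with one); the names PARENT and parse_with_parent_links are illustrative only. Because the back-reference is popped on each "End" line, the finished tree contains no cycles. Note that an "End" line with no open section raises KeyError, since the root has no parent entry, and a section that is never closed keeps its back-reference.

```python
# Assumption: None is used as the marker key for the parent link. Section
# names are always strings, so None can never collide with one.
PARENT = None


def parse_with_parent_links(lines):
    root = {"text": []}
    sect = root
    for line in lines:
        if line.startswith("Begin "):
            key_word = line[len("Begin "):].rstrip()
            # the child stores a reference to the section that contains it
            child = {"text": [], PARENT: sect}
            sect[key_word] = child
            sect = child
        elif line.startswith("End "):
            # pop() returns the parent and deletes the back-reference,
            # so the finished tree contains no cycles
            sect = sect.pop(PARENT)
        else:
            sect["text"].append(line)
    return root


lines = ["Begin Section1", "test1", "End Section1",
         "Begin Section2", "test3",
         "Begin Section3", "test1", "test2", "End Section3",
         "End Section2"]
tree = parse_with_parent_links(lines)
print(tree["Section2"]["Section3"]["text"])  # ['test1', 'test2']
```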

I actually don't think this is more elegant than your solution.

I would consider leaning into the stack of your solution, and not bother with sect at all.

For example:

section_stack = [{'text': []}]
for line in testtxt:
    if line.startswith("Begin "):
        # section begins with line "Begin <KEYWORD>"
        key_word = line.removeprefix("Begin ").rstrip()
        # attach a new child to the current section, then make it the current one
        child_section = {'text': []}
        section_stack[-1][key_word] = child_section
        section_stack.append(child_section)
    elif line.startswith("End "):
        # section ends with line "End <KEYWORD>"
        # back up to parent node
        section_stack.pop()
    else:
        # assumption: string "text" is not used as keyword
        section_stack[-1]["text"].append(line)
# at this point len(section_stack) should be exactly 1, otherwise not all of the sections have been closed
# consider if that is acceptable
# either way, section_stack[0] contains the root node
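If unbalanced input should be treated as an error rather than accepted, the same loop can be wrapped in a function that checks the stack depth. A sketch, assuming that raising ValueError is the desired behaviour (the name parse_sections_strict is hypothetical); it also refuses to pop the root when it meets an "End" line with no open section. str.removeprefix requires Python 3.9 or later.

```python
def parse_sections_strict(lines):
    section_stack = [{"text": []}]  # root node at the bottom
    for line in lines:
        if line.startswith("Begin "):
            key_word = line.removeprefix("Begin ").rstrip()  # Python 3.9+
            child_section = {"text": []}
            section_stack[-1][key_word] = child_section
            section_stack.append(child_section)
        elif line.startswith("End "):
            # never pop the root: an extra "End" is a format error
            if len(section_stack) == 1:
                raise ValueError(f"unexpected {line!r}: no section is open")
            section_stack.pop()
        else:
            section_stack[-1]["text"].append(line)
    # anything above the root at this point is a section that was never closed
    if len(section_stack) != 1:
        raise ValueError(f"{len(section_stack) - 1} section(s) never closed")
    return section_stack[0]


lines = ["Begin Section1", "test1", "End Section1",
         "Begin Section2", "test3",
         "Begin Section3", "test1", "test2", "End Section3",
         "End Section2"]
sect = parse_sections_strict(lines)
print(sect["Section2"]["Section3"]["text"])  # ['test1', 'test2']
```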

I took the liberty of replacing the more elaborate way of constructing a dictionary with a more Pythonic dictionary display, and of using str.removeprefix (available since Python 3.9) instead of splitting a string only to throw away the created list immediately.


If there is more you want to do, you might want to use a custom type for your data. That way you can more easily implement things like accessing parent sections, checking that the keywords after each "Begin " and "End " match, and supporting a section called text without running into trouble.
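As a rough illustration of that idea, here is a sketch using a dataclass. The class Section, its fields name, parent, text and children, the root name "<root>", and the function parse are all hypothetical choices, not an established API. Each section keeps a real parent reference, child sections live in their own dict so a section called text no longer clashes with the text lines, and mismatched or missing "End" lines raise ValueError. The dataclass uses eq=False and repr=False on parent so that comparing or printing a section does not recurse through the parent links.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Section:
    name: str
    parent: Optional[Section] = field(default=None, repr=False)
    text: list[str] = field(default_factory=list)
    children: dict[str, Section] = field(default_factory=dict)

    def __getitem__(self, name: str) -> Section:
        # allows tree["Section2"]["Section3"] style access to child sections
        return self.children[name]


def parse(lines) -> Section:
    root = Section("<root>")
    current = root
    for line in lines:
        if line.startswith("Begin "):
            name = line[len("Begin "):].rstrip()
            child = Section(name, parent=current)
            current.children[name] = child
            current = child
        elif line.startswith("End "):
            name = line[len("End "):].rstrip()
            if current.parent is None:
                raise ValueError(f"{line!r} without a matching Begin")
            if name != current.name:
                raise ValueError(f"expected 'End {current.name}', got {line!r}")
            current = current.parent  # walk up via the stored parent link
        else:
            current.text.append(line)
    if current is not root:
        raise ValueError(f"section {current.name!r} was never closed")
    return root


lines = ["Begin Section1", "test1", "End Section1",
         "Begin Section2", "test3",
         "Begin Section3", "test1", "test2", "End Section3",
         "End Section2"]
tree = parse(lines)
print(tree["Section2"]["Section3"].text)  # ['test1', 'test2']
```

With this structure the parent of any node is simply node.parent, so no separate stack is needed while parsing.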

answered May 10 at 22:09
