1
\$\begingroup\$

I have a parse function that parses a tree of categories. I've written it in the simplest way possible and am now struggling to refactor it.

Every nested loop does the same thing, except that it appends its object to the childs list of the object created one level above.

I think it could be refactored with recursion, but I can't work out how. How can I wrap this in a recursive function to avoid the code duplication?

The final result should be a list of objects, or the function can just yield each top-level object with its nested childs.
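
To make the target concrete, here is a rough sketch of the shape one yielded item would have. The titles and URLs are placeholder values invented for illustration, not real data from the site:

```python
# Placeholder values only; the keys match the dicts built in the loops.
expected_item = {
    "title": "Root category",
    "url": "https://example.com/root",
    "childs": [
        {
            "title": "Subcategory",
            "url": "https://example.com/root/sub",
            "childs": [
                {
                    "title": "Sub-subcategory",
                    "url": "https://example.com/root/sub/subsub",
                    "childs": [],  # deepest level shown here has no children
                },
            ],
        },
    ],
}
```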

for container in category_containers:
    root_category_a = container.xpath("./a")
    root_category_title = root_category_a.xpath("./*[1]/text()").get()
    root_category_url = self._host + root_category_a.xpath("./@href").get()
    root = {
        "title": root_category_title,
        "url": root_category_url,
        "childs": [],
    }
    subcategory_rows1 = container.xpath("./div/div")
    for subcat_row1 in subcategory_rows1:
        subcategory_a = subcat_row1.xpath("./a")
        subcategory_title = subcategory_a.xpath("./*[1]/text()").get()
        subcategory_url = self._host + subcategory_a.xpath("./@href").get()
        subcat1 = {
            "title": subcategory_title,
            "url": subcategory_url,
            "childs": [],
        }
        subcategory_rows2 = subcat_row1.xpath("./div/div")
        for subcat_row2 in subcategory_rows2:
            subcategory2_a = subcat_row2.xpath("./a")
            subcategory2_title = subcategory2_a.xpath("./*[1]/text()").get()
            subcategory2_url = self._host + subcategory2_a.xpath("./@href").get()
            subcat2 = {
                "title": subcategory2_title,
                "url": subcategory2_url,
                "childs": [],
            }
            subcategory_rows3 = subcat_row2.xpath("./div/div")
            for subcat_row3 in subcategory_rows3:
                subcategory3_a = subcat_row3.xpath("./a")
                subcategory3_title = subcategory3_a.xpath("./*[1]/text()").get()
                subcategory3_url = self._host + subcategory3_a.xpath("./@href").get()
                subcat3 = {
                    "title": subcategory3_title,
                    "url": subcategory3_url,
                    "childs": [],
                }
                subcat2['childs'].append(subcat3)
            subcat1['childs'].append(subcat2)
        root['childs'].append(subcat1)
    yield root
200_success
asked Aug 31, 2022 at 10:43
\$\endgroup\$
  • It sounds like you're asking for help to refactor your code. That would imply that it's not yet finished to your satisfaction, so not yet ready for review. Commented Aug 31, 2022 at 12:08
  • @TobySpeight if it only needs refactoring, the typical meaning of refactoring implies a codebase that is complete but should be restructured for quality. That's what CodeReview is for, right? Commented Aug 31, 2022 at 12:34
  • That said, this question needs more context, particularly the full code that populates category_containers. Commented Aug 31, 2022 at 12:36
  • @Reinderien, AFAIR, Code Review Meta consensus was that requests to rewrite code in a different paradigm are off-topic. Sorry I'm not in a position to hunt out that meta question right now. Commented Aug 31, 2022 at 12:41
  • Also, this question needs an edit so that the title and description summarise the purpose of the code, rather than its mechanism. I recommend including some sample input to show what you're extracting. We really need to understand the motivational context to give good reviews. Thanks! Commented Aug 31, 2022 at 12:46

1 Answer

4
\$\begingroup\$

You can start by extracting the part that is clearly repeated into a standalone function. A standalone function has no self, so the host is passed in as a parameter:

from typing import Any


def get_category(category, host: str) -> dict[str, Any]:
    """Build the dict for one category node; the caller fills in "childs"."""
    category_a = category.xpath("./a")
    category_title = category_a.xpath("./*[1]/text()").get()
    category_url = host + category_a.xpath("./@href").get()
    return {
        "title": category_title,
        "url": category_url,
        "childs": [],
    }

and then turn your code into something like:

for container in category_containers:
    root = get_category(container, self._host)
    for subcontainer in container.xpath("./div/div"):
        subcategory = get_category(subcontainer, self._host)
        for subsubcontainer in subcontainer.xpath("./div/div"):
            subsubcategory = get_category(subsubcontainer, self._host)
            # Iterate over the rows of subsubcontainer, not subcontainer.
            for subsubsubcontainer in subsubcontainer.xpath("./div/div"):
                subsubsubcategory = get_category(subsubsubcontainer, self._host)
                subsubcategory["childs"].append(subsubsubcategory)
            subcategory["childs"].append(subsubcategory)
        root["childs"].append(subcategory)
    yield root

For recursion to reproduce these loops, you're going to need to define the maximum depth somehow, since the loops stop after three levels of subcategories. I think something like this should work, and otherwise you'll get the gist of it:

def recurse_categories(container, host: str, depth: int = 0, max_depth: int = 3) -> dict[str, Any]:
    category = get_category(container, host)
    if depth < max_depth:
        # Each ./div/div row is itself a category container one level down.
        for subcontainer in container.xpath("./div/div"):
            child = recurse_categories(subcontainer, host, depth + 1, max_depth)
            category["childs"].append(child)
    return category

The outer loop then reduces to yielding recurse_categories(container, self._host) for each container in category_containers.
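
The recursive idea can be tried out without Scrapy. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for the Scrapy selectors, so the lookups use find/findall instead of .xpath(...).get(). The HOST value, the SAMPLE markup, and the name build_tree are assumptions made up for this illustration; the real page structure is not shown in the question.

```python
import xml.etree.ElementTree as ET
from typing import Any

HOST = "https://example.com"  # hypothetical host, stands in for self._host

# Hypothetical markup that follows the ./a and ./div/div layout used in the question.
SAMPLE = """
<div>
  <a href="/books"><span>Books</span></a>
  <div>
    <div>
      <a href="/books/fiction"><span>Fiction</span></a>
      <div>
        <div>
          <a href="/books/fiction/crime"><span>Crime</span></a>
        </div>
      </div>
    </div>
    <div>
      <a href="/books/science"><span>Science</span></a>
    </div>
  </div>
</div>
"""


def get_category(node: ET.Element) -> dict[str, Any]:
    """Build the dict for a single category node, with no children yet."""
    link = node.find("./a")
    return {
        "title": link.find("./*[1]").text,  # text of the link's first child element
        "url": HOST + link.get("href"),
        "childs": [],
    }


def build_tree(node: ET.Element, depth: int = 0, max_depth: int = 3) -> dict[str, Any]:
    """Recursively collect a category and its subcategories, at most max_depth levels down."""
    category = get_category(node)
    if depth < max_depth:
        for child in node.findall("./div/div"):
            category["childs"].append(build_tree(child, depth + 1, max_depth))
    return category


tree = build_tree(ET.fromstring(SAMPLE))
```

Here the recursion also ends on its own whenever a node has no ./div/div rows, so max_depth only acts as a cap that mirrors the fixed nesting of the original loops.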
answered Aug 31, 2022 at 11:17
\$\endgroup\$