I have a parse function that parses a tree of categories. I wrote it in the simplest way possible, and now I'm struggling to refactor it.
Every nested loop does the same thing, except that it appends its object to the "childs" list of the object created one level above.
I think it can be refactored with recursion, but I can't work out how. How can I wrap this in a recursive function to avoid the code duplication?
The final result should be a list of objects, or the function can simply yield each top-level object with its nested children.
for container in category_containers:
    root_category_a = container.xpath("./a")
    root_category_title = root_category_a.xpath("./*[1]/text()").get()
    root_category_url = self._host + root_category_a.xpath("./@href").get()
    root = {
        "title": root_category_title,
        "url": root_category_url,
        "childs": [],
    }
    subcategory_rows1 = container.xpath("./div/div")
    for subcat_row1 in subcategory_rows1:
        subcategory_a = subcat_row1.xpath("./a")
        subcategory_title = subcategory_a.xpath("./*[1]/text()").get()
        subcategory_url = self._host + subcategory_a.xpath("./@href").get()
        subcat1 = {
            "title": subcategory_title,
            "url": subcategory_url,
            "childs": [],
        }
        subcategory_rows2 = subcat_row1.xpath("./div/div")
        for subcat_row2 in subcategory_rows2:
            subcategory2_a = subcat_row2.xpath("./a")
            subcategory2_title = subcategory2_a.xpath("./*[1]/text()").get()
            subcategory2_url = self._host + subcategory2_a.xpath("./@href").get()
            subcat2 = {
                "title": subcategory2_title,
                "url": subcategory2_url,
                "childs": [],
            }
            subcategory_rows3 = subcat_row2.xpath("./div/div")
            for subcat_row3 in subcategory_rows3:
                subcategory3_a = subcat_row3.xpath("./a")
                subcategory3_title = subcategory3_a.xpath("./*[1]/text()").get()
                subcategory3_url = self._host + subcategory3_a.xpath("./@href").get()
                subcat3 = {
                    "title": subcategory3_title,
                    "url": subcategory3_url,
                    "childs": [],
                }
                subcat2['childs'].append(subcat3)
            subcat1['childs'].append(subcat2)
        root['childs'].append(subcat1)
    yield root
1 Answer
You can start by just extracting the bit that clearly is repeated into a standalone function:
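from typing import Any

# Note: this relies on `self` being in scope, so define it inside the
# parse method (or turn it into a method and call self.get_category).
def get_category(category) -> dict[str, Any]:
    category_a = category.xpath("./a")
    category_title = category_a.xpath("./*[1]/text()").get()
    category_url = self._host + category_a.xpath("./@href").get()
    return {
        "title": category_title,
        "url": category_url,
        "childs": [],
    }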
and then turn your code into something like:
for container in category_containers:
    root = get_category(container)
    for subcontainer in container.xpath("./div/div"):
        subcategory = get_category(subcontainer)
        for subsubcontainer in subcontainer.xpath("./div/div"):
            subsubcategory = get_category(subsubcontainer)
            # Iterate the children of subsubcontainer, not subcontainer.
            for subsubsubcontainer in subsubcontainer.xpath("./div/div"):
                subsubsubcategory = get_category(subsubsubcontainer)
                subsubcategory["childs"].append(subsubsubcategory)
            subcategory["childs"].append(subsubcategory)
        root["childs"].append(subcategory)
    yield root
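To cap the nesting the way the original loops do, the recursive version needs a maximum depth of some kind (without one it would simply keep going until a container has no ./div/div children). I think something like this should work, and even if it doesn't quite, you'll get the gist of it: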
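def recurse_categories(container, depth=0):
    # Stop after four levels (depths 0 to 3), matching the original nested loops.
    if depth > 3:
        return None
    category = get_category(container)
    # Recurse into each child container, not into the category dict.
    for subcontainer in container.xpath("./div/div"):
        subcategory = recurse_categories(subcontainer, depth + 1)
        if subcategory is not None:
            category["childs"].append(subcategory)
    return category

for container in category_containers:
    yield recurse_categories(container)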
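To try the recursive idea without a running spider, here is a minimal, self-contained sketch. It is not the original Scrapy code: it assumes the standard library's xml.etree.ElementTree as a stand-in for the Scrapy selectors, and the names HOST, SAMPLE, parse_category and max_depth are hypothetical ones made up for the illustration. The markup only imitates the shape implied by the XPath expressions above (an "a" link plus children under "./div/div").

```python
import xml.etree.ElementTree as ET

HOST = "https://example.com"  # hypothetical, stands in for self._host

# Made-up markup: each category is a <div> with an <a> link and,
# optionally, child categories under ./div/div.
SAMPLE = """
<div>
  <a href="/books"><span>Books</span></a>
  <div>
    <div>
      <a href="/books/fiction"><span>Fiction</span></a>
      <div>
        <div>
          <a href="/books/fiction/scifi"><span>Sci-Fi</span></a>
        </div>
      </div>
    </div>
    <div>
      <a href="/books/history"><span>History</span></a>
    </div>
  </div>
</div>
""".strip()


def get_category(node):
    """Build the dict for a single category, without its children."""
    link = node.find("./a")
    return {
        "title": link[0].text,  # text of the link's first child element
        "url": HOST + link.get("href"),
        "childs": [],
    }


def parse_category(node, depth=0, max_depth=3):
    """Parse a category and, up to max_depth, all of its descendants."""
    category = get_category(node)
    if depth < max_depth:
        for child in node.findall("./div/div"):
            category["childs"].append(
                parse_category(child, depth + 1, max_depth)
            )
    return category


root = parse_category(ET.fromstring(SAMPLE))
shallow = parse_category(ET.fromstring(SAMPLE), max_depth=1)
```

Here max_depth=3 allows four levels (depths 0 to 3), like the original nested loops. The recursion also ends on its own wherever a node has no "./div/div" children, so the cap only matters if the page nests deeper than you want to follow.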