Parse backslash-delimited hierarchy path into separate columns

Question 1

I have rows in a database table with a hierarchypath column. The values in the column are hierarchy paths with up to four levels:

HIERARCHYPATH 
---------------------------------------------------
FACILITIES \ FIREHALL \ PLUMBING
FACILITIES \ PARK
ROADS \ GRASS/TURF BLVD MAINTENANCE
ROADS \ SIDEWALKS \ REPLACEMENT
FACILITIES \ PARKING - MAIN ST
RECREATION \ BANDSHELL \ PROPERTY \ BUILDING-GENERAL
FACILITIES

I've written a Jython 2.7 automation script that parses the levels at the \ delimiter and inserts the values into individual level columns:

CLASSL1
CLASSL2
CLASSL3
CLASSL4

#Example: 
s = "FACILITIES \ FIREHALL \ PLUMBING"
col_prefix = "CLASSL"
#Note: The term "mbo" is an IBM Maximo concept. 
#It can be thought of as the currently selected record in the application.
for i in range(0, 4):
 try:
 mbo.setValue(col_prefix + str(i+1), s.split(' \\ ')[i])
 except:
 #Null-out any existing values that were not overriden
 mbo.setValueNull(col_prefix + str(i+1))
HIERARCHYPATH CLASSL1 CLASSL2 CLASSL3 CLASSL4
-------------------------------- ---------- ---------- ---------- ----------- 
FACILITIES \ FIREHALL \ PLUMBING FACILITIES FIREHALL PLUMBING null

I'm relatively new to coding. How can the script be improved?

Question 2

Why Jython? It's pretty old and unsupported.

Question 3

@Reinderien Unfortunately, I don't have a choice. Jython is the standard scripting language for IBM Maximo. It's the only viable way to interact with the Java classes.

Question 4

Normalization

A fully-normalized schema would not have four path component columns in a table; it would have one table with a string column for the path component, and a foreign key to its parent (and/or child), conceptually similar to a linked list. Advantages to this approach would be support for an arbitrarily long path. Disadvantage is likely decreased performance, though the scale of your problem is not clear.

Otherwise, I find it odd that you're parsing a path, and storing the result of the parse back into the database alongside the original data. This is a fast-enough operation that surely any advantage to caching parse results is outweighed by the disadvantage of increased storage and fetch time.

The only solid justification I can think of for such a thing is when (1) you know that certain interesting strings exist in fixed locations in the path, and (2) you need very fast, indexed querying on them, and (3) such querying is too slow when doing the usual pattern-matching approach on the original path.

Iteration

Don't use range for your loop; use enumerate. Also use a raw string due to your backslash.

for i, part in enumerate(s.split(r' \ '), 1):

Bare `except`

Generally this is a bad idea. At the least, catch Exception instead, or more preferably the actual exception type you expect to see. If it's only an index error because you're trying to index into a split that has fewer than four components, just avoid that situation altogether:

parts = s.split(r' \ ')
for i, part in enumerate(parts, 1):
 mbo.setValue(col_prefix + str(i), part)
for i in range(len(parts)+1, 5):
 mbo.setValueNull(col_prefix + str(i))

Reinderien Reinderien 71k5 gold badges76 silver badges256 bronze badges · Accepted Answer · 2020-10-30 16:11:51Z

Normalization

A fully-normalized schema would not have four path component columns in a table; it would have one table with a string column for the path component, and a foreign key to its parent (and/or child), conceptually similar to a linked list. Advantages to this approach would be support for an arbitrarily long path. Disadvantage is likely decreased performance, though the scale of your problem is not clear.

Otherwise, I find it odd that you're parsing a path, and storing the result of the parse back into the database alongside the original data. This is a fast-enough operation that surely any advantage to caching parse results is outweighed by the disadvantage of increased storage and fetch time.

The only solid justification I can think of for such a thing is when (1) you know that certain interesting strings exist in fixed locations in the path, and (2) you need very fast, indexed querying on them, and (3) such querying is too slow when doing the usual pattern-matching approach on the original path.

Iteration

Don't use range for your loop; use enumerate. Also use a raw string due to your backslash.

for i, part in enumerate(s.split(r' \ '), 1):

Bare `except`

Generally this is a bad idea. At the least, catch Exception instead, or more preferably the actual exception type you expect to see. If it's only an index error because you're trying to index into a split that has fewer than four components, just avoid that situation altogether:

parts = s.split(r' \ ')
for i, part in enumerate(parts, 1):
 mbo.setValue(col_prefix + str(i), part)
for i in range(len(parts)+1, 5):
 mbo.setValueNull(col_prefix + str(i))

Stack Exchange Network

Parse backslash-delimited hierarchy path into separate columns

1 Answer 1

Normalization

Iteration

Bare `except`

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Parse backslash-delimited hierarchy path into separate columns

1 Answer 1

Normalization

Iteration

Bare except

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Bare `except`