I have an HL7 MLLP message building class I'd like to refactor. The way HL7 over MLLP works is there are multiple fields, usually delimited by the | character. Within a field, there can be lists, delimited by ^, and within those lists there can be lists, delimited by &.
So for example, the following structure:
message = ['foo',
           ['bar', 'baz',
            ['a', 'b', 'c']
           ],
           'comment']
will be serialized to the message:
serialized = "foo|bar^baz^a&b&c|comment"
I've written a simple class whose constructor arguments indicate where in the list of fields you want to place some data, as well as the data itself. The __repr__ method is defined for easy use of these objects in f-strings.
This is the class:
class SegmentBuilder:
    def __init__(self, segment_type, *args):
        if not isinstance(segment_type, str):
            raise ValueError("Segment type must be a string")
        self.state = {0: segment_type}
        for index, value in args:
            self.state[index] = value
        self.fields = [None for _ in range(1 + max(self.state.keys()))]

    def __repr__(self):
        # Do a Depth-first string construction
        # Join first list using ^, second list using &
        # This is ugly, but will work for now
        # Could do better with an actual recursive depth-based approach
        def clean(obj):
            return str(obj) if obj is not None else ''

        for index, value in self.state.items():
            if not isinstance(value, list):
                self.fields[index] = value
            else:
                subfields = []
                for subfield in value:
                    if not isinstance(subfield, list):
                        subfields.append(clean(subfield))
                    else:
                        subsubfields = []
                        for subsubfield in subfield:
                            subsubfields.append(clean(subsubfield))
                        subfields.append('&'.join(subsubfields))
                self.fields[index] = '^'.join(subfields)
        return '|'.join([clean(field) for field in self.fields])
Now, there's clearly a recursive refactoring trying to leap out of this, probably involving some list comprehension as well. I thought it might involve passing an iterator based on the sequence of delimiter characters, e.g. "|^&", but the base case had me confused (since you would have to catch StopIteration, and then maybe signal to the caller via returning None?). Any guidance on this recursive refactor would be very helpful!
2 Answers
I'm not certain I understand all of the rules, but if you have only 3 delimiters, one per level in the hierarchy, then the base case is the level number or a non-list type. Here's a sketch of one recursive implementation:
DELIMITERS = ['|', '^', '&']

def serialize(level, xs):
    if isinstance(xs, list) and level < len(DELIMITERS):
        return DELIMITERS[level].join(
            serialize(level + 1, x)
            for x in xs
        )
    else:
        # Base case: adjust as needed.
        return xs
I suspect your messages might have more variety in the leaf-node data types (are ints and floats allowed?). If so, you might need to use return str(xs) instead.
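To make the str(xs) suggestion concrete, here is a minimal sketch of how the recursion could slot back into the class from the question. It is an illustration under assumptions, not a definitive implementation: it passes the remaining delimiters as a string slice (picking up the "|^&" idea from the question) instead of a level number, it maps None to an empty field the way the question's clean helper does, and it assumes that nesting deeper than the delimiters allow should raise an error.

```python
def serialize(value, delimiters="|^&"):
    """Join nested lists, using one delimiter per nesting level."""
    if isinstance(value, list):
        if not delimiters:
            # Assumption: nesting deeper than the delimiters allow is an error.
            raise ValueError("nesting is deeper than the available delimiters")
        head, rest = delimiters[0], delimiters[1:]
        return head.join(serialize(item, rest) for item in value)
    # Leaf: None becomes an empty field, everything else is stringified.
    return "" if value is None else str(value)


class SegmentBuilder:
    def __init__(self, segment_type, *args):
        if not isinstance(segment_type, str):
            raise ValueError("Segment type must be a string")
        # args is a series of (index, value) pairs; index 0 is the segment type.
        state = {0: segment_type, **dict(args)}
        self.fields = [state.get(i) for i in range(1 + max(state))]

    def __repr__(self):
        return serialize(self.fields)


print(serialize(["foo", ["bar", "baz", ["a", "b", "c"]], "comment"]))
print(SegmentBuilder("PID", (1, "x"), (3, ["a", None, ["b", "c"]])))
```

Slicing the delimiter string sidesteps the StopIteration question entirely: the base case is simply "not a list", and running out of delimiters is detected with an ordinary emptiness check.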
Brilliant! I like the exception-free approach, also how you constructed the base case conditions! – ijustlovemath, Jul 8, 2021 at 2:13
It's worth saying: I like the recursive implementation better, but it's not the only approach. Consider: you could just do one outer pass for each of the separators, like this:
from typing import Any, Iterable, Iterator, Sequence, Union

StringOrSeq = Union[str, Sequence[Any]]


def flatten(to_convert: Iterable[StringOrSeq], sep: str) -> Iterator[StringOrSeq]:
    for sub in to_convert:
        if isinstance(sub, Sequence) and not isinstance(sub, str):
            # Expand one level, placing sep between the items.
            # Note: an empty sequence would raise IndexError at sub[-1].
            for x in sub[:-1]:
                yield x
                yield sep
            yield sub[-1]
        else:
            yield sub


message = (
    'foo',
    (
        'bar', 'baz',
        (
            'a', 'b', 'c',
        ),
    ),
    'comment',
)

# One pass per separator; each pass expands one nesting level.
to_convert = [message]
for sep in '|^&':
    to_convert = list(flatten(to_convert, sep))
serialized = ''.join(to_convert)
print(serialized)
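It may help to see what each pass leaves behind. The sketch below is a compact variant written only for illustration: flatten_once is a hypothetical name, it returns a list instead of yielding, and it uses enumerate so that an empty sequence contributes nothing. It records the intermediate list after each separator.

```python
from collections.abc import Sequence


def flatten_once(items, sep):
    """Expand each nested sequence by one level, putting sep between its items."""
    out = []
    for item in items:
        if isinstance(item, Sequence) and not isinstance(item, str):
            for position, sub in enumerate(item):
                if position:
                    out.append(sep)
                out.append(sub)
        else:
            out.append(item)
    return out


state = [("foo", ("bar", "baz", ("a", "b", "c")), "comment")]
passes = []
for sep in "|^&":
    state = flatten_once(state, sep)
    passes.append(state)
    print(repr(sep), state)

print("".join(state))
```

After the '|' pass only the top level has been expanded, so the ('bar', 'baz', ...) tuple is still intact; the '^' pass opens it, and the '&' pass opens the innermost tuple, at which point every element is a string and the join succeeds.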
bar in a serialized message supposed to be baz?