Not a professional coder, but my current hospital position is allowing me to code some workarounds for our abysmal EHR system. The following is part of my Python project to visualize laboratory data.
The following code snippet works, but I haven't figured out a way with list comprehensions or np.where to streamline it and potentially improve performance. I haven't found any posts yet to answer this, including among those suggested during question submission.
This code tests tuple[0] in a list of tuples (test_and_value) comprising (lab test names, test values), searching for misspelled lab test names and replacing them with standardized lab test names.
Common misspellings and their replacement terms live in another list of tuples (chem_fix) to preserve the order in which they are checked. Some entries are word fragments of misspellings rather than full words, so the test cannot be a simple == statement.
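To illustrate why an equality test is not enough, here is a minimal sketch using one fragment and one lab name taken from the sample data in this post. The variable names are only for illustration.

```python
# Illustrative only: values taken from the sample data in this post.
fragment = 'BUN/CREAT'
lab_test = 'CALCULATED BUN/CREAT'

# Equality fails because the lab name carries an extra word...
print(fragment == lab_test)   # False
# ...while a substring test finds the fragment.
print(fragment in lab_test)   # True

# Order matters: 'BUN' is also a substring, so 'BUN/CREAT' must be checked first.
print('BUN' in lab_test)      # True
```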
Final product is a list of tuples, test_and_value_f, with correctly spelled lab tests.
Pertinent code excerpts:
chem_fix = [('A/G RATIO', 'A:G RATIO'), ('ALK', 'ALKALINE PHOSPHATASE'), ('ALT', 'ALT'), ('AST', 'AST'), ('BILI', 'BILIRUBIN,TOTAL'), ('BLIL', 'BILIRUBIN,TOTAL'), ('BUN/CREAT', 'BUN/CREAT RATIO'), ('BUN', 'BLOOD UREA NITROGEN'), ('CARBON', 'CO2'), ('GLOB', 'GLOBULIN'), ('RANDOM', 'GLUCOSE'), ('PROTEIN', 'TOTAL PROTEIN')]
fix_terms = [x[0] for x in chem_fix]
test_and_value_f = []
replaced = False
for lab_test, value in test_and_value:
    for count, term_needing_fix in enumerate(fix_terms):
        if term_needing_fix in lab_test:
            test_and_value_f.append((chem_fix[count][1], value))
            replaced = True
            break
    if replaced == False:
        test_and_value_f.append((lab_test, value))
    else:
        replaced = False
Sample test_and_value input: [('GLUCOSE', '77'), ('BUN', '14'), ('CREATININE', '1.4'), ('CALCULATED BUN/CREAT', '10'), ('SODIUM', '142'), ('POTASSIUM', '3.7'), ('CHLORIDE', '100'), ('CARBON DIOXIDE', '30'), ('CALCIUM', '8.9'), ('PROTEIN, TOTAL', '6.5'), ('ALBUMIN', '3.4'), ('CALCULATED GLOBIN', '3.1'), ('CALCULATED A/G RATIO', '1.1'), ('BILIRUBIN, TOTAL', '0.7'), ('ALKALINE PHOSPHATASE', '59'), ('SGOT (AST)', '3')]
Sample test_and_value_f output: [('GLUCOSE', '77'), ('BLOOD UREA NITROGEN', '14'), ('CREATININE', '1.4'), ('BUN/CREAT RATIO', '10'), ('SODIUM', '142'), ('POTASSIUM', '3.7'), ('CHLORIDE', '100'), ('CO2', '30'), ('CALCIUM', '8.9'), ('TOTAL PROTEIN', '6.5'), ('ALBUMIN', '3.4'), ('GLOBULIN', '3.1'), ('A:G RATIO', '1.1'), ('BILIRUBIN,TOTAL', '0.7'), ('ALKALINE PHOSPHATASE', '59'), ('AST', '3')]
Is there a more efficient way to do this?
1 Answer
You could use a dictionary instead of a list of tuples for chem_fix, and then you could get rid of fix_terms. Then you could use search from the re module in the list comprehension and look up any matches in the dictionary. You can also use the walrus operator to store whatever is returned from search, since you first have to make sure it's not None. Here's an example of that:
# there's already a built-in function called "compile", so
# "import re" is used (not "from re import compile") so it doesn't get overwritten
import re

# chem_fix is assumed to be a dict here: {misspelled fragment: standardized name}
pattern = re.compile('|'.join(chem_fix))
pattern_search = pattern.search  # bound once so the method is not looked up on every iteration
test_and_value_f = [
    (
        chem_fix[m.group(0)]
        if (m := pattern_search(lab_test)) is not None
        else lab_test,
        value
    )
    for lab_test, value in test_and_value
]
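For anyone who wants to run this end to end, here is a self-contained sketch of the same idea using the sample data from the question. Two things are my own assumptions rather than part of the answer above: the helper name standardize is hypothetical, and re.escape is added as a precaution in case a fragment ever contains a regex metacharacter.

```python
import re

# Assumption: chem_fix rewritten as a dict (insertion order is kept in Python 3.7+).
chem_fix = {
    'A/G RATIO': 'A:G RATIO', 'ALK': 'ALKALINE PHOSPHATASE', 'ALT': 'ALT',
    'AST': 'AST', 'BILI': 'BILIRUBIN,TOTAL', 'BLIL': 'BILIRUBIN,TOTAL',
    'BUN/CREAT': 'BUN/CREAT RATIO', 'BUN': 'BLOOD UREA NITROGEN',
    'CARBON': 'CO2', 'GLOB': 'GLOBULIN', 'RANDOM': 'GLUCOSE',
    'PROTEIN': 'TOTAL PROTEIN',
}

test_and_value = [
    ('GLUCOSE', '77'), ('BUN', '14'), ('CREATININE', '1.4'),
    ('CALCULATED BUN/CREAT', '10'), ('SODIUM', '142'), ('POTASSIUM', '3.7'),
    ('CHLORIDE', '100'), ('CARBON DIOXIDE', '30'), ('CALCIUM', '8.9'),
    ('PROTEIN, TOTAL', '6.5'), ('ALBUMIN', '3.4'), ('CALCULATED GLOBIN', '3.1'),
    ('CALCULATED A/G RATIO', '1.1'), ('BILIRUBIN, TOTAL', '0.7'),
    ('ALKALINE PHOSPHATASE', '59'), ('SGOT (AST)', '3'),
]

# Joining a dict joins its keys; re.escape guards against regex metacharacters.
pattern = re.compile('|'.join(re.escape(term) for term in chem_fix))


def standardize(lab_test):
    """Return the standardized name, or lab_test unchanged if no fragment matches."""
    m = pattern.search(lab_test)
    return chem_fix[m.group(0)] if m else lab_test


test_and_value_f = [(standardize(lab_test), value)
                    for lab_test, value in test_and_value]
print(test_and_value_f)
```

One behavioural difference worth keeping in mind: the regex returns the leftmost match in the lab name, and alternation order only decides between fragments that start at the same position. The original loop instead takes the first fragment in list order, wherever it occurs in the name. Both give the same result for the sample data, but they could differ for a name that contains two different fragments.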
Chose list of tuples to retain order in which checking occurs. Do I need to use an ordered dict, or will the order be retained in a regular dict (running Python 3.9)? Was reading 3.6+ is supposed to retain insertion order. – horseshrink, Jan 23, 2021 at 4:51
Yeah, a regular dict should keep its insertion order if it's 3.9. So the "|".join will return a string with the terms in the same order as what's in the dictionary, so you shouldn't have to write the pattern manually. – my_stack_exchange_account, Jan 23, 2021 at 5:02
Since value is the same in both branches, your else could apply on the inside of the output tuple to the first element only. – Reinderien, Jan 23, 2021 at 17:28
Had to manually deconstruct this with some test code to understand how everything fit together. Learned: 1) how to use if/else within a list comprehension to determine the values of tuple[0] within a list of tuples 2) .join on a dict joins the keys 3) walrus operator – horseshrink, Jan 23, 2021 at 20:23
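The three points in that last comment can be checked with a few lines. This is only an illustrative sketch: the two-entry dict fixes and the list names are made up for the example and are not part of the original code.

```python
import re

# Hypothetical two-entry mapping, for illustration only.
fixes = {'BUN/CREAT': 'BUN/CREAT RATIO', 'BUN': 'BLOOD UREA NITROGEN'}

# A regular dict keeps insertion order (guaranteed since Python 3.7),
# and joining a dict joins its keys.
joined = '|'.join(fixes)
print(joined)  # BUN/CREAT|BUN

# The walrus operator (Python 3.8+) binds the match object inside the
# condition, so it can be reused in the "if" branch of the expression.
names = ['BUN', 'SODIUM']
fixed = [
    fixes[m.group(0)] if (m := re.search(joined, name)) else name
    for name in names
]
print(fixed)  # ['BLOOD UREA NITROGEN', 'SODIUM']
```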
test_and_value) and the expected output? I think it will help reviewers