Find and replace strings in the first column of data

Question 1

Not a professional coder, but my current hospital position is allowing me to code some workarounds for our abysmal EHR system. The following is part of my Python project to visualize laboratory data.

The following code snippet works, but I haven't figured out a way with list comprehensions or np.where to streamline it and potentially improve performance. I haven't found any posts yet to answer this, including among those suggested during question submission.

This code tests tuple[0] in a list of tuples (test_and_value) comprising (lab test names, test values), searching for misspelled lab test names and replacing them with standardized lab test names.

Common misspellings and their replacement terms live in another list of tuples (chem_fix) to preserve the order in which they are checked. Some word fragments of misspellings are used in lieu of full words, therefore testing cannot be a simple == statement.
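As a minimal sketch of why the check order matters (the helper name `first_match` is hypothetical and not part of the project), putting the more specific fragment 'BUN/CREAT' ahead of 'BUN' changes which replacement is chosen:

```python
# Hypothetical helper: return the replacement for the first fragment found in `name`.
def first_match(name, fixes):
    for fragment, replacement in fixes:
        if fragment in name:  # substring test, not ==
            return replacement
    return name  # no fragment matched; keep the original name


specific_first = [('BUN/CREAT', 'BUN/CREAT RATIO'), ('BUN', 'BLOOD UREA NITROGEN')]
general_first = list(reversed(specific_first))

print(first_match('CALCULATED BUN/CREAT', specific_first))  # BUN/CREAT RATIO
print(first_match('CALCULATED BUN/CREAT', general_first))   # BLOOD UREA NITROGEN
```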

Final product is a list of tuples, test_and_value_f, with correctly spelled lab tests.

Pertinent code excerpts:

chem_fix = [('A/G RATIO', 'A:G RATIO'), ('ALK', 'ALKALINE PHOSPHATASE'), ('ALT', 'ALT'), ('AST', 'AST'), ('BILI', 'BILIRUBIN,TOTAL'), ('BLIL', 'BILIRUBIN,TOTAL'), ('BUN/CREAT', 'BUN/CREAT RATIO'), ('BUN', 'BLOOD UREA NITROGEN'), ('CARBON', 'CO2'), ('GLOB', 'GLOBULIN'), ('RANDOM', 'GLUCOSE'), ('PROTEIN', 'TOTAL PROTEIN')]
fix_terms = [x[0] for x in chem_fix]
test_and_value_f = []
replaced = False
for lab_test, value in test_and_value:
    for count, term_needing_fix in enumerate(fix_terms):
        if term_needing_fix in lab_test:
            test_and_value_f.append((chem_fix[count][1], value))
            replaced = True
            break
    if replaced == False:
        test_and_value_f.append((lab_test, value))
    else:
        replaced = False

Sample test_and_value input: [('GLUCOSE', '77'), ('BUN', '14'), ('CREATININE', '1.4'), ('CALCULATED BUN/CREAT', '10'), ('SODIUM', '142'), ('POTASSIUM', '3.7'), ('CHLORIDE', '100'), ('CARBON DIOXIDE', '30'), ('CALCIUM', '8.9'), ('PROTEIN, TOTAL', '6.5'), ('ALBUMIN', '3.4'), ('CALCULATED GLOBIN', '3.1'), ('CALCULATED A/G RATIO', '1.1'), ('BILIRUBIN, TOTAL', '0.7'), ('ALKALINE PHOSPHATASE', '59'), ('SGOT (AST)', '3')]

Sample test_and_value_f output: [('GLUCOSE', '77'), ('BLOOD UREA NITROGEN', '14'), ('CREATININE', '1.4'), ('BUN/CREAT RATIO', '10'), ('SODIUM', '142'), ('POTASSIUM', '3.7'), ('CHLORIDE', '100'), ('CO2', '30'), ('CALCIUM', '8.9'), ('TOTAL PROTEIN', '6.5'), ('ALBUMIN', '3.4'), ('GLOBULIN', '3.1'), ('A:G RATIO', '1.1'), ('BILIRUBIN,TOTAL', '0.7'), ('ALKALINE PHOSPHATASE', '59'), ('AST', '3')]

Is there a more efficient way to do this?

Question 2

(One user suggested adding tag python-3.x.)

Question 3

Can you provide an example of the input (test_and_value) and the expected output? I think it will help reviewers.


score 1 · Accepted Answer · 2021-01-23 00:20:22Z

You could use a dictionary instead of a list of tuples for chem_fix, which lets you get rid of fix_terms. You could then use search from the re module inside the list comprehension and look up any match in the dictionary. Because search returns None when nothing matches, you can use the walrus operator (Python 3.8+) to store its result and test it in the same expression. Here’s an example:

# "import re" is used rather than "from re import compile",
# which would shadow the built-in function compile()
import re

# chem_fix is assumed to be a dict here: {fragment: standardized name, ...}
pattern = re.compile('|'.join(chem_fix))
pattern_search = pattern.search  # bind the method once to avoid repeated attribute lookups
test_and_value_f = [
    (
        chem_fix[m.group(0)]
        if (m := pattern_search(lab_test)) is not None
        else lab_test,
        value
    )
    for lab_test, value in test_and_value
]
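For reference, a self-contained sketch of the same idea follows. It assumes chem_fix has been rewritten as a dict (only a few of the original entries are shown) and uses a shortened sample input. The `re.escape` call is an addition not present in the answer; it is a precaution in case a fragment ever contains a regex metacharacter.

```python
import re

# Assumption: chem_fix rewritten as a dict; only a few entries are shown.
chem_fix = {
    'A/G RATIO': 'A:G RATIO',
    'BUN/CREAT': 'BUN/CREAT RATIO',
    'BUN': 'BLOOD UREA NITROGEN',
    'CARBON': 'CO2',
}
test_and_value = [
    ('GLUCOSE', '77'),
    ('BUN', '14'),
    ('CALCULATED BUN/CREAT', '10'),
    ('CARBON DIOXIDE', '30'),
]

# re.escape guards against fragments that contain regex metacharacters.
pattern = re.compile('|'.join(re.escape(term) for term in chem_fix))

test_and_value_f = [
    (chem_fix[m.group(0)] if (m := pattern.search(lab_test)) else lab_test, value)
    for lab_test, value in test_and_value
]
print(test_and_value_f)
```

With this input, the names without a matching fragment (here 'GLUCOSE') pass through unchanged.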

answered Jan 23, 2021 at 0:20 by my_stack_exchange_account · edited Jan 23, 2021 at 18:32

Comment · horseshrink · 2021-01-23 04:51:57Z · score 4

I chose a list of tuples to retain the order in which checking occurs. Do I need to use an ordered dict, or will the order be retained in a regular dict (I'm running Python 3.9)? I was reading that 3.6+ is supposed to retain insertion order.
Comment · my_stack_exchange_account · 2021-01-23 05:02:16Z · score 2

Yeah, a regular dict should keep its insertion order on 3.9. "|".join will therefore return a string with the terms in the same order as in the dictionary, so you shouldn’t have to write the pattern manually.
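A small sketch of this point, with one caveat. The dict `fixes` and the test string 'AST AND BUN' are made up for illustration. Joining a dict joins its keys in insertion order, and alternatives that could start at the same position are tried in that order. However, `search` returns the leftmost match in the string, so this is not always identical to checking each fragment in list order.

```python
import re

# Hypothetical three-entry mapping, in the order the fragments should be checked.
fixes = {'BUN/CREAT': 'BUN/CREAT RATIO', 'BUN': 'BLOOD UREA NITROGEN', 'AST': 'AST'}

# Joining a dict iterates over its keys in insertion order.
joined = '|'.join(fixes)
print(joined)  # BUN/CREAT|BUN|AST

pattern = re.compile(joined)

# Both 'BUN/CREAT' and 'BUN' could start at the same position;
# the earlier alternative is tried first.
print(pattern.search('CALCULATED BUN/CREAT').group(0))  # BUN/CREAT

# Caveat: search() returns the leftmost match in the string, so a later
# alternative wins when it occurs earlier in the text. The original loop
# would have picked 'BUN' here because it checks 'BUN' before 'AST'.
print(pattern.search('AST AND BUN').group(0))  # AST
```

For the sample data in the question the two approaches give the same result; the difference only shows up when a name contains more than one fragment at different positions.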
Comment · Reinderien · 2021-01-23 17:28:14Z · score 1

Since value is the same in both branches, your else could apply on the inside of the output tuple to the first element only.
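To make the suggestion concrete, here is a minimal sketch comparing the two placements of the conditional. The names `fixes`, `rows`, `outer`, and `inner` are hypothetical. Both comprehensions produce the same list; the second mentions `value` only once.

```python
import re

fixes = {'CARBON': 'CO2'}  # hypothetical one-entry mapping
search = re.compile('|'.join(fixes)).search
rows = [('CARBON DIOXIDE', '30'), ('SODIUM', '142')]

# The conditional chooses between two whole tuples; `value` appears twice.
outer = [
    (fixes[m.group(0)], value) if (m := search(name)) else (name, value)
    for name, value in rows
]

# The conditional chooses only the first element; `value` appears once.
inner = [
    (fixes[m.group(0)] if (m := search(name)) else name, value)
    for name, value in rows
]

print(outer == inner)  # True
```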
Comment · horseshrink · 2021-01-23 20:23:51Z

Had to manually deconstruct this with some test code to understand how everything fit together. Learned: 1) how to use if/else within a list comprehension to determine the values of tuple[0] within a list of tuples 2) .join on a dict joins the keys 3) walrus operator
