Suppose I have a list of column names:
columns = [
    'Village Trustee V Belleville', 'Unnamed: 74', 'Unnamed: 75',
    'Village President V Black Earth', 'Village Trustee V Black Earth',
    'Unnamed: 78', 'Unnamed: 79', 'Village President V Blue Mounds',
    'Village Trustee V Blue Mounds', 'Unnamed: 82',
    'Village Trustee V Cottage Grove', 'Unnamed: 94', 'Unnamed: 95',
    'Village President V Cross Plains', 'Village Trustee V Cross Plains',
    'Unnamed: 98', 'Unnamed: 99'
]
I want to find all Unnamed: XX values and replace each of them with the nearest valid string that precedes it. For example:
Input:
'Village Trustee V Belleville', 'Unnamed: 74', 'Unnamed: 75'
Output:
'Village Trustee V Belleville', 'Village Trustee V Belleville', 'Village Trustee V Belleville'
For this I have a function:
import re

def rename_unnamed(columns):
    current_value = None
    pattern = re.compile(r'^Unnamed:\s\d+$')
    for column_index, column in enumerate(columns):
        if re.match(pattern, column):
            columns[column_index] = current_value
        else:
            current_value = column
    return columns
This does the job fine, but I was wondering whether it could be improved or simplified, as it looks like a mouthful for a rather simple task.
2 Answers
- Don't mutate the input and also return it. If I need to pass columns unedited to two different functions, then your function looks as though it isn't mutating columns, so I wouldn't expect to have to copy the data first. But this isn't the case.
- You may want to raise an error if there are no values before the first "Unnamed" value.
- You may want to make pattern an argument, so that the function can be reused. Better still, change it to a callback to allow even more reuse.
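The first point can be seen in a short sketch. It reuses the function from the question on a made-up three-item list; the names original and result are only for illustration.

```python
import re


def rename_unnamed(columns):
    # Function from the question: fills the list in place AND returns it.
    current_value = None
    pattern = re.compile(r'^Unnamed:\s\d+$')
    for column_index, column in enumerate(columns):
        if re.match(pattern, column):
            columns[column_index] = current_value
        else:
            current_value = column
    return columns


# Illustrative input, taken from the example in the question.
original = ['Village Trustee V Belleville', 'Unnamed: 74', 'Unnamed: 75']
result = rename_unnamed(original)

# The return value is the very same list object, so the list the
# caller believed to be "unedited" has been changed as well.
print(result is original)  # True
print(original)
```

Anyone who keeps original around to hand to a second function is now passing the filled list, not the raw one.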
Overall your function's pretty good. I don't think it can be made much shorter.
import re

def value_or_previous(iterator, undefined, default=None):
    # Yield each item, or the last defined item when undefined(item) is truthy.
    prev_item = default
    for item in iterator:
        if not undefined(item):
            prev_item = item
        yield prev_item

def rename_unnamed(columns):
    return list(value_or_previous(columns, re.compile(r'^Unnamed:\s\d+$').match))
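One possible way to call this, as a sketch only: the sample list, the is_unnamed name and the fill_strict helper are assumptions made for illustration and are not part of the code above. The generator is repeated so that the snippet runs on its own.

```python
import re


def value_or_previous(iterator, undefined, default=None):
    prev_item = default
    for item in iterator:
        if not undefined(item):
            prev_item = item
        yield prev_item


# Hypothetical name for the callback that flags placeholder columns.
is_unnamed = re.compile(r'^Unnamed:\s\d+$').match

# Made-up input that starts with a placeholder.
columns = ['Unnamed: 73', 'Village Trustee V Belleville', 'Unnamed: 74']

# A new list is built; the input list is left untouched.
filled = list(value_or_previous(columns, is_unnamed))
print(filled)   # the leading placeholder becomes None
print(columns)  # unchanged

# Supplying a default gives leading placeholders a value other than None.
labelled = list(value_or_previous(columns, is_unnamed, default='(no name)'))
print(labelled)


def fill_strict(columns):
    # Hypothetical variant: refuse input with nothing before the first placeholder.
    if columns and is_unnamed(columns[0]):
        raise ValueError('no value before the first "Unnamed" column')
    return list(value_or_previous(columns, is_unnamed))
```

Whether a leading placeholder should become None, a default label, or an error depends on how the column names are used afterwards.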
If you need to modify the initial columns list (a mutable data structure), you don't need to return it.
The crucial pattern can be moved out to the top making it reusable and accessible for other potential routines or inline conditions:
import re
UNNAMED_PAT = re.compile(r'^Unnamed:\s\d+$')
As the function's responsibility is to fill the items matched by a specific regex pattern with the value that precedes them, I'd rename it accordingly, to something like backfill_by_pattern. The search pattern is passed as an argument.
I'd suggest an even more concise solution that replaces each matched item with the item at the previous index:
def backfill_by_pattern(columns, pat):
    fill_value = None
    for i, column in enumerate(columns):
        if pat.match(column):
            # The previous item has already been filled, so runs of matches propagate.
            columns[i] = columns[i - 1] if i else fill_value
Sample usage:
backfill_by_pattern(columns, pat=UNNAMED_PAT)
print(columns)
The output:
['Village Trustee V Belleville',
'Village Trustee V Belleville',
'Village Trustee V Belleville',
'Village President V Black Earth',
'Village Trustee V Black Earth',
'Village Trustee V Black Earth',
'Village Trustee V Black Earth',
'Village President V Blue Mounds',
'Village Trustee V Blue Mounds',
'Village Trustee V Blue Mounds',
'Village Trustee V Cottage Grove',
'Village Trustee V Cottage Grove',
'Village Trustee V Cottage Grove',
'Village President V Cross Plains',
'Village Trustee V Cross Plains',
'Village Trustee V Cross Plains',
'Village Trustee V Cross Plains']
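Because backfill_by_pattern changes the list in place and has no return statement, the call itself evaluates to None, and the result has to be read from the list that was passed in. If the original names are still needed, one option is to fill a copy. The sketch below assumes that use case; the names original, filled and returned are only illustrative.

```python
import re

UNNAMED_PAT = re.compile(r'^Unnamed:\s\d+$')


def backfill_by_pattern(columns, pat):
    fill_value = None
    for i, column in enumerate(columns):
        if pat.match(column):
            columns[i] = columns[i - 1] if i else fill_value


# Illustrative input, taken from the example in the question.
original = ['Village Trustee V Belleville', 'Unnamed: 74', 'Unnamed: 75']

filled = list(original)  # shallow copy, so `original` stays intact
returned = backfill_by_pattern(filled, pat=UNNAMED_PAT)

print(returned)  # None: the function only mutates its argument
print(filled)    # every entry is now 'Village Trustee V Belleville'
print(original)  # still contains the 'Unnamed: ...' placeholders
```

This follows the same convention as built-in methods such as list.sort(), which also modify the list and return None.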
- I get None when running this script; am I running it correctly? – alexyorke, Dec 5, 2019 at 0:59
- @alexyorke, the very first sentence, about the (mutable data structure), should have given you a hint. – RomanPerekhrest, Dec 5, 2019 at 7:35