Given input such as:
```python
foos = [
    [],
    [{'a': 1}],
    [{'a': 1}, {'a': 2}],
    [{'a': 1}, {'a': 2}, {'a': None}],
    [{'a': 1}, {'a': 2}, {'a': None}, {'a': None}],
    [{'a': 1}, {'a': None}, {'a': 2}],
    [{'a': 1}, {'a': None}, {'a': 2}, {'a': None}],
    [{'a': 1}, {'a': None}, {'a': 2}, {'a': None}, {'a': None}],
]
```
I want a function `remove_empty_end_lines` that takes a list of dicts and removes all dicts with null values that appear only at the end of the list, not in between.
```python
for foo in foos:
    print(foo)
    print(remove_empty_end_lines(foo))
    print('\n')
```

which produces:

```
[]
[]

[{'a': 1}]
[{'a': 1}]

[{'a': 1}, {'a': 2}]
[{'a': 1}, {'a': 2}]

[{'a': 1}, {'a': 2}, {'a': None}]
[{'a': 1}, {'a': 2}]

[{'a': 1}, {'a': 2}, {'a': None}, {'a': None}]
[{'a': 1}, {'a': 2}]

[{'a': 1}, {'a': None}, {'a': 2}]
[{'a': 1}, {'a': None}, {'a': 2}]

[{'a': 1}, {'a': None}, {'a': 2}, {'a': None}]
[{'a': 1}, {'a': None}, {'a': 2}]

[{'a': 1}, {'a': None}, {'a': 2}, {'a': None}, {'a': None}]
[{'a': 1}, {'a': None}, {'a': 2}]
```
My final solution is:
```python
def remove_empty_end_lines(lst):
    i = next(
        (
            i for i, dct in enumerate(reversed(lst))
            if any(v is not None for v in dct.values())
        ),
        len(lst)
    )
    return lst[: len(lst) - i]
```
I'm not going for code golf; rather for performance and readability.
In the input I've listed above, the dicts always have a length of one, but in reality they can be any length. All dicts within a given list will always have the same length.
2 Answers
Some suggestions:

- You are using `i` both inside your generator expression and outside. This makes the code harder to follow than if the variables were named differently. For example, you could think of the outer `i` as the count of matching (that is, all-`None`-valued) dictionaries at the end of the list.
- Abbreviations like `lst` and `dct` don't tell me anything about what the variables actually are. In an actual application better names should be possible, but more context would be needed for that.
- You say "remove all dicts with null values", but you neither specify nor test what happens with a dictionary at the end of the list that has some `None` values and some non-`None` values. As far as I can tell from your implementation, you remove only entries whose values are all `None`, but it's not clear whether this is the intention.
- I would pull out the generator expression passed to `next` as a separate variable for clarity.
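On the mixed-values point: a quick check against the posted implementation (reproduced below) confirms that a trailing dict with at least one non-`None` value survives trimming, while an all-`None` trailing dict does not:

```python
def remove_empty_end_lines(lst):
    i = next(
        (
            i for i, dct in enumerate(reversed(lst))
            if any(v is not None for v in dct.values())
        ),
        len(lst)
    )
    return lst[: len(lst) - i]

# A trailing dict with a mix of None and non-None values is kept:
print(remove_empty_end_lines([{'a': 1, 'b': 2}, {'a': 3, 'b': None}]))
# -> [{'a': 1, 'b': 2}, {'a': 3, 'b': None}]

# Only a trailing dict whose values are ALL None gets removed:
print(remove_empty_end_lines([{'a': 1, 'b': None}, {'a': None, 'b': None}]))
# -> [{'a': 1, 'b': None}]
```

Whether that is the desired behavior is for the author to decide; a test case pinning it down either way would help.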
That statement where you calculate `i` is too complex: it contains two loops. Consider taking some parts out of it as separate variables or functions.
For example, you could make a function that checks whether a dict has any values other than `None`:
```python
def has_values(mapping):
    return any(value is not None
               for value in mapping.values())
```
and a function that returns the index of the first element satisfying some condition:
```python
def first_valid_index(predicate, iterable, default):
    return next((index for index, value in enumerate(iterable)
                 if predicate(value)),
                default)
```
These functions are pretty general, so you can reuse them elsewhere in your code. Your original function would then look like this:
```python
def remove_empty_end_lines(mappings):
    reversed_mappings = reversed(mappings)
    index = first_valid_index(has_values,
                              reversed_mappings,
                              default=len(mappings))
    return mappings[:len(mappings) - index]
```
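If the generality of those helpers isn't needed, another readable option is a plain backwards loop, a sketch rather than a drop-in replacement:

```python
def remove_empty_end_lines(mappings):
    # Walk backwards from the end while the trailing dict's values
    # are all None, then slice off that all-None tail.
    end = len(mappings)
    while end > 0 and all(v is None for v in mappings[end - 1].values()):
        end -= 1
    return mappings[:end]
```

Note that `all(...)` over an empty dict is `True`, so empty dicts at the end are trimmed too, matching the original, where `any(...)` over an empty dict is `False`.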
Also, the fact that you use lists of dicts that all share the same keys makes me think you should probably use a more appropriate data structure.
For example, consider this pandas DataFrame:
```python
>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=[None, 1, 2, None, 3, 4, None, None, None],
...                        b=[None, 1, 2, None, None, 4, 5, None, None]))
>>> df
     a    b
0  NaN  NaN
1  1.0  1.0
2  2.0  2.0
3  NaN  NaN
4  3.0  NaN
5  4.0  4.0
6  NaN  5.0
7  NaN  NaN
8  NaN  NaN
```
To remove the trailing rows that contain only `NaN`, you would simply write:
```python
>>> index = df.last_valid_index()
>>> df.loc[:index]
     a    b
0  NaN  NaN
1  1.0  1.0
2  2.0  2.0
3  NaN  NaN
4  3.0  NaN
5  4.0  4.0
6  NaN  5.0
```
Much simpler, and it should also be faster for big data.
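If the data does start out as a list of dicts, here is one possible round trip through pandas. This is a sketch: the function name is reused from the question, and the `dtype=object` choice is mine, to keep the original ints and `None`s from being coerced to floats and `NaN`:

```python
import pandas as pd

def remove_empty_end_lines(dicts):
    # dtype=object preserves the original values (ints stay ints,
    # None stays None) instead of coercing columns to float/NaN.
    df = pd.DataFrame(dicts, dtype=object)
    last = df.last_valid_index()
    if last is None:  # empty input, or every row is entirely None
        return []
    # .loc slicing is inclusive, so the last valid row is kept.
    return df.loc[:last].to_dict('records')

print(remove_empty_end_lines([{'a': 1}, {'a': None}, {'a': 2}, {'a': None}]))
# -> [{'a': 1}, {'a': None}, {'a': 2}]
```

For a handful of small lists the DataFrame construction overhead will dominate; the pandas route pays off when the data is large or already lives in a frame.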