Given input such as:
```python
foos = [
    [],
    [{'a': 1}],
    [{'a': 1}, {'a': 2}],
    [{'a': 1}, {'a': 2}, {'a': None}],
    [{'a': 1}, {'a': 2}, {'a': None}, {'a': None}],
    [{'a': 1}, {'a': None}, {'a': 2}],
    [{'a': 1}, {'a': None}, {'a': 2}, {'a': None}],
    [{'a': 1}, {'a': None}, {'a': 2}, {'a': None}, {'a': None}],
]
```
I want a function `remove_empty_end_lines` that takes a list of dicts and removes all dicts with null values that appear only at the end of the list, not in between.
```python
for foo in foos:
    print(foo)
    print(remove_empty_end_lines(foo))
    print('\n')
```

which produces:

```
[]
[]

[{'a': 1}]
[{'a': 1}]

[{'a': 1}, {'a': 2}]
[{'a': 1}, {'a': 2}]

[{'a': 1}, {'a': 2}, {'a': None}]
[{'a': 1}, {'a': 2}]

[{'a': 1}, {'a': 2}, {'a': None}, {'a': None}]
[{'a': 1}, {'a': 2}]

[{'a': 1}, {'a': None}, {'a': 2}]
[{'a': 1}, {'a': None}, {'a': 2}]

[{'a': 1}, {'a': None}, {'a': 2}, {'a': None}]
[{'a': 1}, {'a': None}, {'a': 2}]

[{'a': 1}, {'a': None}, {'a': 2}, {'a': None}, {'a': None}]
[{'a': 1}, {'a': None}, {'a': 2}]
```
My final solution is:
```python
def remove_empty_end_lines(lst):
    i = next(
        (
            i for i, dct in enumerate(reversed(lst))
            if any(v is not None for v in dct.values())
        ),
        len(lst)
    )
    return lst[: len(lst) - i]
```
I'm not going for code golf; rather for performance and readability.
In the input I've listed above, the dicts always have a length of one, but in reality they can be any length. All dicts within a given list will always have the same length.
2 Answers
Some suggestions:

- You are using `i` both inside your generator expression and outside. This makes the code harder to follow than if the variables were named differently. For example, you could think of the outer `i` as the count of matching (that is, all-`None`-valued) dictionaries at the end of the list.
- Abbreviations like `lst` and `dct` don't tell me anything about what the variables actually are. In an actual application better names should be possible, but more context would be needed for that.
- You say "remove all dicts with null values", but you neither specify nor test what happens with a dictionary at the end of the list that has some `None` values and some non-`None` values. As far as I can tell from your implementation, you remove only entries whose values are all `None`, but it's not clear whether this is the intention.
- I would pull out the generator expression passed to `next` as a separate variable for clarity.
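On the mixed-values point: a quick check against the posted implementation (reproduced below) confirms that a trailing dict with at least one non-`None` value survives trimming, while an all-`None` trailing dict does not:

```python
def remove_empty_end_lines(lst):
    i = next(
        (
            i for i, dct in enumerate(reversed(lst))
            if any(v is not None for v in dct.values())
        ),
        len(lst)
    )
    return lst[: len(lst) - i]

# A trailing dict with a mix of None and non-None values is kept:
print(remove_empty_end_lines([{'a': 1, 'b': 2}, {'a': 3, 'b': None}]))
# -> [{'a': 1, 'b': 2}, {'a': 3, 'b': None}]

# Only a trailing dict whose values are ALL None gets removed:
print(remove_empty_end_lines([{'a': 1, 'b': None}, {'a': None, 'b': None}]))
# -> [{'a': 1, 'b': None}]
```

Whether that is the desired behavior is for the author to decide; a test case pinning it down either way would help.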
That statement where you calculate `i` is too complex: it contains two loops. Consider taking some parts out of it as separate variables or functions.
For example, you could make a function that checks whether a dict has any values other than `None`:
```python
def has_values(mapping):
    return any(value is not None
               for value in mapping.values())
```
and a function that returns the index of the first element satisfying some condition:
```python
def first_valid_index(predicate, iterable, default):
    return next((index for index, value in enumerate(iterable)
                 if predicate(value)),
                default)
```
These functions are pretty general, so you can reuse them elsewhere in your code. Your original function would then look like this:
```python
def remove_empty_end_lines(mappings):
    reversed_mappings = reversed(mappings)
    index = first_valid_index(has_values,
                              reversed_mappings,
                              default=len(mappings))
    return mappings[:len(mappings) - index]
```
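If the generality of those helpers isn't needed, another readable option is a plain backwards loop, a sketch rather than a drop-in replacement:

```python
def remove_empty_end_lines(mappings):
    # Walk backwards from the end while the trailing dict's values
    # are all None, then slice off that all-None tail.
    end = len(mappings)
    while end > 0 and all(v is None for v in mappings[end - 1].values()):
        end -= 1
    return mappings[:end]
```

Note that `all(...)` over an empty dict is `True`, so empty dicts at the end are trimmed too, matching the original, where `any(...)` over an empty dict is `False`.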
Also, the fact that you use lists of dicts that all share the same keys makes me think you should probably use a more appropriate data structure.
For example, consider this pandas DataFrame:
```python
>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=[None, 1, 2, None, 3, 4, None, None, None],
...                        b=[None, 1, 2, None, None, 4, 5, None, None]))
>>> df
     a    b
0  NaN  NaN
1  1.0  1.0
2  2.0  2.0
3  NaN  NaN
4  3.0  NaN
5  4.0  4.0
6  NaN  5.0
7  NaN  NaN
8  NaN  NaN
```
To remove the trailing rows that contain only `NaN`, you would simply write:
```python
>>> index = df.last_valid_index()
>>> df.loc[:index]
     a    b
0  NaN  NaN
1  1.0  1.0
2  2.0  2.0
3  NaN  NaN
4  3.0  NaN
5  4.0  4.0
6  NaN  5.0
```
Much simpler, and it should also be faster for big data.
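If the data does start out as a list of dicts, here is one possible round trip through pandas. This is a sketch: the function name is reused from the question, and the `dtype=object` choice is mine, to keep the original ints and `None`s from being coerced to floats and `NaN`:

```python
import pandas as pd

def remove_empty_end_lines(dicts):
    # dtype=object preserves the original values (ints stay ints,
    # None stays None) instead of coercing columns to float/NaN.
    df = pd.DataFrame(dicts, dtype=object)
    last = df.last_valid_index()
    if last is None:  # empty input, or every row is entirely None
        return []
    # .loc slicing is inclusive, so the last valid row is kept.
    return df.loc[:last].to_dict('records')

print(remove_empty_end_lines([{'a': 1}, {'a': None}, {'a': 2}, {'a': None}]))
# -> [{'a': 1}, {'a': None}, {'a': 2}]
```

For a handful of small lists the DataFrame construction overhead will dominate; the pandas route pays off when the data is large or already lives in a frame.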