
Given input such as:

foos = [
    [],
    [{'a': 1}],
    [{'a': 1}, {'a': 2}],
    [{'a': 1}, {'a': 2}, {'a': None}],
    [{'a': 1}, {'a': 2}, {'a': None}, {'a': None}],
    [{'a': 1}, {'a': None}, {'a': 2}],
    [{'a': 1}, {'a': None}, {'a': 2}, {'a': None}],
    [{'a': 1}, {'a': None}, {'a': 2}, {'a': None}, {'a': None}],
]

I want a function remove_empty_end_lines that takes a list of dicts and removes all dicts with null values that appear only at the end of the list, not in between.

for foo in foos:
    print(foo)
    print(remove_empty_end_lines(foo))
    print('\n')

[]
[]

[{'a': 1}]
[{'a': 1}]

[{'a': 1}, {'a': 2}]
[{'a': 1}, {'a': 2}]

[{'a': 1}, {'a': 2}, {'a': None}]
[{'a': 1}, {'a': 2}]

[{'a': 1}, {'a': 2}, {'a': None}, {'a': None}]
[{'a': 1}, {'a': 2}]

[{'a': 1}, {'a': None}, {'a': 2}]
[{'a': 1}, {'a': None}, {'a': 2}]

[{'a': 1}, {'a': None}, {'a': 2}, {'a': None}]
[{'a': 1}, {'a': None}, {'a': 2}]

[{'a': 1}, {'a': None}, {'a': 2}, {'a': None}, {'a': None}]
[{'a': 1}, {'a': None}, {'a': 2}]

My final solution is:

def remove_empty_end_lines(lst):
    i = next(
        (
            i for i, dct in enumerate(reversed(lst))
            if any(v is not None for v in dct.values())
        ),
        len(lst)
    )
    return lst[: len(lst) - i]

I'm not going for code golf, rather for performance and readability.

In the input I've listed above, the dicts always have a length of one, but in reality they can be of any length. The lengths of each dict within a list will always be the same.

asked Feb 22, 2019 at 19:35

2 Answers


Some suggestions:

  • You are using i both inside your generator expression and outside. This makes the code harder to follow than if the variables were differently named. For example, you could think of the outer i as the count of matching (that is, all-None-valued) dictionaries at the end of the list.
  • Abbreviated, type-based names like lst and dct don't tell me anything about what the variables actually represent. In an actual application more descriptive names should be possible, though that would require more context.
  • You say "remove all dicts with null values", but you neither specify nor test what happens with a dictionary having some None values and some non-None values at the end of the list. As far as I can tell from your implementation you remove only entries with all None values, but it's not clear whether this is the intention.
  • I would pull out the generator expression passed to next as a separate variable for clarity.
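To illustrate the third point, here is a small check (the mixed list below is a hypothetical example, not one of the question's inputs) showing that the posted implementation keeps a trailing dict as soon as any of its values is non-None:

```python
def remove_empty_end_lines(lst):
    i = next(
        (
            i for i, dct in enumerate(reversed(lst))
            if any(v is not None for v in dct.values())
        ),
        len(lst)
    )
    return lst[: len(lst) - i]

# The second dict mixes None and non-None values; because it has at
# least one non-None value it survives, and only the fully-None dict
# after it is dropped.
mixed = [{'a': 1, 'b': 2}, {'a': 3, 'b': None}, {'a': None, 'b': None}]
print(remove_empty_end_lines(mixed))
# [{'a': 1, 'b': 2}, {'a': 3, 'b': None}]
```

Whether that is the intended behaviour for partially-None dicts is exactly what the question leaves unspecified.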
answered Feb 22, 2019 at 22:18

The statement where you calculate i is too complex: it contains two for-loops (one in the outer generator expression and one inside any). Consider taking some parts out of it as separate variables or functions.

For example, you could make a function checking if a dict has any values except None:

def has_values(mapping):
    return any(value is not None
               for value in mapping.values())

and a function that would return you the index of the first element that would satisfy some condition:

def first_valid_index(predicate, iterable, default):
    return next((index for index, value in enumerate(iterable)
                 if predicate(value)),
                default)

These functions are pretty general, and you can reuse them later in your code. Your original function would then look like this:

def remove_empty_end_lines(mappings):
    reversed_mappings = reversed(mappings)
    index = first_valid_index(has_values,
                              reversed_mappings,
                              default=len(mappings))
    return mappings[:len(mappings) - index]
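As a quick sanity check of the refactored version against one of the question's inputs (the helper definitions are repeated so the snippet is self-contained):

```python
def has_values(mapping):
    return any(value is not None
               for value in mapping.values())

def first_valid_index(predicate, iterable, default):
    return next((index for index, value in enumerate(iterable)
                 if predicate(value)),
                default)

def remove_empty_end_lines(mappings):
    reversed_mappings = reversed(mappings)
    index = first_valid_index(has_values,
                              reversed_mappings,
                              default=len(mappings))
    return mappings[:len(mappings) - index]

# Trailing all-None dict is removed; the interior one is kept.
print(remove_empty_end_lines([{'a': 1}, {'a': None}, {'a': 2}, {'a': None}]))
# [{'a': 1}, {'a': None}, {'a': 2}]
```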

Also, the fact that you use lists of dicts with the same keys makes me think that you probably should use more appropriate data structures.

For example, consider this example pandas dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=[None, 1, 2, None, 3, 4, None, None, None],
...                        b=[None, 1, 2, None, None, 4, 5, None, None]))
>>> df
     a    b
0  NaN  NaN
1  1.0  1.0
2  2.0  2.0
3  NaN  NaN
4  3.0  NaN
5  4.0  4.0
6  NaN  5.0
7  NaN  NaN
8  NaN  NaN

To remove the last rows that contain only NaN, you would simply write:

>>> index = df.last_valid_index()
>>> df.loc[:index]
     a    b
0  NaN  NaN
1  1.0  1.0
2  2.0  2.0
3  NaN  NaN
4  3.0  NaN
5  4.0  4.0
6  NaN  5.0

Much simpler, and it should also be faster for big data.
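For completeness, the same trimming can be checked as a plain script (assuming pandas is installed); last_valid_index returns the label of the last row containing at least one non-NA value, so slicing with loc keeps the interior all-NaN row while dropping only the trailing ones:

```python
import pandas as pd

# Rebuild the example frame from the answer.
df = pd.DataFrame(dict(a=[None, 1, 2, None, 3, 4, None, None, None],
                       b=[None, 1, 2, None, None, 4, 5, None, None]))

# loc slicing is label-based and inclusive, so this keeps rows 0..6.
trimmed = df.loc[:df.last_valid_index()]
print(trimmed.shape)  # seven rows remain
```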

answered Feb 24, 2019 at 10:00