Algorithm "sort except zero"

Question 1

Task:

Sort the integers in sequence in sequence. But the position of zeros should not be changed.

Input: List of integers (int).

Output: List or another Iterable (tuple, generator, iterator) of integers (int).

Examples:

[5, 3, 0, 0, 4, 1, 4, 0, 7] => [1, 3, 0, 0, 4, 4, 5, 0, 7]

[0, 2, 3, 1, 0, 4, 5] => [0, 1, 2, 3, 0, 4, 5]

[0, 0, 0, 1, 0] => [0, 0, 0, 1, 0]

[4, 5, 3, 1, 1] => [1, 1, 3, 4, 5]

Precondition:

All numbers are non-negative.

My code:

from collections.abc import Iterable
def except_zero(items: list[int]) -> Iterable[int]:
 without_z = [i for i in items if i != 0]
 place_z = [i[0] for i in enumerate([i if i == 0 else 1 for i in items]) if i[1] == 0]
 without_z = sorted(without_z)
 for i in place_z: without_z.insert(i, 0)
 return without_z

How can I make this faster?

Question 2

Most of the work will be done by your call to sorted. You just need to make sure to not make too much extra fluff around that call.

One particular call that may greatly impact performances, especially on large lists with lots of zeroes is your without_z.insert which needs to move memory around. Instead, I’d rather use this result and pull elements out from it to build a whole new list with zeroes inserted as necessary.

The name of the function does not convey anything sort-related, please change it and consider using a docstring to document its particular behavior.

Lastly, your type hints are off as you are explicitly returning a list[int] but your code can accept any Sequence[int] iterables which allow you to iterate over them twice.

from typing import Sequence
def sort_except_zeroes(items: Sequence[int]) -> list[int]:
 non_zeroes = iter(sorted(i for i in items if i != 0))
 return [next(non_zeroes) if i != 0 else 0 for i in items]

Question 3

About the name of the function: you say that this "does not convey anything sort-related", but there are several answers about this algorithm in which the name of function is "except_zero". See here: stackoverflow.com/questions/61499975/…

Question 4

@Глеб yes, and I wouldn't understand, without context, what this function is doing. Hence the better name + docstring suggestion.

Question 5

Yes, you right (though there is not nothing about clarity of function names in pep8).

Question 6

@301_Moved_Permanently FYI, the first assignment can be iter(sorted(filter(bool,items))), and the return value can be [i and next(non_zeroes) for i in items]

Question 7

@Mous I’m leaving the filter out as it is already covered in Peilonrayz' answer. As for the i and next, yes, that's another micro-optimization that's slightly faster, but also slightly harder to read and understand. As always, being a tradeoff, I’d rather stick with Python 2.5+ ternary here rather than previous and or tricks that may gives undesirable results when not used carefully (although not the case here). However, your mileage may vary and one should consider the tradeoffs before using an approach or another.

Question 8

301_Moved_Permanently's answer is great and changes the performance from \$O(nk\log(n))\$ to \$O(n\log(n))\$. The change results in a measurable impact on performance.

To satiate both my curiosity and the site's need for at least one IO I have explored two micro-optimisations. Firstly I want to note micro-optimisations are not typically a good approach only use micro-optimisations for when you need improvements in the micro level -- \10ドル^{-6}\$s. Which is pretty rare.

sorted builds a new list where list.sort mutates an existing list. list.sort is typically faster because we remove the copy operation.

def test_peil__sort(items: Iterable[int]) -> list[int]:
 items = [i for i in items if i != 0]
 items.sort()
 non_zeroes = iter(items)
 return [next(non_zeroes) if i != 0 else 0 for i in items]

Python is typically slower than C, so offloading to C is a common micro-optimisation technique. Here we can offload to filter's C loop.
Thanks @duckboycool for reminding me of filter over itertools.filterfalse.
```
def test_peil__filter(items: Iterable[int]) -> list[int]:
 non_zeroes = iter(sorted(filter(None, items)))
 return [next(non_zeroes) if i != 0 else 0 for i in items]
```

Assuming 1 & 2 are faster, combining the two together may result in faster code.

def test_peil__filter_sort(items: Iterable[int]) -> list[int]:
 items = list(filter(None, items))
 items.sort()
 non_zeroes = iter(items)
 return [next(non_zeroes) if i != 0 else 0 for i in items]

Note: To test the performance of the functions we need to build sample data. I'm using the following function to do so:

random.seed(42401)
@functools.cache
def args_conv(size: int) -> list[int]:
 arr = list(range(int(size)))
 random.shuffle(arr)
 return arr

Image comparing all algorithms.

We can see:

test_mathias is faster than the test_orig.
test_peil__sort's micro-optimisation is faster than test_mathias in the micro section, getting quite close otherwise.
test_peil__filter's micro-optimisation is marginally better than test_mathias throughout.
test_peil__filter_sort seems virtually indistinguishable from test_peil__filter.
test_cris shows NumPy is much better than Python when working with numbers.
test_cris__py shows converting between Python and NumPy can greatly diminish the benefit of using NumPy.

We can also see all Python based algorithms are tending towards the same worst case. To make comparing user's suggestions easier, here is the graph again with only one algorithm per user.

Image comparing best per user.

I said at the beginning of the answer 301_Moved_Permanently's answer "changed the performance from \$O(nk\log(n))\$ to \$O(n\log(n))\$." However we can't see the improvement in the graph. In args_conv I used a sampling method which only has one 0, so \$k = 1\$. If we change the sampling method to have \$n / 2\$ 0s we can show test_orig pulling away from test_mathias with a curve.

Image showing the original's main problem.

Full code:

final/__init__.py

from collections.abc import Iterable
import numpy
from numpy.typing import NDArray
def test_orig(items: list[int]) -> Iterable[int]:
 without_z = [i for i in items if i != 0]
 place_z = [i[0] for i in enumerate([i if i == 0 else 1 for i in items]) if i[1] == 0]
 without_z = sorted(without_z)
 for i in place_z: without_z.insert(i, 0)
 return without_z
# Derived https://codereview.stackexchange.com/a/287025
# By 301_Moved_Permanently
def test_mathias(items: Iterable[int]) -> list[int]:
 non_zeroes = iter(sorted(i for i in items if i != 0))
 return [next(non_zeroes) if i != 0 else 0 for i in items]
# Derived https://codereview.stackexchange.com/a/287029
# By Cris Luengo
def test_cris(items: NDArray[numpy.int_]) -> NDArray[numpy.int_]:
 items = items.copy() # can leave this out if you want to work in-place
 nonzeros = items != 0
 items[nonzeros] = numpy.sort(items[nonzeros])
 return items
# Derived https://codereview.stackexchange.com/a/287029
# By Cris Luengo
def test_cris__py(items: Iterable[int]) -> list[int]:
 items = numpy.array(items)
 nonzeros = items != 0
 items[nonzeros] = numpy.sort(items[nonzeros])
 return list(items)
# Derived https://codereview.stackexchange.com/a/287025
# By 301_Moved_Permanently
def test_peil__sort(items: Iterable[int]) -> list[int]:
 items = [i for i in items if i != 0]
 items.sort()
 non_zeroes = iter(items)
 return [next(non_zeroes) if i != 0 else 0 for i in items]
# Derived https://codereview.stackexchange.com/a/287025
# By 301_Moved_Permanently
def test_peil__filter(items: Iterable[int]) -> list[int]:
 non_zeroes = iter(sorted(filter(None, items)))
 return [next(non_zeroes) if i != 0 else 0 for i in items]
# Derived https://codereview.stackexchange.com/a/287025
# By 301_Moved_Permanently
def test_peil__filter_sort(items: Iterable[int]) -> list[int]:
 _items = list(filter(None, items))
 _items.sort()
 non_zeroes = iter(_items)
 return [next(non_zeroes) if i != 0 else 0 for i in items]

plot.py

import functools
import random
from typing import Any
import matplotlib.pyplot
import numpy
from numpy.typing import NDArray
import graphtimer
from . import final
random.seed(42401)
@functools.cache
def args_conv(size: int) -> list[int]:
 arr = list(range(int(size)))
 random.shuffle(arr)
 return arr
def args_conv__np(size: int) -> NDArray[numpy.int_]:
 return numpy.array(args_conv(size))
@functools.cache
def args_conv__k(size: int) -> list[int]:
 arr = list(range(int(size))) + [0] * int(size / 2)
 random.shuffle(arr)
 return arr
def args_conv__k__np(size: int) -> NDArray[numpy.int_]:
 return numpy.array(args_conv__k(size))
def join_plots(
 base: graphtimer.plotter.PlotValues,
 *values: graphtimer.plotter.PlotValues,
) -> graphtimer.plotter.PlotValues:
 for value in values:
 if (base.kwargs["domain"] != value.kwargs["domain"]).any():
 raise ValueError("Can only merge PlotValues with the same domain.")
 data: list[list[graphtimer.plotter._DataValues]] = list(base.data)
 kwargs: dict[str, Any] = {
 "functions": list(base.kwargs["functions"]),
 "domain": base.kwargs["domain"],
 }
 for value in values:
 data += value.data
 kwargs["functions"] += value.kwargs["functions"]
 return graphtimer.plotter.PlotValues(data, kwargs)
def main():
 fig, axs = matplotlib.pyplot.subplots()
 axs.set_yscale('log')
 axs.set_xscale('log')
 join_plots(
 (
 graphtimer.Plotter(graphtimer.MultiTimer([final.test_orig, final.test_mathias, final.test_peil__filter, final.test_cris__py, final.test_peil__sort, final.test_peil__filter_sort]))
 .repeat(10, 10, numpy.logspace(0, 4, num=50), args_conv=args_conv)
 .min()
 ),
 (
 graphtimer.Plotter(graphtimer.MultiTimer([final.test_cris]))
 .repeat(10, 10, numpy.logspace(0, 4, num=50), args_conv=args_conv__np)
 .min()
 ),
 ).plot(axs, x_label='len(nums)')
 fig.show()
 fig, axs = matplotlib.pyplot.subplots()
 axs.set_yscale('log')
 axs.set_xscale('log')
 join_plots(
 (
 graphtimer.Plotter(graphtimer.MultiTimer([final.test_orig, final.test_mathias, final.test_peil__filter]))
 .repeat(10, 10, numpy.logspace(0, 4, num=50), args_conv=args_conv)
 .min()
 ),
 (
 graphtimer.Plotter(graphtimer.MultiTimer([final.test_cris]))
 .repeat(10, 10, numpy.logspace(0, 4, num=50), args_conv=args_conv__np)
 .min()
 ),
 ).plot(axs, x_label='len(nums)')
 fig.show()
 fig, axs = matplotlib.pyplot.subplots()
 axs.set_yscale('log')
 axs.set_xscale('log')
 join_plots(
 (
 graphtimer.Plotter(graphtimer.MultiTimer([final.test_orig, final.test_mathias, final.test_peil__filter]))
 .repeat(10, 10, numpy.logspace(0, 4, num=50), args_conv=args_conv__k)
 .min()
 ),
 (
 graphtimer.Plotter(graphtimer.MultiTimer([final.test_cris]))
 .repeat(10, 10, numpy.logspace(0, 4, num=50), args_conv=args_conv__k__np)
 .min()
 ),
 ).plot(axs, x_label='len(nums)')
 fig.show()
 input()
if __name__ == '__main__':
 main()

Question 9

I would suggest you use NumPy for anything that manipulates collections of numbers of the same type. A NumPy array is a more efficient container than a list in this case, an the package contain a large collection of tools to manipulate these arrays very efficiently.

For example, your code can be written as:

import numpy as np
from numpy.typing import NDArray
def except_zero(items: NDArray[np.int_]) -> NDArray[np.int_]:
 output = items.copy() # can leave this out if you want to work in-place
 nonzeros = output != 0
 output[nonzeros] = np.sort(output[nonzeros])
 return output

List comprehensions are powerful language constructs, but they tend to make code less readable. Especially if you nest them like this:

place_z = [i[0] for i in enumerate([i if i == 0 else 1 for i in items]) if i[1] == 0]

Separating such a line into two lines would improve readability:

binary = [i if i == 0 else 1 for i in items]
place_z = [i[0] for i in enumerate(binary) if i[1] == 0]

Though this doesn't really need two list comprehensions, the inner one doesn't add anything important, it just converts non-zeros to 1, but this is not required for the outer list comprehension logic, which only compares agains zero. So we can write this as:

place_z = [i[0] for i in enumerate(items) if i[1] == 0]

This last line is again better expressed as (thanks to J_H):

place_z = [i for i, v in enumerate(items) if v == 0]

Question 10

I feel these are a little clearer: binary = list(map(bool, items)), place_z = [i for i, v in enumerate(binary) if v == 0].

Question 11

@J_H You are right, that is clearer (I edited the answer, thanks!). But the binary list is not necessary, as per the last paragraph in my answer.

score 13 · Accepted Answer · 2023-09-14 14:48:22Z

13

\$\begingroup\$

Most of the work will be done by your call to sorted. You just need to make sure to not make too much extra fluff around that call.

One particular call that may greatly impact performances, especially on large lists with lots of zeroes is your without_z.insert which needs to move memory around. Instead, I’d rather use this result and pull elements out from it to build a whole new list with zeroes inserted as necessary.

The name of the function does not convey anything sort-related, please change it and consider using a docstring to document its particular behavior.

Lastly, your type hints are off as you are explicitly returning a list[int] but your code can accept any Sequence[int] iterables which allow you to iterate over them twice.

from typing import Sequence
def sort_except_zeroes(items: Sequence[int]) -> list[int]:
 non_zeroes = iter(sorted(i for i in items if i != 0))
 return [next(non_zeroes) if i != 0 else 0 for i in items]

Share

edited Sep 14, 2023 at 15:53

answered Sep 14, 2023 at 14:48

301_Moved_Permanently's user avatar

301_Moved_Permanently 301_Moved_Permanently

29.4k3 gold badges48 silver badges98 bronze badges

\$\endgroup\$

5

\$\begingroup\$ About the name of the function: you say that this "does not convey anything sort-related", but there are several answers about this algorithm in which the name of function is "except_zero". See here: stackoverflow.com/questions/61499975/… \$\endgroup\$

Глеб
– Глеб

2023年09月14日 18:21:36 +00:00
Commented Sep 14, 2023 at 18:21
\$\begingroup\$ @Глеб yes, and I wouldn't understand, without context, what this function is doing. Hence the better name + docstring suggestion. \$\endgroup\$

301_Moved_Permanently
– 301_Moved_Permanently

2023年09月14日 18:24:49 +00:00
Commented Sep 14, 2023 at 18:24
\$\begingroup\$ Yes, you right (though there is not nothing about clarity of function names in pep8). \$\endgroup\$

Глеб
– Глеб

2023年09月14日 18:29:14 +00:00
Commented Sep 14, 2023 at 18:29
3

\$\begingroup\$ @301_Moved_Permanently FYI, the first assignment can be iter(sorted(filter(bool,items))), and the return value can be [i and next(non_zeroes) for i in items] \$\endgroup\$

Mous
– Mous

2023年09月15日 05:48:56 +00:00
Commented Sep 15, 2023 at 5:48
1

\$\begingroup\$ @Mous I’m leaving the filter out as it is already covered in Peilonrayz' answer. As for the i and next, yes, that's another micro-optimization that's slightly faster, but also slightly harder to read and understand. As always, being a tradeoff, I’d rather stick with Python 2.5+ ternary here rather than previous and or tricks that may gives undesirable results when not used carefully (although not the case here). However, your mileage may vary and one should consider the tradeoffs before using an approach or another. \$\endgroup\$

301_Moved_Permanently
– 301_Moved_Permanently

2023年09月18日 09:25:41 +00:00
Commented Sep 18, 2023 at 9:25

Add a comment |

Stack Exchange Network

Algorithm "sort except zero"

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Algorithm "sort except zero"

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions