This is a Hackerrank problem: https://www.hackerrank.com/challenges/crush/problem
You are given a list of size \$N\$, initialized with zeroes. You have to perform \$M\$ operations on the list and output the maximum of the final values of all the \$N\$ elements in the list. For every operation, you are given three integers \$a, b\$ and \$k\$, and you have to add the value \$k\$ to all the elements ranging from index \$a\$ to \$b\$ (both inclusive).
Input Format
The first line contains two integers \$N\$ and \$M\$ separated by a single space. The next \$M\$ lines each contain three integers \$a, b\$ and \$k\$ separated by single spaces. Elements in the list are numbered from \1ドル\$ to \$N\$.
Constraints
\3ドル \leq N \leq 10^7\$
\1ドル\leq M \leq 2*10^5\$
\1ドル \leq a \leq b \leq N\$
\$ 0 \leq k \leq 10^9\$
Output Format
A single line containing maximum value in the updated list.
Sample Input
5 3
1 2 100
2 5 100
3 4 100
Sample Output
200
My code:
def arrayManipulation(n, queries):
    nums = [0] * (n + 1)
    for q in queries:
        nums[q[0]-1] += q[2]
        nums[q[1]] -= q[2]
    current = 0
    max = 0
    for i in nums:
        current += i
        if current > max: max = current
    return max
Is there any way to optimize this?
5 Answers
Nice implementation, it's already very efficient. A few suggestions:

- Expand the variables in the for-loop from `for q in queries` to `for a, b, k in queries`. Given the problem description, it's easier to read.
- A better name for the variable `current` is `running_sum`.
- Avoid calling a variable `max`, since it shadows the built-in function in Python. An alternative name is `result`.
- Once the variable `max` is renamed, you can write `result = max(result, running_sum)`.
- As @hjpotter92 said, it's better to add a description of your approach to the question; you will likely get more reviews. A few bullet points or some comments in the code are better than nothing.
Applying the suggestions:
def arrayManipulation(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    running_sum = 0
    result = 0
    for i in nums:
        running_sum += i
        result = max(result, running_sum)
    return result
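As a quick sanity check, here is the approach traced on the problem's sample input (a small illustrative sketch, not part of the solution itself):

```python
n, queries = 5, [(1, 2, 100), (2, 5, 100), (3, 4, 100)]

# Difference array: +k at the range start, -k just past the range end.
nums = [0] * (n + 1)
for a, b, k in queries:
    nums[a - 1] += k
    nums[b] -= k
print(nums)  # [100, 100, 0, 0, -100, -100]

# Prefix sums recover the actual list values; their max is the answer.
running_sum, values = 0, []
for x in nums:
    running_sum += x
    values.append(running_sum)
print(values[:n])   # [100, 200, 200, 200, 100]
print(max(values))  # 200, matching the sample output
```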
It's already an efficient solution that runs in \$O(n+m)\$, so I wouldn't worry about performance. However, there is an alternative solution running in \$O(m \log m)\$ in the Editorial of HackerRank.
I implemented it in Python:
def arrayManipulation(n, queries):
    indices = []
    for a, b, k in queries:
        indices.append((a, k))
        indices.append((b + 1, -k))
    indices.sort()
    running_sum = 0
    result = 0
    for _, k in indices:
        running_sum += k
        result = max(result, running_sum)
    return result
It's based on the fact that it is enough to compute the running sum over the sorted boundary indices: between two consecutive boundaries the running sum is constant, so only the \2ドルm\$ boundaries need to be visited.
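To see this concretely, here is the sample input traced through the sorting approach (an illustrative sketch):

```python
queries = [(1, 2, 100), (2, 5, 100), (3, 4, 100)]

# Each query contributes two boundary events: +k at a, -k at b + 1.
indices = []
for a, b, k in queries:
    indices.append((a, k))
    indices.append((b + 1, -k))
indices.sort()
print(indices)  # [(1, 100), (2, 100), (3, -100), (3, 100), (5, -100), (6, -100)]

# Running sum over the sorted events; its maximum is the answer.
running_sum = result = 0
for _, k in indices:
    running_sum += k
    result = max(result, running_sum)
print(result)  # 200
```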
FYI in the Editorial (or Discussion) section of HackerRank there are optimal solutions and detailed explanations.
Thanks to @superbrain for the corrections provided in the comments.
- The original is `O(n+m)`. And apparently n is not large enough compared to m in order for `O(m log m)` to be better :-( – superb rain, Oct 22, 2020
- @superbrain I agree about `O(n+m)`; in this case it's better to keep the term `m` even if it's smaller than `n`. But the results I am getting running your benchmark show that the sorting approach is still a bit faster; only `with_accumulate` is slightly better than "my" approach. Maybe a different environment? – Marc, Oct 22, 2020
- Yeah, what environment did you use? I added times with 32-bit Python to my answer now and that looks similar to yours. – superb rain, Oct 22, 2020
- Windows Home 10 64-bit, Python 3.8.6. – Marc, Oct 22, 2020
- I suspect the most influential factor is the one you left out :-): the Python bit-version. I guess you used 32-bit? Until 3.8.6 I mainly used 32-bit Python as well, as that was the default offered for download. Now for 3.9.0, the default download appears to be 64-bit, so that's what I'm mainly using now. – superb rain, Oct 22, 2020
List vs Python array vs NumPy array
To my surprise, my solution using Reinderien's suggestion to use a Python `array` was fastest in my benchmark in 64-bit Python (and not bad in 32-bit Python). Here I look into that.

Why was I surprised? Because I had always considered `array` to be rather pointless, like a "NumPy without operations". Sure, it provides compact storage of data, but I have plenty of memory, so I'm not very interested in that. I'm more interested in speed. And whenever you do something with the array's elements, there's overhead from always converting between a Python `int` object (or whatever type you use in the array) and the array's fixed-size element data. Contrast that with NumPy, where you do operations like `arr += 1` or `arr1 += arr2` and NumPy speedily operates on all the array elements. And if you treat NumPy arrays like lists and work on them element-wise yourself, it's sloooow. I thought Python arrays were similarly slow at that, and they are, but a lot less so:
 | a[0] | a[0] += 1
--------------------------+--------+-----------
a = [0] | 27 ns | 67 ns
a = array('q', [0]) | 35 ns | 124 ns
a = np.zeros(1, np.int64) | 132 ns | 504 ns
Accessing a list element or incrementing it is by far the fastest with a list, and by faaar the slowest with a NumPy array.
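Per-element timings like these can be reproduced with a micro-benchmark along the following lines (a sketch for the list and `array` cases; the NumPy row is analogous with `np.zeros(1, np.int64)`, and absolute numbers vary by machine and Python build):

```python
from array import array
from timeit import timeit

for label, a in [("list", [0]), ("array('q')", array('q', [0]))]:
    read = timeit("a[0]", globals={"a": a}, number=10**6)
    incr = timeit("a[0] += 1", globals={"a": a}, number=10**6)
    # Total seconds over a million iterations * 1000 = nanoseconds per op.
    print(f"{label:12} read: {read * 1000:.0f} ns  incr: {incr * 1000:.0f} ns")
```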
Let's add a (bad) NumPy version to the mix, where I badly use a NumPy array instead of a list or a Python array:
import numpy as np
from itertools import accumulate

def bad_numpy(n, queries):
    nums = np.zeros(n + 1, np.int64)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))
Times with my worst case benchmark:
python_list 565 ms 576 ms 577 ms
python_array 503 ms 514 ms 517 ms
numpy_array 2094 ms 2124 ms 2171 ms
So the bad NumPy usage is far slower, as expected.
The solution has three steps: Initialization of the list/array, the loop processing the queries, and accumulating/maxing. Let's measure them separately to see where each version spends how much time.
Initialization
I took out everything after the `nums = ...` line and measured again:
python_list 52 ms 52 ms 55 ms
python_array 30 ms 31 ms 32 ms
numpy_array 0 ms 0 ms 0 ms
The list is slowest and NumPy is unbelievably fast. Actually 0.016 ms, for an array of ten million int64s, which is 5000 GB/s. I think it must be cheating somehow. Anyway, we see that the array solutions get a head start in the benchmark due to faster initialization.
The list `[0] * (n + 1)` gets initialized like this, copying the reference to the `0` again and again and incrementing its reference count again and again:
for (i = 0; i < n; i++) {
    items[i] = elem;
    Py_INCREF(elem);
}
The Python array repeats faster, using `memcpy` to repeatedly double the elements (1 copy => 2 copies, 4 copies, 8 copies, 16 copies, etc.):
Py_ssize_t done = oldbytes;
memcpy(np->ob_item, a->ob_item, oldbytes);
while (done < newbytes) {
    Py_ssize_t ncopy = (done <= newbytes-done) ? done : newbytes-done;
    memcpy(np->ob_item+done, np->ob_item, ncopy);
    done += ncopy;
}
After seeing this, I'm actually surprised the Python array isn't much faster than the list.
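The doubling strategy can be illustrated in pure Python (an illustrative sketch only; CPython does this in C with `memcpy` on the raw buffer):

```python
def repeat_doubling(chunk, times):
    """Repeat a bytes chunk by doubling, like CPython's array repeat."""
    total = len(chunk) * times
    out = bytearray(chunk)  # start with one copy
    while len(out) < total:
        # Append as much as we already have (or just the remainder),
        # so the buffer roughly doubles on every pass.
        out += out[:min(len(out), total - len(out))]
    return bytes(out)

print(repeat_doubling(b"ab", 5) == b"ab" * 5)  # True
```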
Processing the queries
Times for the loop processing the queries:
python_list 122 ms 125 ms 121 ms
python_array 96 ms 99 ms 95 ms
numpy_array 303 ms 307 ms 305 ms
What? But earlier we saw that the Python list is faster at accessing/incrementing elements! Well, but that was for `a[0]`, i.e., always accessing/incrementing the same element. With the worst-case data it's random access, and the array solutions are apparently better at that. If I change the indexes from `randint(1, n)` to `randint(1, 100)`, the picture looks different:
python_list 35 ms 43 ms 47 ms
python_array 77 ms 72 ms 72 ms
numpy_array 217 ms 225 ms 211 ms
Not quite sure yet why, as all three containers use 80 MB of contiguous memory, so that should be equally cache-friendly. So I think it's about the `int` objects that get created with `+= k` and `-= k`, and that they stay alive in the list but not in the arrays.
Anyway, with the worst case data, the Python array increases its lead, and the NumPy array falls from first to last place. Total times for initialization and query-processing:
python_list 174 ms 177 ms 176 ms
python_array 126 ms 130 ms 127 ms
numpy_array 303 ms 307 ms 305 ms
Accumulate and max
Times for `max(accumulate(nums))`:
python_list 391 ms 399 ms 401 ms
python_array 377 ms 384 ms 390 ms
numpy_array 1791 ms 1817 ms 1866 ms
So this part actually takes the longest, for all three versions. Of course in reality, in NumPy I'd use `nums.cumsum().max()`, which takes about 50 ms here.
Summary, moral of the story
Why is the Python array faster than the Python list in the benchmark?
- Initialization: Because the array's initialization is less work.
- Processing the queries: I think because the list keeps a lot of `int` objects alive, and that's costly somehow.
- Accumulate/max: I think because iterating the list involves accessing all the different `int` objects in random order, i.e., randomly accessing memory, which is not that cache-friendly.
What I take away from this all is that misusing NumPy arrays as lists is indeed a bad idea, but that using Python arrays is not equally bad but can in fact not only use less memory but also be faster than lists. While the conversion between objects and array entries does take extra time, other effects can more than make up for that lost time. That said, keep in mind that the array version was slower in my 32-bit Python benchmark and slower in query processing in 64-bit Python when I changed the test data to use smaller/fewer indexes. So it really depends on the problem. But using an array can be faster than using a list.
- Very nice sleuthing! Would be interesting to see how `bytearray` performs on these operations. I know it can only be used to store bytes, so it does not have the same functionality as the other options here. But it is still something similar and can be an option for small integers. – GZ0, Oct 24, 2020
You could use `itertools.accumulate` to shorten your second part a lot and make it faster:
def arrayManipulation(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))
It can be used on Marc's version as well. Benchmarks with various solutions on three worst-case inputs:
CPython 3.9.0 64-bit on Windows 10 Pro 2004 64-bit:
original 798 ms 787 ms 795 ms
with_abk 785 ms 790 ms 807 ms
with_accumulate 581 ms 581 ms 596 ms
Marc 736 ms 737 ms 736 ms
optimized_1 698 ms 702 ms 698 ms
optimized_2 696 ms 694 ms 690 ms
optimized_3 692 ms 683 ms 684 ms
Reinderien 516 ms 512 ms 511 ms
CPython 3.9.0 32-bit on Windows 10 Pro 2004 64-bit:
original 1200 ms 1229 ms 1259 ms
with_abk 1167 ms 1203 ms 1174 ms
with_accumulate 939 ms 937 ms 934 ms
Marc 922 ms 927 ms 923 ms
optimized_1 865 ms 868 ms 869 ms
optimized_2 863 ms 863 ms 868 ms
optimized_3 851 ms 847 ms 842 ms
Reinderien 979 ms 959 ms 983 ms
Code:
from timeit import repeat
from random import randint
from itertools import accumulate
from array import array

def original(n, queries):
    nums = [0] * (n + 1)
    for q in queries:
        nums[q[0]-1] += q[2]
        nums[q[1]] -= q[2]
    current = 0
    max = 0
    for i in nums:
        current += i
        if current > max: max = current
    return max

def with_abk(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    current = 0
    max = 0
    for i in nums:
        current += i
        if current > max: max = current
    return max

def with_accumulate(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))

def Marc(n, queries):
    indices = []
    for a, b, k in queries:
        indices.append((a, k))
        indices.append((b + 1, -k))
    indices.sort()
    running_sum = 0
    result = 0
    for _, k in indices:
        running_sum += k
        result = max(result, running_sum)
    return result

def optimized_1(n, queries):
    changes = []
    for a, b, k in queries:
        changes.append((a, k))
        changes.append((b + 1, -k))
    changes.sort()
    return max(accumulate(k for _, k in changes))

def optimized_2(n, queries):
    changes = []
    append = changes.append
    for a, b, k in queries:
        append((a, k))
        append((b + 1, -k))
    changes.sort()
    return max(accumulate(k for _, k in changes))

def optimized_3(n, queries):
    changes = [(a, k) for a, _, k in queries]
    changes += [(b + 1, -k) for _, b, k in queries]
    changes.sort()
    return max(accumulate(k for _, k in changes))

def Reinderien(n, queries):
    nums = array('q', [0]) * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))

funcs = original, with_abk, with_accumulate, Marc, optimized_1, optimized_2, optimized_3, Reinderien
names = [func.__name__ for func in funcs]

def worst_case():
    n = 10**7
    m = 2 * 10**5
    queries = [sorted([randint(1, n), randint(1, n)]) + [randint(0, 10**9)]
               for _ in range(m)]
    return n, queries

# Check correctness
n, queries = worst_case()
expect = funcs[0](n, queries)
for func in funcs[1:]:
    print(func(n, queries) == expect, func.__name__)

# Benchmark
tss = [[] for _ in funcs]
for _ in range(3):
    n, queries = worst_case()
    for func, ts in zip(funcs, tss):
        t = min(repeat(lambda: func(n, queries), number=1))
        ts.append(t)
print()
for name, ts in zip(names, tss):
    print(name.rjust(max(map(len, names))),
          *(' %4d ms' % (t * 1000) for t in ts))
- `optimized_3` could be changed to use a generator expression on the right-hand side of `+=` to save some memory and potentially also some time. BTW, another approach is `changes = sorted(itertools.chain(*(((a, k), (b + 1, -k)) for a, b, k in queries)))`. – GZ0, Oct 24, 2020
- @GZ0 Yeah, I guess with a generator expression it would take less memory, but I wasn't concerned with that here. Tried it now anyway; it appears to be very slightly slower or equally fast. The chain one is significantly slower. A loop with `changes += (a, k), (b+1, -k)` seems slightly faster. – superb rain, Oct 24, 2020
Given that your array is a simple list of uniform type, you might see some small benefit in switching to https://docs.python.org/3.8/library/array.html , which is built specifically for this kind of thing. It's a compromise that uses built-ins without needing to install Numpy.
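Concretely, the swap might look like this (a sketch; `'q'` is the signed 64-bit type code, comfortably large enough here since the sum can reach at most \2ドル \cdot 10^5 \cdot 10^9 = 2 \cdot 10^{14}\$):

```python
from array import array
from itertools import accumulate

def array_manipulation(n, queries):
    # Same algorithm as before, but elements live in one compact
    # C buffer instead of a list of Python int objects.
    nums = array('q', [0]) * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))

print(array_manipulation(5, [(1, 2, 100), (2, 5, 100), (3, 4, 100)]))  # 200
```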
- What kind of benefit? – superb rain, Oct 22, 2020
- @superbrain At the very least, the one that the `array` module itself claims: compactness. – Reinderien, Oct 22, 2020
- Well, that one is clear, so I thought you wouldn't have said "might" for that :-). And I thought converting between `int` objects and fixed-size `array` elements all the time would be slower, but apparently it can even be faster. I'm pleasantly surprised and wondering how it's fast. Added that to my answer; let me know if I did it suboptimally. – superb rain, Oct 22, 2020
- Looked a bit into that now, how `array` is fast. – superb rain, Oct 23, 2020
I don't know of any way to optimize this; I suspect you've cracked the way it was intended to be implemented. The following are just general recommendations.
Using `black` to format the code will make it closer to idiomatic style, with no manual work.

After formatting, I would recommend running `flake8` to find remaining non-idiomatic code. For example, function names should be written in `snake_case`.
In Python 3.8 onwards you can use the walrus operator to change the last conditional to `if (current := current + i) > max:`. Not sure if that's a good idea though; I find that syntax clunky.
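For illustration, the full function with that change might read as follows (a sketch; the accumulator is renamed `best` here so the `max` built-in stays usable):

```python
def array_manipulation(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    best = current = 0
    for i in nums:
        # Walrus operator: assign and test in one expression (3.8+).
        if (current := current + i) > best:
            best = current
    return best

print(array_manipulation(5, [(1, 2, 100), (2, 5, 100), (3, 4, 100)]))  # 200
```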
- Oops, copy/paste error. Thanks, @superbrain. – l0b0, Oct 22, 2020
- Ah, that's what I thought :-) I had even googled "make it closer to idiomatic style" to see whether you reused that sentence from an earlier answer, but there were no results. – superb rain, Oct 23, 2020