
This is a HackerRank problem: https://www.hackerrank.com/challenges/crush/problem

You are given a list of size \$N\$, initialized with zeroes. You have to perform \$M\$ operations on the list and output the maximum of the final values of all the \$N\$ elements in the list. For every operation, you are given three integers \$a\$, \$b\$ and \$k\$, and you have to add the value \$k\$ to all the elements ranging from index \$a\$ to \$b\$ (both inclusive).

Input Format

The first line will contain two integers \$N\$ and \$M\$ separated by a single space. The next \$M\$ lines will each contain three integers \$a\$, \$b\$ and \$k\$ separated by single spaces. The elements of the list are numbered from \1ドル\$ to \$N\$.

Constraints

\3ドル \leq N \leq 10^7\$

\1ドル \leq M \leq 2 \times 10^5\$

\1ドル \leq a \leq b \leq N\$

\0ドル \leq k \leq 10^9\$

Output Format

A single line containing maximum value in the updated list.

Sample Input

5 3
1 2 100
2 5 100
3 4 100

Sample Output

200
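
To illustrate the sample: applying the three operations to a list of five zeroes produces the following values, whose maximum is 200:

    index:     1    2    3    4    5
    start:     0    0    0    0    0
    1 2 100: 100  100    0    0    0
    2 5 100: 100  200  100  100  100
    3 4 100: 100  200  200  200  100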

My code:

def arrayManipulation(n, queries):
    nums = [0] * (n + 1)
    for q in queries:
        nums[q[0]-1] += q[2]
        nums[q[1]] -= q[2]
    current = 0
    max = 0
    for i in nums:
        current += i
        if current > max: max = current
    return max

Is there any way to optimize this?

hjpotter92
asked Oct 21, 2020 at 23:53
  • I don't think your program would pass the test cases. You are not modifying nums for the range q[0] to q[1]. (Commented Oct 22, 2020 at 1:09)
  • One efficient solution looks something like this: order the sub-intervals defined by the a, b's, then start at 0 and go to N, and at each point you can work out which intervals you have lost the contribution from and which you have gained. (Commented Oct 22, 2020 at 2:03)
  • @hjpotter92 He increments a by k and decrements b by k, then sums all the numbers and calculates the max on the way. The code is actually fine. (Commented Oct 22, 2020 at 2:14)
  • @Marc Yes, you're right. Although this approach should be explained in the question. (Commented Oct 22, 2020 at 3:50)

5 Answers


Nice implementation, it's already very efficient. A few suggestions:

  • Unpack the variables in the for-loop, from for q in queries to for a, b, k in queries. Given the problem description, it's easier to read.
  • A better name for the variable current would be running_sum.
  • Avoid calling a variable max, since that shadows a built-in function in Python. An alternative name could be result.
  • If you rename the variable max, then you can write result = max(result, running_sum).
  • As @hjpotter92 said, it is better to add a description of your approach to the question; you will likely get more reviews. A few bullet points or some comments in the code are better than nothing.

Applying the suggestions:

def arrayManipulation(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    running_sum = 0
    result = 0
    for i in nums:
        running_sum += i
        result = max(result, running_sum)
    return result

It's already an efficient solution that runs in \$O(n+m)\$, so I wouldn't worry about performance. However, there is an alternative solution running in \$O(m \log m)\$ in the Editorial of HackerRank.

I implemented it in Python:

def arrayManipulation(n, queries):
    indices = []
    for a, b, k in queries:
        indices.append((a, k))
        indices.append((b + 1, -k))
    indices.sort()
    running_sum = 0
    result = 0
    for _, k in indices:
        running_sum += k
        result = max(result, running_sum)
    return result

It's based on the fact that it's enough to compute the running sum over the sorted change points: each query adds \$k\$ at index \$a\$ and removes it just after index \$b\$, so the maximum prefix sum over those \2ドルm\$ sorted points equals the maximum element of the full list, without ever building it.
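
As a quick sanity check on the sample input from the question, either version returns the expected maximum:

    queries = [(1, 2, 100), (2, 5, 100), (3, 4, 100)]
    print(arrayManipulation(5, queries))  # 200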

FYI in the Editorial (or Discussion) section of HackerRank there are optimal solutions and detailed explanations.

Thanks to @superbrain for the corrections provided in the comments.

answered Oct 22, 2020 at 6:48
  • The original is O(n+m). And apparently n is not large enough compared to m for O(m log m) to be better :-( (Commented Oct 22, 2020 at 13:10)
  • @superbrain I agree about O(n+m); in this case it is better to keep the term m even if it's smaller than n. But the results I get running your benchmark show that the sorting approach is still a bit faster; only with_accumulate is slightly better than "my" approach. Maybe a different environment? (Commented Oct 22, 2020 at 14:44)
  • Yeah, what environment did you use? I added times with 32-bit Python to my answer now and that looks similar to yours. (Commented Oct 22, 2020 at 15:17)
  • Windows Home 10 64-bit, Python 3.8.6. (Commented Oct 22, 2020 at 16:12)
  • I suspect the most influential factor is the one you left out :-): the Python bit-version. I guess you used 32-bit? Until 3.8.6 I mainly used 32-bit Python as well, as that was the default offered for download. Now for 3.9.0, the default download appears to be 64-bit, so that's what I'm mainly using now. (Commented Oct 22, 2020 at 16:24)

List vs Python array vs NumPy array

To my surprise, my solution using Reinderien's suggestion to use a Python array was fastest in my benchmark in 64-bit Python (and not bad in 32-bit Python). Here I look into that.

Why was I surprised? Because I had always considered array to be rather pointless, like a "NumPy without operations". Sure, it provides compact storage of data, but I have plenty of memory, so I'm not very interested in that. I'm more interested in speed. And whenever you do something with the array's elements, there's overhead from always converting between a Python int object (or whatever type you use in the array) and the array's fixed-size element data. Contrast that with NumPy, where you do operations like arr += 1 or arr1 += arr2 and NumPy speedily operates on all the array elements. And if you treat NumPy arrays like lists and work on them element-wise yourself, it's sloooow. I thought Python arrays were similarly slow at that, and they are, but a lot less so:

                          |   a[0]   a[0] += 1
--------------------------+--------------------
a = [0]                   |  27 ns       67 ns
a = array('q', [0])       |  35 ns      124 ns
a = np.zeros(1, np.int64) | 132 ns      504 ns

Accessing a list element or incrementing it is by far the fastest with a list, and by faaar the slowest with a NumPy array.
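
(The answer doesn't include the measurement code for these numbers; here is a minimal sketch of how such per-operation timings could be collected with timeit. It is my own code, not the author's, and the absolute numbers will vary by machine and Python build.)

    from timeit import timeit

    # Hypothetical measurement sketch, not the code used for the table above.
    setups = {
        "list":  "a = [0]",
        "array": "from array import array; a = array('q', [0])",
        "numpy": "import numpy as np; a = np.zeros(1, np.int64)",
    }

    n = 10**6
    for name, setup in setups.items():
        read = timeit("a[0]", setup=setup, number=n) / n
        incr = timeit("a[0] += 1", setup=setup, number=n) / n
        print(f"{name:>5}: a[0] {read * 1e9:6.1f} ns    a[0] += 1 {incr * 1e9:6.1f} ns")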

Let's add a (bad) NumPy version to the mix, where I badly use a NumPy array instead of a list or a Python array:

import numpy as np
from itertools import accumulate

def bad_numpy(n, queries):
    nums = np.zeros(n + 1, np.int64)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))

Times with my worst case benchmark:

python_list 565 ms 576 ms 577 ms
python_array 503 ms 514 ms 517 ms
numpy_array 2094 ms 2124 ms 2171 ms

So the bad NumPy usage is far slower, as expected.

The solution has three steps: initialization of the list/array, the loop processing the queries, and accumulating/maxing. Let's measure them separately to see how much time each version spends in each step.

Initialization

I took out everything after the nums = ... line and measured again:

python_list 52 ms 52 ms 55 ms
python_array 30 ms 31 ms 32 ms
numpy_array 0 ms 0 ms 0 ms

The list is slowest and NumPy is unbelievably fast. Actually 0.016 ms for an array of ten million int64s, which would be 5000 GB/s. I think it must be cheating somehow, presumably by requesting already-zeroed memory pages from the operating system (calloc) without actually touching them yet. Anyway, we see that the array solutions get a head start in the benchmark due to faster initialization.

The list [0] * (n + 1) gets initialized like this, copying the 0 again and again and incrementing its reference count again and again:

for (i = 0; i < n; i++) {
    items[i] = elem;
    Py_INCREF(elem);
}

The Python array repeats faster, using memcpy to repeatedly double the elements (1 copy => 2 copies => 4 copies => 8 copies => 16 copies, etc.):

Py_ssize_t done = oldbytes;
memcpy(np->ob_item, a->ob_item, oldbytes);
while (done < newbytes) {
    Py_ssize_t ncopy = (done <= newbytes-done) ? done : newbytes-done;
    memcpy(np->ob_item+done, np->ob_item, ncopy);
    done += ncopy;
}

After seeing this, I'm actually surprised the Python array isn't much faster than the list.

Processing the queries

Times for the loop processing the queries:

python_list 122 ms 125 ms 121 ms
python_array 96 ms 99 ms 95 ms
numpy_array 303 ms 307 ms 305 ms

What? But earlier we saw that the Python list is faster at accessing/incrementing elements! Well, but that was for a[0], i.e., always accessing/incrementing the same element. With the worst-case data, it's random access, and the array solutions are apparently better at that. If I change the indexes from randint(1, n) to randint(1, 100), the picture looks different:

python_list 35 ms 43 ms 47 ms
python_array 77 ms 72 ms 72 ms
numpy_array 217 ms 225 ms 211 ms

Not quite sure yet why, as all three containers use 80 MB of contiguous memory, so that should be equally cache-friendly. So I think it's about the int objects that get created with += k and -= k and that stay alive in the list but not in the arrays.

Anyway, with the worst case data, the Python array increases its lead, and the NumPy array falls from first to last place. Total times for initialization and query-processing:

python_list 174 ms 177 ms 176 ms
python_array 126 ms 130 ms 127 ms
numpy_array 303 ms 307 ms 305 ms

Accumulate and max

Times for max(accumulate(nums)):

python_list 391 ms 399 ms 401 ms
python_array 377 ms 384 ms 390 ms
numpy_array 1791 ms 1817 ms 1866 ms

So this part actually takes the longest, for all three versions. Of course in reality, in NumPy I'd use nums.cumsum().max(), which takes about 50 ms here.
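
For comparison (my addition, not part of the benchmark above), a sketch of what a more fully vectorized NumPy version might look like; np.add.at is used for the scatter-add so that repeated indices accumulate correctly:

    import numpy as np

    def numpy_vectorized(n, queries):
        # Sketch only: build the difference array with a vectorized
        # scatter-add, then take the maximum of the running sum.
        nums = np.zeros(n + 1, np.int64)
        q = np.asarray(queries, dtype=np.int64)
        np.add.at(nums, q[:, 0] - 1, q[:, 2])   # +k at index a-1
        np.add.at(nums, q[:, 1], -q[:, 2])      # -k at index b
        return int(nums.cumsum().max())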

Summary, moral of the story

Why is the Python array faster than the Python list in the benchmark?

  • Initialization: Because the array's initialization is less work.
  • Processing the queries: I think because the list keeps a lot of int objects alive and that's costly somehow.
  • Accumulate/max: I think because iterating the list involves accessing all the different int objects in random order, i.e., randomly accessing memory, which is not that cache-friendly.

What I take away from this all is that misusing NumPy arrays as lists is indeed a bad idea, but that using Python arrays is not equally bad but can in fact not only use less memory but also be faster than lists. While the conversion between objects and array entries does take extra time, other effects can more than make up for that lost time. That said, keep in mind that the array version was slower in my 32-bit Python benchmark and slower in query processing in 64-bit Python when I changed the test data to use smaller/fewer indexes. So it really depends on the problem. But using an array can be faster than using a list.

answered Oct 23, 2020 at 16:02
  • Very nice sleuthing! It would be interesting to see how bytearray performs on these operations. I know it can only be used to store bytes, so it does not have the same functionality as the other options here. But it is still something similar and can be an option for small integers. (Commented Oct 24, 2020 at 15:27)

You could use itertools.accumulate to shorten your second part a lot and make it faster:

from itertools import accumulate

def arrayManipulation(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))

It can be used on Marc's version as well. Benchmarks with various solutions on three worst-case inputs:

CPython 3.9.0 64-bit on Windows 10 Pro 2004 64-bit:
 original 798 ms 787 ms 795 ms
 with_abk 785 ms 790 ms 807 ms
with_accumulate 581 ms 581 ms 596 ms
 Marc 736 ms 737 ms 736 ms
 optimized_1 698 ms 702 ms 698 ms
 optimized_2 696 ms 694 ms 690 ms
 optimized_3 692 ms 683 ms 684 ms
 Reinderien 516 ms 512 ms 511 ms
CPython 3.9.0 32-bit on Windows 10 Pro 2004 64-bit:
 original 1200 ms 1229 ms 1259 ms
 with_abk 1167 ms 1203 ms 1174 ms
with_accumulate 939 ms 937 ms 934 ms
 Marc 922 ms 927 ms 923 ms
 optimized_1 865 ms 868 ms 869 ms
 optimized_2 863 ms 863 ms 868 ms
 optimized_3 851 ms 847 ms 842 ms
 Reinderien 979 ms 959 ms 983 ms

Code:

from timeit import repeat
from random import randint
from itertools import accumulate
from array import array

def original(n, queries):
    nums = [0] * (n + 1)
    for q in queries:
        nums[q[0]-1] += q[2]
        nums[q[1]] -= q[2]
    current = 0
    max = 0
    for i in nums:
        current += i
        if current > max: max = current
    return max

def with_abk(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    current = 0
    max = 0
    for i in nums:
        current += i
        if current > max: max = current
    return max

def with_accumulate(n, queries):
    nums = [0] * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))

def Marc(n, queries):
    indices = []
    for a, b, k in queries:
        indices.append((a, k))
        indices.append((b + 1, -k))
    indices.sort()
    running_sum = 0
    result = 0
    for _, k in indices:
        running_sum += k
        result = max(result, running_sum)
    return result

def optimized_1(n, queries):
    changes = []
    for a, b, k in queries:
        changes.append((a, k))
        changes.append((b + 1, -k))
    changes.sort()
    return max(accumulate(k for _, k in changes))

def optimized_2(n, queries):
    changes = []
    append = changes.append
    for a, b, k in queries:
        append((a, k))
        append((b + 1, -k))
    changes.sort()
    return max(accumulate(k for _, k in changes))

def optimized_3(n, queries):
    changes = [(a, k) for a, _, k in queries]
    changes += [(b + 1, -k) for _, b, k in queries]
    changes.sort()
    return max(accumulate(k for _, k in changes))

def Reinderien(n, queries):
    nums = array('q', [0]) * (n + 1)
    for a, b, k in queries:
        nums[a - 1] += k
        nums[b] -= k
    return max(accumulate(nums))

funcs = original, with_abk, with_accumulate, Marc, optimized_1, optimized_2, optimized_3, Reinderien
names = [func.__name__ for func in funcs]

def worst_case():
    n = 10**7
    m = 2 * 10**5
    queries = [sorted([randint(1, n), randint(1, n)]) + [randint(0, 10**9)]
               for _ in range(m)]
    return n, queries

# Check correctness
n, queries = worst_case()
expect = funcs[0](n, queries)
for func in funcs[1:]:
    print(func(n, queries) == expect, func.__name__)

# Benchmark
tss = [[] for _ in funcs]
for _ in range(3):
    n, queries = worst_case()
    for func, ts in zip(funcs, tss):
        t = min(repeat(lambda: func(n, queries), number=1))
        ts.append(t)
    print()
    for name, ts in zip(names, tss):
        print(name.rjust(max(map(len, names))),
              *(' %4d ms' % (t * 1000) for t in ts))
answered Oct 22, 2020 at 12:42
  • optimized_3 could be changed to use a generator expression on the right-hand side of += to save some memory and potentially also some time. BTW, another approach is changes = sorted(itertools.chain(*(((a, k), (b + 1, -k)) for a, b, k in queries))). (Commented Oct 24, 2020 at 16:11)
  • @GZ0 Yeah, I guess with a generator expression it would take less memory, but I wasn't concerned with that here. Tried it now anyway; it appears to be very slightly slower or equally fast. The chain one is significantly slower. A loop with changes += (a, k), (b + 1, -k) seems slightly faster. (Commented Oct 24, 2020 at 18:47)

Given that your container is a simple list of uniform type, you might see some small benefit in switching to the array module (https://docs.python.org/3.8/library/array.html), which is built specifically for this kind of thing. It's a compromise that uses only built-ins, without needing to install NumPy.
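
For illustration, a sketch of what that swap might look like on the original function (essentially the same as the Reinderien function in the benchmark above); 'q' is the signed 64-bit typecode, which is wide enough here since the maximum possible value is 2*10^5 queries times k <= 10^9:

    from array import array
    from itertools import accumulate

    def array_manipulation(n, queries):
        # Same difference-array idea, but backed by a typed array of
        # signed 64-bit integers instead of a list of Python ints.
        nums = array('q', [0]) * (n + 1)
        for a, b, k in queries:
            nums[a - 1] += k
            nums[b] -= k
        return max(accumulate(nums))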

answered Oct 22, 2020 at 14:12
  • What kind of benefit? (Commented Oct 22, 2020 at 15:11)
  • @superbrain At the very least, the one that the array module itself claims: compactness. (Commented Oct 22, 2020 at 15:13)
  • Well, that one is clear, so I thought you wouldn't have said "might" for that :-). And I thought converting between int objects and fixed-size array elements all the time would be slower, but apparently it can even be faster. I'm pleasantly surprised and wondering how it's fast. Added that to my answer, let me know if I did it suboptimally. (Commented Oct 22, 2020 at 15:40)
  • Looked a bit into that now, how array is fast. (Commented Oct 23, 2020 at 16:04)

I don't know of any way to optimize this; I suspect you've cracked the way it was intended to be implemented. The following are just general recommendations.

Using black to format the code will make it closer to idiomatic style, with no manual work.

After formatting I would recommend running flake8 to find remaining non-idiomatic code. For example, function names should be written in snake_case.

From Python 3.8 onwards you can use the walrus operator to change the last conditional to if (current := current + i) > max:. Not sure if that's a good idea though; I find that syntax clunky.
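
For illustration, a sketch of how the last loop of the original code would read with the walrus operator (with max renamed to avoid shadowing the built-in):

    def running_max(nums):
        # Sketch only: the walrus operator folds the addition into the test.
        current = 0
        result = 0
        for i in nums:
            if (current := current + i) > result:
                result = current
        return result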

answered Oct 22, 2020 at 6:45
  • Oops, copy/paste error. Thanks, @superbrain. (Commented Oct 22, 2020 at 18:37)
  • Ah, that's what I thought :-) I had even googled "make it closer to idiomatic style" to see whether you reused that sentence from an earlier answer, but there were no results. (Commented Oct 23, 2020 at 3:28)
