Counting adjacent swaps to sort an array with 3 different values

Question 1

This is the Kindergarten Excursion problem from kattis.com:

The kindergarten teachers had finally managed to get all the kids in a line for the walk to the bus station and the weekly excursion. What they hadn’t thought of was the fact that today different kids were supposed to go to different excursions. They should walk as one line to the bus station, but to avoid total chaos when they arrive there, all the kids going to the zoo should be in the beginning of the line, the ones going to the lake should be in the middle and the rest, going to the science museum, should be in the end.

Since it takes a lot of time to get all the kids to stand in a line no kid may step out of the line. To get the line organized after excursion group, kids standing next to each other can swap places. Now the kindergarten teachers wonder, if they will have time to do this and still catch their bus.

Task

You will be given a sequence of numbers 0, 1, and 2, denoting the excursion destinations of each kid from first to last in the line. Pairs of adjacent kids can swap positions, and the line should be organized after destination number starting with 0 and ending with 2. What is the minimum number of swaps required to organize the line?

Input

The only line of input contains a string consisting of characters 0, 1 or 2, denoting the destinations of kids. The length of the string will be at most 1,000,000 characters.

Output

Output one line with one integer – the minimum number of swaps needed to get the kids in order.

Sample Input 1
10210
Sample Output 1
5

This is my code. My solution is too slow and I need help with optimization or a different approach for this problem.

def excursion(str):
 digits = [0, 0, 0]
 x = 0
 for i in range(len(str)):
 for j in range((int(str[i])) + 1, 3):
 x += digits[j]
 digits[int(str[i])] += 1
 return x
print(excursion(input()))

Question 2

The code in the post runs in linear time, so we can't hope to improve the asymptotic complexity, but we might be able to make some piecewise optimizations. Let's start by making a random test case:

>>> import random
>>> CASE = ''.join(random.choice('012') for _ in range(10**6))

Timing the performance on this test case establishes a baseline for any improvements.

>>> from timeit import timeit
>>> timeit(lambda:excursion(CASE), number=1)
1.1623761800583452

The expression int(str[i])) appears twice, so let's remember it in a local variable:

def excursion2(str):
 digits = [0, 0, 0]
 result = 0
 for i in range(len(str)):
 digit = int(str[i])
 for j in range(digit + 1, 3):
 result += digits[j]
 digits[digit] += 1
 return result

This saves about 25% of the runtime:

>>> timeit(lambda:excursion2(CASE), number=1)
0.867930110078305

The code only uses the index i to look up the corresponding character of str. So instead of iterating over the indexes, iterate over the characters themselves:

def excursion3(str):
 digits = [0, 0, 0]
 result = 0
 for c in str:
 digit = int(c)
 for j in range(digit + 1, 3):
 result += digits[j]
 digits[digit] += 1
 return result

This saves about 3%:

>>> timeit(lambda:excursion3(CASE), number=1)
0.8365003759972751

We can save an assignment statement by using map to apply the function int to the characters of str:

def excursion4(str):
 digits = [0, 0, 0]
 result = 0
 for digit in map(int, str):
 for j in range(digit + 1, 3):
 result += digits[j]
 digits[digit] += 1
 return result

This saves about 4%:

>>> timeit(lambda:excursion4(CASE), number=1)
0.7819818791467696

We could cache the range objects, to avoid rebuilding them on each loop:

def excursion5(str):
 n = 3
 digits = [0] * n
 ranges = [range(i + 1, n) for i in range(n)]
 result = 0
 for digit in map(int, str):
 for j in ranges[digit]:
 result += digits[j]
 digits[digit] += 1
 return result

This saves about 25%:

>>> timeit(lambda:excursion5(CASE), number=1)
0.497486931970343

The cumulative impact of these changes reduced the overall runtime by about 60%. Is that good enough to pass the time limit?

Question 3

Since you know that there will only be 3 distinct values, you should use bucket sort. We will store the counts in a dictionary, and then build the string at the end to avoid the cost of repeated concatenation. This code is both simpler and faster.

def excursion(string):
 digits = {'0':0, '1':0, '2':0}
 for char in string:
 digits[char] += 1
 return '0'*digits['0'] + '1'*digits['1'] + '2'*digits['2']
n=0b10000001000000
print(len(excursion('2'*n+'1'*n+'0'*n)))

Question 4

The challenge is not to perform the sorting, but to count the number of swaps involved in a bubble sort. (If you wanted to do a counting sort, itertools.Counter would have resulted in an even simpler implementation.)

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Answer 1 · 2018-02-22 16:04:35Z

The code in the post runs in linear time, so we can't hope to improve the asymptotic complexity, but we might be able to make some piecewise optimizations. Let's start by making a random test case:

>>> import random
>>> CASE = ''.join(random.choice('012') for _ in range(10**6))

Timing the performance on this test case establishes a baseline for any improvements.

>>> from timeit import timeit
>>> timeit(lambda:excursion(CASE), number=1)
1.1623761800583452

The expression int(str[i])) appears twice, so let's remember it in a local variable:

def excursion2(str):
 digits = [0, 0, 0]
 result = 0
 for i in range(len(str)):
 digit = int(str[i])
 for j in range(digit + 1, 3):
 result += digits[j]
 digits[digit] += 1
 return result

This saves about 25% of the runtime:

>>> timeit(lambda:excursion2(CASE), number=1)
0.867930110078305

The code only uses the index i to look up the corresponding character of str. So instead of iterating over the indexes, iterate over the characters themselves:

def excursion3(str):
 digits = [0, 0, 0]
 result = 0
 for c in str:
 digit = int(c)
 for j in range(digit + 1, 3):
 result += digits[j]
 digits[digit] += 1
 return result

This saves about 3%:

>>> timeit(lambda:excursion3(CASE), number=1)
0.8365003759972751

We can save an assignment statement by using map to apply the function int to the characters of str:

def excursion4(str):
 digits = [0, 0, 0]
 result = 0
 for digit in map(int, str):
 for j in range(digit + 1, 3):
 result += digits[j]
 digits[digit] += 1
 return result

This saves about 4%:

>>> timeit(lambda:excursion4(CASE), number=1)
0.7819818791467696

We could cache the range objects, to avoid rebuilding them on each loop:

def excursion5(str):
 n = 3
 digits = [0] * n
 ranges = [range(i + 1, n) for i in range(n)]
 result = 0
 for digit in map(int, str):
 for j in ranges[digit]:
 result += digits[j]
 digits[digit] += 1
 return result

This saves about 25%:

>>> timeit(lambda:excursion5(CASE), number=1)
0.497486931970343

The cumulative impact of these changes reduced the overall runtime by about 60%. Is that good enough to pass the time limit?

Oscar Smith Oscar Smith 3,71718 silver badges31 bronze badges · Answer 2 · 2018-02-05 19:42:30Z

Since you know that there will only be 3 distinct values, you should use bucket sort. We will store the counts in a dictionary, and then build the string at the end to avoid the cost of repeated concatenation. This code is both simpler and faster.

def excursion(string):
 digits = {'0':0, '1':0, '2':0}
 for char in string:
 digits[char] += 1
 return '0'*digits['0'] + '1'*digits['1'] + '2'*digits['2']
n=0b10000001000000
print(len(excursion('2'*n+'1'*n+'0'*n)))

The challenge is not to perform the sorting, but to count the number of swaps involved in a bubble sort. (If you wanted to do a counting sort, itertools.Counter would have resulted in an even simpler implementation.)

Stack Exchange Network

Counting adjacent swaps to sort an array with 3 different values

Task

Input

Output

Sample Input 1

Sample Output 1

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Counting adjacent swaps to sort an array with 3 different values

Task

Input

Output

Sample Input 1

Sample Output 1

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions