"Recycled numbers" challenge

Question 1

I wrote this program for the Google Code Jam problem, "Recycled numbers."

It took around 2 and half minutes to solve the small input set on my dual core, 1GB RAM system. But for the large input set it took around 25 minutes(processor usage was 100% for those 25 mins) to solve the problem.

Problem

Let's say a pair of distinct positive integers (\$n\,ドル \$m\$) is recycled if you can obtain m by moving some digits from the back of n to the front without changing their order.

For example, (12345, 34512) is a recycled pair since you can obtain 34512 by moving 345 from the end of 12345 to the front. Note that n and m must have the same number of digits in order to be a recycled pair. Neither n nor m can have leading zeros.

Given integers \$A\$ and \$B\$ with the same number of digits and no leading zeros, how many distinct recycled pairs (\$n\,ドル \$m\$) are there with \$A ≤ n < > m ≤ B\$?

How can I make my program fast enough to solve the large input set within 8 minutes?

f=open('C-large.in')
noi=int(f.readline().rstrip('\r\n'))
i=1
ou=open('jam.out','w')
while i<=noi:
 re=f.readline().rstrip('\r\n')
 rs=re.split()
 a=int(rs[0])
 b=int(rs[1])
 ans=0
 c=[x for x in range(a,b+1)]
 for ind,x in enumerate(c):
 for y in c[ind+1:]:
 n=str(x)
 m=str(y)
 if len(n)==len(m) and len(n)>1 and len(m)>1:
 ki=1 
 while ki <len(n):
 temp=n[-1:-(ki)-1:-1][::-1]+n[:-(ki)]
 if temp[0]!='0':
 if temp==m:
 ans+=1
 break
 ki+=1 
 else :
 break
 print("Case #%s:"%(i),ans,file=ou)
 i+=1
ou.close()
f.close()

Small input:

Large input:

50
10000 10001
1781 7688
1007465 1919027
100000 100000
1032122 1995680
1216257 1504628
1216908 1951709
1021619 1930030
15048 53617
1007458 1962726
388 869
1067361 1995135
46739 73395
1077614 1958584
1031957 1958036
72634 74120
1063961 1954135
274937 720742
1 8
1057393 1928164
28 93
1000000 2000000
1032073 1940559
1089798 1970754
100000 999999
1434814 1780702
1072799 1964898
1059864 1967119
155569 345355
2578 4229
72 95
552 553
601415 711929
1275115 1433162
688042 856983
594203 673506
1172008 1490273
180289 925799
354653 354653
1195595 1669305
5908 5967
2083 7987
339 670
1015539 1982001
29229 96206
1006167 1522491
1117646 1117646
1021936 1950412
41215 83080
1033077 1969448

Question 2

Sometimes the biggest gains come from changing the logic. Are you sure that you want to keep the logic? You need more laziness with the help of docs.python.org/library/itertools.html, in particular - islice. Generating an actual slice would be a bit expensive.

Question 3

Have you tried this method? At least it will prove to you what's taking the most time.

Question 4

if len(m) == len(n) and len(m) > 1 and len(n) > 1 ... If the first 2 expressions are true, the last one MUST be true.

Question 5

 f=open('C-large.in')

Don't sue single letter variable names, its really hard to read

 noi=int(f.readline().rstrip('\r\n'))

Since the default is already to remove whitespace, why both with the \r\n parameter?

 i=1
 ou=open('jam.out','w')
 while i<=noi:

Use for loops, not while loops to count

 re=f.readline().rstrip('\r\n')

re is a really bad name because its the same as the regular expression module

 rs=re.split()
 a=int(rs[0])
 b=int(rs[1])

I'd use a,b = map(int, re.split())

 ans=0
 c=[x for x in range(a,b+1)]

Use c = list(range(a,b+1))

 for ind,x in enumerate(c):
 for y in c[ind+1:]:

That's gonna be inefficient. Firstly, you can easily express it using simple ranges

for x in range(a, b+ 1):
 for y in range(x + 1, b + 1):

and therefore avoid the slices.

But you can do much better. You are only interested in values of y which are recyclings of x. Instead of generating the rather large number of possible second numbers, try just generating the fairly small number of possible ways to recycle x.

 n=str(x)
 m=str(y)

Converting all your numbers to strings will be expensive. See if you can keep them as integers and just do math operations on them

 if len(n)==len(m) and len(n)>1 and len(m)>1:
 ki=1 
 while ki <len(n):

Again, use for loops to count

 temp=n[-1:-(ki)-1:-1][::-1]+n[:-(ki)]

temp should be banned variable name. All variables are temporary, use a more helpful name.

 if temp[0]!='0':
 if temp==m:
 ans+=1
 break
 ki+=1 
 else :
 break
 print("Case #%s:"%(i),ans,file=ou)

Normally, we use %d for numbers

 i+=1
 ou.close()
 f.close()

Question 6

I applied all the changes you mentioned above and my program became faster by 15-20 seconds, but still not enough for the large input set.

Question 7

@AshwiniChaudhary, my versions solves the large data set in two minutes. I'm going to guess that you didn't actually make all the changes I suggested.

Question 8

In general, you're right about using cryptic/confusing variable names. However, f is a commonly used and widely accepted variable for a file object. If I saw f independent of it's context, my first assumption would be that it represented a file object.

Question 9

@JoelCornett, I personally still don't like f, but its common enough you can get away with it. The way I see it, its still preferable to use a longer name. Nevertheless, I should have picked on some of the other examples

Question 10

The correct solution here is to change the logic. Your code runs in O(n^2) (where n is b-a), but the correct solution runs in O(n d) (where d is the number of digits). Because d is most likely much smaller than n, this is much faster.

For more details, see the official analysis of the problem.

Question 11

I suspect the authors means that he needs to get the same answer when he says that he doesn't want to change the logic.

Question 12

@svick can you give me pythonic version of the solution provided by the codejam team for the large input set, because I don't know C much. I guess the logic I used for the my solution is exactly the same they mentioned for the small input set.

Question 13

You're calculating n and len(n) more times than you need to. I got a 25% speed increase simply by:

moving n=str(x) up into the for idx,x... loop
storing n's length in that same loop (eg. length_n = len(n))
replacing all the instaces of len(n) with length_n.

I'm sure that principle will apply to whichever solution you end up going with.

Question 14

the len(n)>1 can also be pushed up on level.

Question 15

Not sure if my comparison method is as efficient - I've not benchmarked the tow against each other, but I'd expect the rest of this code to be better. It would be more efficient to avoid the convenience function (extra function calls), however I've split it out this way to enhance readability, and make it easier to switch your comparison method.

As I say - the comparison is completely unoptimised, but then again I usually find that standard library objects are fairly efficient.

def main():
 """ Conventionally this it the function run when the script starts """
 from collections import Counter
 with open('C-large.in') as input_file: 
 # using 'with' is recommended as it will cause the file to be closed automatically
 # when you exit this block (think try/catch/finally)
 def is_recyclable(x,y):
 """ Convenience function for readability """
 X,Y = str(x), str(y)
 if len(X) == len(Y) and len(X) > 1:
 return Counter(X) == Counter(Y)
 return False
 with open('jam.out','w') as output_file:
 line_ending = '\r\n'
 i = 1
 for line in input_file: #This will do your readline calls for you. Not sure what your 'noi' was doing, it crashed for me...
 (a,b) = (int(x) for x in line.rstrip(line_ending).split())
 ans = 0
 for x in range(a, b+1): # In python 3.x this is a generator
 for y in range(x+1, b+1): # In python 2.x use xrange
 # You should now have a <= x < y <= b
 if is_recyclable(x,y):
 ans += 1
 output = "Case #%s: %s%s"%(i, ans, line_ending)
 #print output debug line
 output_file.write(output)
 i += 1
if __name__ == "__main__":
 main()

Question 16

ah I see I missinterpretted the comparisson - I'll have a look at that tomorrow

Winston Ewert Winston Ewert 30.7k4 gold badges52 silver badges79 bronze badges · Accepted Answer · 2012-04-17 04:16:05Z

 f=open('C-large.in')

Don't sue single letter variable names, its really hard to read

 noi=int(f.readline().rstrip('\r\n'))

Since the default is already to remove whitespace, why both with the \r\n parameter?

 i=1
 ou=open('jam.out','w')
 while i<=noi:

Use for loops, not while loops to count

 re=f.readline().rstrip('\r\n')

re is a really bad name because its the same as the regular expression module

 rs=re.split()
 a=int(rs[0])
 b=int(rs[1])

I'd use a,b = map(int, re.split())

 ans=0
 c=[x for x in range(a,b+1)]

Use c = list(range(a,b+1))

 for ind,x in enumerate(c):
 for y in c[ind+1:]:

That's gonna be inefficient. Firstly, you can easily express it using simple ranges

for x in range(a, b+ 1):
 for y in range(x + 1, b + 1):

and therefore avoid the slices.

But you can do much better. You are only interested in values of y which are recyclings of x. Instead of generating the rather large number of possible second numbers, try just generating the fairly small number of possible ways to recycle x.

 n=str(x)
 m=str(y)

Converting all your numbers to strings will be expensive. See if you can keep them as integers and just do math operations on them

 if len(n)==len(m) and len(n)>1 and len(m)>1:
 ki=1 
 while ki <len(n):

Again, use for loops to count

 temp=n[-1:-(ki)-1:-1][::-1]+n[:-(ki)]

temp should be banned variable name. All variables are temporary, use a more helpful name.

 if temp[0]!='0':
 if temp==m:
 ans+=1
 break
 ki+=1 
 else :
 break
 print("Case #%s:"%(i),ans,file=ou)

Normally, we use %d for numbers

 i+=1
 ou.close()
 f.close()

I applied all the changes you mentioned above and my program became faster by 15-20 seconds, but still not enough for the large input set.
@AshwiniChaudhary, my versions solves the large data set in two minutes. I'm going to guess that you didn't actually make all the changes I suggested.
In general, you're right about using cryptic/confusing variable names. However, f is a commonly used and widely accepted variable for a file object. If I saw f independent of it's context, my first assumption would be that it represented a file object.
@JoelCornett, I personally still don't like f, but its common enough you can get away with it. The way I see it, its still preferable to use a longer name. Nevertheless, I should have picked on some of the other examples

Stack Exchange Network

"Recycled numbers" challenge

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

"Recycled numbers" challenge

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions