I wrote this program for the Google Code Jam problem, "Recycled numbers."
It took around 2 and half minutes to solve the small input set on my dual core, 1GB RAM system. But for the large input set it took around 25 minutes(processor usage was 100% for those 25 mins) to solve the problem.
Problem
Let's say a pair of distinct positive integers (\$n\,ドル \$m\$) is recycled if you can obtain m by moving some digits from the back of n to the front without changing their order.
For example, (12345, 34512) is a recycled pair since you can obtain 34512 by moving 345 from the end of 12345 to the front. Note that n and m must have the same number of digits in order to be a recycled pair. Neither n nor m can have leading zeros.
Given integers \$A\$ and \$B\$ with the same number of digits and no leading zeros, how many distinct recycled pairs (\$n\,ドル \$m\$) are there with \$A ≤ n < > m ≤ B\$?
How can I make my program fast enough to solve the large input set within 8 minutes?
f=open('C-large.in')
noi=int(f.readline().rstrip('\r\n'))
i=1
ou=open('jam.out','w')
while i<=noi:
re=f.readline().rstrip('\r\n')
rs=re.split()
a=int(rs[0])
b=int(rs[1])
ans=0
c=[x for x in range(a,b+1)]
for ind,x in enumerate(c):
for y in c[ind+1:]:
n=str(x)
m=str(y)
if len(n)==len(m) and len(n)>1 and len(m)>1:
ki=1
while ki <len(n):
temp=n[-1:-(ki)-1:-1][::-1]+n[:-(ki)]
if temp[0]!='0':
if temp==m:
ans+=1
break
ki+=1
else :
break
print("Case #%s:"%(i),ans,file=ou)
i+=1
ou.close()
f.close()
Small input:
50 156 927 167 978 484 954 160 990 135 916 160 939 103 914 173 958 100 101 124 945 181 976 101 932 101 935 170 928 171 984 126 973 283 788 667 667 384 750 153 934 173 993 185 930 100 100 178 993 21 31 178 970 168 958 170 992 104 989 34 70 162 935 289 289 185 993 10 99 120 953 184 917 179 936 108 967 6 7 160 958 115 925 137 972 188 921 10 99 118 966 100 972 167 944 136 983 107 926 118 996
Large input:
50 10000 10001 1781 7688 1007465 1919027 100000 100000 1032122 1995680 1216257 1504628 1216908 1951709 1021619 1930030 15048 53617 1007458 1962726 388 869 1067361 1995135 46739 73395 1077614 1958584 1031957 1958036 72634 74120 1063961 1954135 274937 720742 1 8 1057393 1928164 28 93 1000000 2000000 1032073 1940559 1089798 1970754 100000 999999 1434814 1780702 1072799 1964898 1059864 1967119 155569 345355 2578 4229 72 95 552 553 601415 711929 1275115 1433162 688042 856983 594203 673506 1172008 1490273 180289 925799 354653 354653 1195595 1669305 5908 5967 2083 7987 339 670 1015539 1982001 29229 96206 1006167 1522491 1117646 1117646 1021936 1950412 41215 83080 1033077 1969448
4 Answers 4
f=open('C-large.in')
Don't sue single letter variable names, its really hard to read
noi=int(f.readline().rstrip('\r\n'))
Since the default is already to remove whitespace, why both with the \r\n
parameter?
i=1
ou=open('jam.out','w')
while i<=noi:
Use for loops, not while loops to count
re=f.readline().rstrip('\r\n')
re is a really bad name because its the same as the regular expression module
rs=re.split()
a=int(rs[0])
b=int(rs[1])
I'd use a,b = map(int, re.split())
ans=0
c=[x for x in range(a,b+1)]
Use c = list(range(a,b+1))
for ind,x in enumerate(c):
for y in c[ind+1:]:
That's gonna be inefficient. Firstly, you can easily express it using simple ranges
for x in range(a, b+ 1):
for y in range(x + 1, b + 1):
and therefore avoid the slices.
But you can do much better. You are only interested in values of y which are recyclings of x. Instead of generating the rather large number of possible second numbers, try just generating the fairly small number of possible ways to recycle x.
n=str(x)
m=str(y)
Converting all your numbers to strings will be expensive. See if you can keep them as integers and just do math operations on them
if len(n)==len(m) and len(n)>1 and len(m)>1:
ki=1
while ki <len(n):
Again, use for loops to count
temp=n[-1:-(ki)-1:-1][::-1]+n[:-(ki)]
temp should be banned variable name. All variables are temporary, use a more helpful name.
if temp[0]!='0':
if temp==m:
ans+=1
break
ki+=1
else :
break
print("Case #%s:"%(i),ans,file=ou)
Normally, we use %d
for numbers
i+=1
ou.close()
f.close()
-
\$\begingroup\$ I applied all the changes you mentioned above and my program became faster by 15-20 seconds, but still not enough for the large input set. \$\endgroup\$Ashwini Chaudhary– Ashwini Chaudhary2012年04月20日 07:51:25 +00:00Commented Apr 20, 2012 at 7:51
-
\$\begingroup\$ @AshwiniChaudhary, my versions solves the large data set in two minutes. I'm going to guess that you didn't actually make all the changes I suggested. \$\endgroup\$Winston Ewert– Winston Ewert2012年04月20日 13:46:29 +00:00Commented Apr 20, 2012 at 13:46
-
\$\begingroup\$ In general, you're right about using cryptic/confusing variable names. However,
f
is a commonly used and widely accepted variable for a file object. If I sawf
independent of it's context, my first assumption would be that it represented a file object. \$\endgroup\$Joel Cornett– Joel Cornett2012年04月29日 14:28:40 +00:00Commented Apr 29, 2012 at 14:28 -
\$\begingroup\$ @JoelCornett, I personally still don't like
f
, but its common enough you can get away with it. The way I see it, its still preferable to use a longer name. Nevertheless, I should have picked on some of the other examples \$\endgroup\$Winston Ewert– Winston Ewert2012年05月04日 04:12:10 +00:00Commented May 4, 2012 at 4:12
The correct solution here is to change the logic. Your code runs in O(n^2) (where n is b-a), but the correct solution runs in O(n d) (where d is the number of digits). Because d is most likely much smaller than n, this is much faster.
For more details, see the official analysis of the problem.
-
\$\begingroup\$ I suspect the authors means that he needs to get the same answer when he says that he doesn't want to change the logic. \$\endgroup\$Winston Ewert– Winston Ewert2012年04月17日 21:06:25 +00:00Commented Apr 17, 2012 at 21:06
-
\$\begingroup\$ @svick can you give me pythonic version of the solution provided by the codejam team for the large input set, because I don't know C much. I guess the logic I used for the my solution is exactly the same they mentioned for the small input set. \$\endgroup\$Ashwini Chaudhary– Ashwini Chaudhary2012年04月20日 07:42:55 +00:00Commented Apr 20, 2012 at 7:42
You're calculating n
and len(n)
more times than you need to. I got a 25% speed increase simply by:
- moving
n=str(x)
up into thefor idx,x...
loop - storing
n
's length in that same loop (eg.length_n = len(n)
) - replacing all the
instaces of
len(n)
withlength_n
.
I'm sure that principle will apply to whichever solution you end up going with.
-
\$\begingroup\$ the
len(n)>1
can also be pushed up on level. \$\endgroup\$miracle173– miracle1732012年04月26日 05:57:41 +00:00Commented Apr 26, 2012 at 5:57
Not sure if my comparison method is as efficient - I've not benchmarked the tow against each other, but I'd expect the rest of this code to be better. It would be more efficient to avoid the convenience function (extra function calls), however I've split it out this way to enhance readability, and make it easier to switch your comparison method.
As I say - the comparison is completely unoptimised, but then again I usually find that standard library objects are fairly efficient.
def main():
""" Conventionally this it the function run when the script starts """
from collections import Counter
with open('C-large.in') as input_file:
# using 'with' is recommended as it will cause the file to be closed automatically
# when you exit this block (think try/catch/finally)
def is_recyclable(x,y):
""" Convenience function for readability """
X,Y = str(x), str(y)
if len(X) == len(Y) and len(X) > 1:
return Counter(X) == Counter(Y)
return False
with open('jam.out','w') as output_file:
line_ending = '\r\n'
i = 1
for line in input_file: #This will do your readline calls for you. Not sure what your 'noi' was doing, it crashed for me...
(a,b) = (int(x) for x in line.rstrip(line_ending).split())
ans = 0
for x in range(a, b+1): # In python 3.x this is a generator
for y in range(x+1, b+1): # In python 2.x use xrange
# You should now have a <= x < y <= b
if is_recyclable(x,y):
ans += 1
output = "Case #%s: %s%s"%(i, ans, line_ending)
#print output debug line
output_file.write(output)
i += 1
if __name__ == "__main__":
main()
-
\$\begingroup\$ ah I see I missinterpretted the comparisson - I'll have a look at that tomorrow \$\endgroup\$theheadofabroom– theheadofabroom2012年04月20日 18:33:24 +00:00Commented Apr 20, 2012 at 18:33
Explore related questions
See similar questions with these tags.
if len(m) == len(n) and len(m) > 1 and len(n) > 1
... If the first 2 expressions are true, the last one MUST be true. \$\endgroup\$