I have attached two scripts which calculate the most popular songs according to zipf's law from the following standard input:
6 3
100 one
50 two
10 three
30 four
5 five
10 six
The first line details the number of songs to be input followed by the number of songs to return. Each line that follows holds the play counts and respective song titles. In the case of a tie, the algorithm must give precedence to the song that appeared earlier on the album.
Both of my attempts to solve are currently not efficient enough. Any help in optimizing one or both of these would be greatly appreciated.
Attempt 1:
import sys
from operator import itemgetter
def main():
line1 = sys.stdin.readline().split()
x = int(line1[0])-1
z = int(line1[1])
data = ""
for line in sys.stdin:
data+=line
data = data.split()
while(x >= 0):
data[2*x]= float(data[2*x]) * (x+1)
x-=1
answers = [(data[a+1], data[a]) for a in range(0,len(data),2)]
answers.sort(key=itemgetter(1), reverse=True)
p=0
while(p<z):
print answers[p][0]
p+=1
main()
Attempt 2:
import sys
from operator import itemgetter
def main():
line1 = sys.stdin.readline().split()
x = int(line1[0])-1
p = x
z = int(line1[1])
data = ""
for line in sys.stdin:
data+=line
data = data.split()
while(x >= 0):
data[2*x]= float(data[2*x]) * (x+1)
x-=1
y=0
list_of_lists = []
while(y<=p):
list_of_lists.append([data[2*y+1], data[2*y]])
y+=1
list_of_lists = sorted(list_of_lists, key=itemgetter(1), reverse=True)
n=0
while(n<z):
print list_of_lists[n][0]
n+=1
main()
Currently Cprofile reports are indicating that the first script is faster. Thanks!
1 Answer 1
I must confess that I do not understand what you're trying to achieve and how Zipf's law come into play ! But I've some suggestions about the code itself :
- Use meaningful variable names.
- Concatenating strings only to split the result afterward is terribly costly, for no advantage.
- Instead, use meaningful data structures. You deal with a sequence of
(count, title)
pairs. Flattening that and playing with index parity is confusing.
So, here is my take on it :
import sys
from operator import itemgetter
def main(source = sys.stdin):
nb_in, nb_out = (int(x) for x in source.readline().split())
#(lazely) collect Count and Title.
data = ( source.readline().split() for i in range(nb_in) )
#multiply each count by rank.
answers = [ (title, int(count)*(rank+1)) for (rank, (count, title)) in enumerate(data) ]
answers.sort(key = itemgetter(1), reverse = True)
for title, _ in answers[:nb_out]:
print title
main()