Optimizing/Cleaning up Python Script for Zipf's Law

Question 1

I have attached two scripts which calculate the most popular songs according to zipf's law from the following standard input:

6 3
100 one
50 two
10 three
30 four
5 five
10 six

The first line details the number of songs to be input followed by the number of songs to return. Each line that follows holds the play counts and respective song titles. In the case of a tie, the algorithm must give precedence to the song that appeared earlier on the album.

Both of my attempts to solve are currently not efficient enough. Any help in optimizing one or both of these would be greatly appreciated.

Attempt 1:

import sys
from operator import itemgetter
def main():
 line1 = sys.stdin.readline().split()
 x = int(line1[0])-1
 z = int(line1[1])
 data = ""
 for line in sys.stdin:
 data+=line
 data = data.split()
 while(x >= 0):
 data[2*x]= float(data[2*x]) * (x+1)
 x-=1
 answers = [(data[a+1], data[a]) for a in range(0,len(data),2)]
 answers.sort(key=itemgetter(1), reverse=True)
 p=0
 while(p<z):
 print answers[p][0]
 p+=1
main()

Attempt 2:

import sys
from operator import itemgetter
def main():
 line1 = sys.stdin.readline().split()
 x = int(line1[0])-1
 p = x
 z = int(line1[1])
 data = ""
 for line in sys.stdin:
 data+=line
 data = data.split()
 while(x >= 0):
 data[2*x]= float(data[2*x]) * (x+1)
 x-=1
 y=0
 list_of_lists = []
 while(y<=p):
 list_of_lists.append([data[2*y+1], data[2*y]])
 y+=1
 list_of_lists = sorted(list_of_lists, key=itemgetter(1), reverse=True)
 n=0
 while(n<z):
 print list_of_lists[n][0]
 n+=1
main()

Currently Cprofile reports are indicating that the first script is faster. Thanks!

Question 2

I must confess that I do not understand what you're trying to achieve and how Zipf's law come into play ! But I've some suggestions about the code itself :

Use meaningful variable names.
Concatenating strings only to split the result afterward is terribly costly, for no advantage.
Instead, use meaningful data structures. You deal with a sequence of (count, title) pairs. Flattening that and playing with index parity is confusing.

So, here is my take on it :

import sys
from operator import itemgetter
def main(source = sys.stdin):
 nb_in, nb_out = (int(x) for x in source.readline().split())
 #(lazely) collect Count and Title.
 data = ( source.readline().split() for i in range(nb_in) )
 #multiply each count by rank.
 answers = [ (title, int(count)*(rank+1)) for (rank, (count, title)) in enumerate(data) ]
 answers.sort(key = itemgetter(1), reverse = True)
 for title, _ in answers[:nb_out]:
 print title
main()

YvesgereY YvesgereY 4583 silver badges12 bronze badges · Answer 1 · 2013-12-24 13:08:06Z

I must confess that I do not understand what you're trying to achieve and how Zipf's law come into play ! But I've some suggestions about the code itself :

Use meaningful variable names.
Concatenating strings only to split the result afterward is terribly costly, for no advantage.
Instead, use meaningful data structures. You deal with a sequence of (count, title) pairs. Flattening that and playing with index parity is confusing.

So, here is my take on it :

import sys
from operator import itemgetter
def main(source = sys.stdin):
 nb_in, nb_out = (int(x) for x in source.readline().split())
 #(lazely) collect Count and Title.
 data = ( source.readline().split() for i in range(nb_in) )
 #multiply each count by rank.
 answers = [ (title, int(count)*(rank+1)) for (rank, (count, title)) in enumerate(data) ]
 answers.sort(key = itemgetter(1), reverse = True)
 for title, _ in answers[:nb_out]:
 print title
main()

Stack Exchange Network

Optimizing/Cleaning up Python Script for Zipf's Law

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Optimizing/Cleaning up Python Script for Zipf's Law

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions