edited Mar 21 at 10:52

Calculating average scores from Excel file

I am new to Python, coming from R, and the two are quite different. I wrote some Python code to process an Excel file. It runs correctly, but I think it reads like R code. I would like someone to show me how to write it in a more Pythonic way.

The data structure looks like:

 Name Details
1 AAA first(100%-8)
2 BBB first(50%-8),second(50%-8)
3 CCC sixth(30%-8),seventh(60%-7),first(10%-7.75)
4 DDD third(100%-6)
5 EEE fifth(70%-7.5),second(30%-7.5)
6 FFF first(70%-8),ninth(30%-6.75)
... ... ..........
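To make the format concrete, here is a minimal sketch of how one Details cell could be split into (rater, weight, score) tuples. The names `ENTRY` and `parse_details` are hypothetical and are not part of my script, and the pattern assumes every entry looks exactly like name(percent%-score).

```python
import re

# Hypothetical pattern for one entry such as "seventh(60%-7)":
# a rater name, an integer percentage, then a score with an optional decimal part.
# In Python 3, \w also matches Chinese characters, so CHN and ENG names both work.
ENTRY = re.compile(r"(\w+)\((\d+)%-(\d+(?:\.\d+)?)\)")


def parse_details(details):
    """Return a list of (rater, weight, score) tuples for one Details cell."""
    return [
        (rater, int(percent) / 100, float(score))
        for rater, percent, score in ENTRY.findall(details)
    ]


parsed = parse_details("sixth(30%-8),seventh(60%-7),first(10%-7.75)")
print(parsed)
# [('sixth', 0.3, 8.0), ('seventh', 0.6, 7.0), ('first', 0.1, 7.75)]
```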

As you can see, Mr. first gave Mr. AAA 8 points with a weight of 100%. In the 2nd, 3rd and 6th rows, Mr. first also scored other people, with different scores. So the average of the scores that Mr. first gave is (8 + 8 + 7.75 + 8)/4 = 7.94, which is the average score of his group.
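The group average can be sketched as follows. This uses only the six sample rows shown above, typed in by hand as illustrative data (not my real file), and the names `rows`, `group_scores` and `group_average` are made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Illustrative sample data: the six rows from the table, as (rater, weight, score).
rows = {
    "AAA": [("first", 1.0, 8.0)],
    "BBB": [("first", 0.5, 8.0), ("second", 0.5, 8.0)],
    "CCC": [("sixth", 0.3, 8.0), ("seventh", 0.6, 7.0), ("first", 0.1, 7.75)],
    "DDD": [("third", 1.0, 6.0)],
    "EEE": [("fifth", 0.7, 7.5), ("second", 0.3, 7.5)],
    "FFF": [("first", 0.7, 8.0), ("ninth", 0.3, 6.75)],
}

# Collect every score each rater gave, ignoring the weights.
group_scores = defaultdict(list)
for entries in rows.values():
    for rater, _weight, score in entries:
        group_scores[rater].append(score)

# Average score per rater, i.e. the "group average".
group_average = {rater: mean(scores) for rater, scores in group_scores.items()}

print(group_average["first"])   # 7.9375, i.e. 7.94 after rounding
print(group_average["second"])  # 7.75
```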

What I am looking for is this: Mr. AAA's final score is not 8 * 100%, but 8 * (7.5/7.94) * 100%, where 7.5 is a constant and 7.94 is the average score of Mr. first's group. Similarly, Mr. BBB's final score is 8 * (7.5/7.94) * 50% + 8 * (7.5/7.75) * 50%, where 7.75 is the average score of Mr. second's group. I hope that makes sense.

So the question itself is pretty simple.
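As a worked check of the formula, here is a small sketch with the two group averages from the example hard-coded. `BASELINE`, `final_score` and the sample dictionaries are hypothetical names used only for illustration.

```python
BASELINE = 7.5  # the constant from the formula

# Group averages taken from the example (7.9375 is the unrounded 7.94).
group_average = {"first": 7.9375, "second": 7.75}


def final_score(entries, group_average, baseline=BASELINE):
    """Sum weight * score * baseline / (rater's group average) over all entries."""
    return sum(
        weight * score * baseline / group_average[rater]
        for rater, weight, score in entries
    )


aaa = final_score([("first", 1.0, 8.0)], group_average)
bbb = final_score([("first", 0.5, 8.0), ("second", 0.5, 8.0)], group_average)

print(round(aaa, 2))  # 7.56
print(round(bbb, 2))  # 7.65
```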

My code:

# -*- coding: utf-8 -*-
import xlrd
import re

data = xlrd.open_workbook(filename)  # read the data (filename is the path to the Excel file)
table = data.sheets()[0]  # read the sheet
nrows = table.nrows  # get the number of total rows
regr = r'[\u4e00-\u9fa5a-zA-Z]+'  # regular expression for CHN and ENG names
regr1 = r'[0-9]+'  # for scores and percentage
score = {}  # The dict: {AAA:{first:[1.0,8.0]},......}
group = {}  # The dict: {first:[8,8,7.75,8],......}
for i in range(2, nrows):
    target = table.cell(i, 10).value  # the details data [first(100%-8)]
    person = table.cell(i, 2).value  # the Name data [AAA]
    c = target.split(',')  # If in details data there is more than
                           # one person, then split them
    score[person] = {}  # set an empty dict
    for j in c:
        d = re.findall(regr, j)  # get the name
        d = "".join(d)  # convert the list to a string
        value = re.findall(regr1, j)  # get the score and percentage
        value1 = int(value[0]) / 100  # get the percentage
        value2 = '.'.join([x for x in value[1:]])  # get the score
        value2 = float(value2)  # change to float
        group.setdefault(d, []).append(value2)
        score[person].setdefault(d, [value1, value2])
# This part is for calculating the group average
for key in group:
    total = 0
    length = len(group[key])
    group[key] = [x for x in group[key]]
    for x in group[key]:
        total = total + x
    group[key] = total / length
output = {}  # set an empty dict to store output: {AAA:7.56,......}
# This part is for calculating the final score
for key in score:
    average = 0
    for subkey in score[key]:
        # parentheses let the expression continue onto the next line
        average = (score[key][subkey][0] * score[key][subkey][1]
                   * 7.5 / group[subkey] + average)
    output[key] = average

print(output)  # print the output

One last thing: for confidentiality reasons I cannot provide the raw data, but I can create sample data if you need it. I am here because I feel that my code is tedious, and I hope someone can help me write it more elegantly, for future work.

python python-3.x

asked Nov 29, 2016 at 8:56

helloswift123
