Hi, here is my problem. I have a program that calculates the averages of data in columns. Example:

Bob
1
2
3

the output is

Bob
2

Some of the data has 'NA's. So for Joe:

Joe
NA
NA
NA

I want this output to be NA

so I wrote an if else loop

The problem is that it doesn't execute the second part of the loop and just prints out one NA. Any suggestions?

Here is my program:

with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)
    numRowsPerColumn = [0] * len(columns) # this figures out the number of columns
    for line in f:
        # Skip empty lines since I was getting that error before
        if not line.strip():
            continue
        values = line.split(" ")
        for i in xrange(len(values)):
            try: # this is the whole strings to math numbers things
                sums[i] += float(values[i])
                numRowsPerColumn[i] += 1
            except ValueError:
                continue
with open('c://chipdone.txt', 'w') as ouf:
    for i in xrange(len(columns)):
        if numRowsPerColumn[i] ==0 :
            print 'NA'
        else:
            print>>ouf, columns[i], sums[i] / numRowsPerColumn[i] # this is the average calculator

The file looks like so:

Joe Bob Sam
1 2 NA
2 4 NA
3 NA NA
1 1 NA

and final output is the names and the averages

Joe Bob Sam 
1.5 1.5 NA

Ok I tried Roger's suggestion and now I have this error:

Traceback (most recent call last):
  File "C:/avy14.py", line 5, in <module>
    for line in f:
ValueError: I/O operation on closed file

Here is this new code:

with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    sums = [0] * len(columns)
    rows = 0
for line in f:
    line = line.strip()
    if not line:
        continue

    rows += 1
    for col, v in enumerate(line.split()):
        if sums[col] is not None:
            if v == "NA":
                sums[col] = None
            else:
                sums[col] += int(v)

with open("c:/chipdone.txt", "w") as out:
    for name, sum in zip(columns, sums):
        print>>out, name,
        if sum is None:
            print>>out, "NA"
        else:
            print>>out, sum / rows

asked Sep 24, 2010 at 14:49
  • Use "C:\\file" or "c:/file", with the latter usually preferred; "//" will be interpreted incorrectly in many cases (just not in this exact one). Commented Sep 24, 2010 at 14:59
  • Could you paste an example of what the source file looks like, and a sample of what the complete output should look like? Commented Sep 24, 2010 at 15:00
  • ...and also, could you include the code of the "second part of the loop"? The code provided only contains two alternative instructions (if/else)... Commented Sep 24, 2010 at 15:03

4 Answers

with open("c:/achip.txt", "rU") as f:
    columns = f.readline().strip().split()
    sums = [0.0] * len(columns)
    row_counts = [0] * len(columns)
    for line in f:
        line = line.strip()
        if not line:
            continue
        for col, v in enumerate(line.split()):
            if v != "NA":
                sums[col] += int(v)
                row_counts[col] += 1
with open("c:/chipdone.txt", "w") as out:
    for name, sum, rows in zip(columns, sums, row_counts):
        print >>out, name,
        if rows == 0:
            print >>out, "NA"
        else:
            print >>out, sum / rows

I'd also use the no-parameter version of split when getting the column names (it allows you to have multiple space separators).
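
To illustrate the difference, here is a small sketch in Python 3 syntax. The sample line is made up for the demo (two spaces after the first value); it compares split(" ") with the no-argument split():

```python
# Hypothetical input line with a doubled space and a trailing newline.
line = "1  1 NA\n"

# Splitting on a single space keeps an empty string and the newline.
with_space = line.split(" ")
print(with_space)   # ['1', '', '1', 'NA\n']

# The no-argument form collapses runs of whitespace and drops the newline.
no_args = line.split()
print(no_args)      # ['1', '1', 'NA']
```

With the no-argument form, each value stays aligned with its column index even when the separators are uneven.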

Regarding your edit to include input/output sample, I kept your original format and my output would be:

Joe 1.75
Bob 2.33333333333
Sam NA

This format is 3 rows of (ColumnName, Avg) columns, but you can change the output if you want, of course. :)
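
If you want the two-row layout from your question instead, the same counting approach can be rearranged. This is only a sketch in Python 3 syntax, not the code above: column_averages and format_wide are hypothetical helper names, and the sample data is held in a list rather than read from your file.

```python
def column_averages(lines):
    """Return (names, averages); an average is None when a column has no numbers."""
    rows = [line.split() for line in lines if line.strip()]
    names, data = rows[0], rows[1:]
    sums = [0.0] * len(names)
    counts = [0] * len(names)
    for row in data:
        for col, value in enumerate(row):
            if value != "NA":          # only numeric cells contribute
                sums[col] += float(value)
                counts[col] += 1
    averages = [s / c if c else None for s, c in zip(sums, counts)]
    return names, averages


def format_wide(names, averages):
    """One row of names, one row of averages, with NA for empty columns."""
    cells = ["NA" if a is None else str(a) for a in averages]
    return " ".join(names) + "\n" + " ".join(cells)


# Sample data from the question (assumed layout: header row, then values).
sample = ["Joe Bob Sam", "1 2 NA", "2 4 NA", "3 NA NA", "1 1 NA"]
names, averages = column_averages(sample)
report = format_wide(names, averages)
print(report)
```

Keeping a separate count per column is what lets a partly filled column such as Bob still get an average, while a column with no numbers at all falls back to NA.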

answered Sep 24, 2010 at 15:06

4 Comments

@Robert: The code you included in your edit is misindented with the for loop outside of the with, closing the file before the for loop runs. Updated my code to show what I mean.
@Robert: I also see that the code I wrote (before you included the example) is wrong, as I misinterpreted you. Fixed.
Still not working, Roger. Now when I have a column like Joe 2 NA 1, the final value should be 1.5, but it outputs NA.
@Robert: Using 0.0 instead of 0 for sums (so floating point is used) and I get Joe 1.75, Bob 2.333.., Sam NA for the input sample you gave in the question. These values match what I figure out by hand.
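
The closed-file error discussed in these comments can be reproduced in isolation. A minimal sketch in Python 3 syntax, using a throwaway temporary file (an assumption for the demo, not the asker's real path):

```python
import os
import tempfile

# Create a small scratch file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("Joe Bob Sam\n1 2 NA\n")

with open(path) as f:
    header = f.readline().split()

# The with block has ended, so f is already closed at this point.
try:
    for line in f:          # misplaced loop: runs after the file was closed
        pass
    error = None
except ValueError as exc:
    error = str(exc)

os.remove(path)
print(error)
```

Indenting the for loop so that it sits inside the with block keeps the file open while the rows are read.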

Using numpy:

import numpy as np
with open('achip.txt') as f:
    names = f.readline().split()
    arr = np.genfromtxt(f)
print(arr)
# [[ 1.  2. NaN]
#  [ 2.  4. NaN]
#  [ 3. NaN NaN]
#  [ 1.  1. NaN]]
print(names)
# ['Joe', 'Bob', 'Sam']
print(np.ma.mean(np.ma.masked_invalid(arr), axis=0))
# [1.75 2.33333333333 --]
answered Sep 24, 2010 at 15:28

Using your original code, I would add one filtering step and edit the print statement:

with open(r'C:\achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)
    numRowsPerColumn = [0] * len(columns) # this figures out the number of columns
    for line in f:
        # Skip empty lines since I was getting that error before
        if not line.strip():
            continue
        values = line.split(" ")
        ### This removes any '' elements caused by having two spaces like
        ### in the last line of your example chip file above.
        ### A new list is built, so no element is skipped while filtering.
        values = [v for v in values if v != '']
        ### (End of Addition)
        for i in xrange(len(values)):
            try: # this is the whole strings to math numbers things
                sums[i] += float(values[i])
                numRowsPerColumn[i] += 1
            except ValueError:
                continue
with open('c://chipdone.txt', 'w') as ouf:
    for i in xrange(len(columns)):
        if numRowsPerColumn[i] ==0 :
            print>>ouf, columns[i], 'NA' #Just add the extra parts
        else:
            print>>ouf, columns[i], sums[i] / numRowsPerColumn[i]

This solution also gives the same result in Roger's format, not your intended format.
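
One caveat when stripping the empty strings left by doubled spaces: removing items from a list while enumerating it can skip the neighbouring element, so a run of several spaces may leave an empty string behind. A small sketch in Python 3 syntax, with a made-up line containing three spaces, compares that with building a filtered list:

```python
# Hypothetical line with three spaces between the first two values.
values = "1   1 NA".split(" ")
print(values)       # ['1', '', '', '1', 'NA']

# Popping during enumeration: after a pop, the next item shifts into the
# current index and is never examined.
popped = list(values)
for count, v in enumerate(popped):
    if v == "":
        popped.pop(count)
print(popped)       # ['1', '', '1', 'NA']

# Building a new list examines every element exactly once.
filtered = [v for v in values if v != ""]
print(filtered)     # ['1', '1', 'NA']
```

Calling split() with no argument avoids the empty strings in the first place, which is why it is often the simpler choice here.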

answered Sep 24, 2010 at 16:22

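The solution below is cleaner and has fewer lines of code:

import pandas as pd
# read the file into a DataFrame using read_csv
# (raw string so that "\s" is passed to the regex engine unchanged)
df = pd.read_csv('C:/achip.txt', sep=r"\s+")
# compute the average of each column; NA values are skipped
avg = df.mean()
# save computed average to output file
# na_rep writes all-NA columns as NA instead of an empty field;
# header=False keeps the file to name,average rows only
avg.to_csv("c:/chipdone.txt", na_rep="NA", header=False)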

The key to the simplicity of this solution is the way the input text file is read into a DataFrame. Pandas read_csv allows you to use regular expressions for the sep/delimiter argument. In this case, we used the "\s+" regex pattern to handle one or more spaces between columns.

Once the data is in a DataFrame, computing the average and saving it to a file can both be done with straightforward pandas functions.

answered Jan 2, 2019 at 13:02
