Help with an if else loop in python

Question 1

Hi, here is my problem. I have a program that calculates the averages of data in columns. For example:

Bob
1
2
3

the output is

Bob
2

Some of the data is 'NA'. So for Joe:

Joe
NA
NA
NA

I want this output to be NA,

so I wrote an if else loop.

The problem is that it doesn't execute the second part of the loop and just prints out one NA. Any suggestions?

Here is my program:

with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)
    numRowsPerColumn = [0] * len(columns) # this figures out the number of columns
    for line in f:
        # Skip empty lines since I was getting that error before
        if not line.strip():
            continue
        values = line.split(" ")
        for i in xrange(len(values)):
            try: # this is the whole strings to math numbers things
                sums[i] += float(values[i])
                numRowsPerColumn[i] += 1
            except ValueError:
                continue
with open('c://chipdone.txt', 'w') as ouf:
    for i in xrange(len(columns)):
        if numRowsPerColumn[i] == 0:
            print 'NA'
        else:
            print>>ouf, columns[i], sums[i] / numRowsPerColumn[i] # this is the average calculator

The file looks like so:

Joe Bob Sam
1 2 NA
2 4 NA
3 NA NA
1 1 NA

and the final output should be the names and the averages:

Joe Bob Sam 
1.5 1.5 NA

OK, I tried Roger's suggestion and now I get this error:

Traceback (most recent call last):
  File "C:/avy14.py", line 5, in <module>
    for line in f:
ValueError: I/O operation on closed file

Here is the new code:

with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    sums = [0] * len(columns)
    rows = 0
for line in f:
    line = line.strip()
    if not line:
        continue

    rows += 1
    for col, v in enumerate(line.split()):
        if sums[col] is not None:
            if v == "NA":
                sums[col] = None
            else:
                sums[col] += int(v)

with open("c:/chipdone.txt", "w") as out:
    for name, sum in zip(columns, sums):
        print>>out, name,
        if sum is None:
            print>>out, "NA"
        else:
            print>>out, sum / rows

Question 2

Use "C:\\file" or "c:/file", with the latter usually preferred; using "//" will be interpreted incorrectly in many cases (just not in this exact one).
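To illustrate why the doubled slash happens to work here, the sketch below normalises the three spellings with the standard-library `ntpath` module (the Windows flavour of `os.path`, importable on any platform). The file name is the one from the question; the variable names are only for illustration, and this shows `ntpath`'s normalisation rather than how every Windows API treats "//".

```python
import ntpath

# Three spellings of the same Windows path, written as Python string literals.
escaped_backslash = "C:\\achip.txt"   # backslash escaped inside a normal string
forward_slash = "C:/achip.txt"        # forward slash form
double_slash = "C://achip.txt"        # the spelling used in the question

# normpath converts "/" to "\" and collapses repeated separators.
normalized = [
    ntpath.normpath(p)
    for p in (escaped_backslash, forward_slash, double_slash)
]
print(normalized)
```

All three collapse to the same normalised path in this case, which is consistent with the question's code opening the file despite the "//".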

Question 3

Could you paste an example of what the source file looks like, and a sample of what the complete output should look like?

Question 4

...and also, could you include the code of the "second part of the loop"? The code provided only contains two alternative instructions (if/else)...

Question 5

with open("c:/achip.txt", "rU") as f:
    columns = f.readline().strip().split()
    sums = [0.0] * len(columns)
    row_counts = [0] * len(columns)
    for line in f:
        line = line.strip()
        if not line:
            continue
        for col, v in enumerate(line.split()):
            if v != "NA":
                sums[col] += int(v)
                row_counts[col] += 1
with open("c:/chipdone.txt", "w") as out:
    for name, sum, rows in zip(columns, sums, row_counts):
        print >>out, name,
        if rows == 0:
            print >>out, "NA"
        else:
            print >>out, sum / rows

I'd also use the no-parameter version of split when getting the column names (it allows you to have multiple space separators).
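A small sketch of the difference, using a made-up row with a doubled space (the row itself is an assumption, chosen to mimic stray spacing in the input file):

```python
row = "1  1 NA"  # note the two spaces after the first value

# split(" ") cuts at every single space, so a doubled space yields an empty string.
with_argument = row.split(" ")

# split() with no argument cuts on runs of whitespace and never yields empty strings.
without_argument = row.split()

print(with_argument)
print(without_argument)
```

With the empty string in the list, every later value lands one column to the right of where it belongs, which is why the no-parameter form is safer here.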

Regarding your edit to include input/output sample, I kept your original format and my output would be:

Joe 1.75
Bob 2.33333333333
Sam NA

This format is 3 rows of (ColumnName, Avg) columns, but you can change the output if you want, of course. :)
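If the two-row layout from the question is wanted instead, something along these lines might work. It is a Python 3 sketch rather than the code above: `column_averages` and `format_as_two_rows` are hypothetical helper names, the input is passed as a list of lines instead of being read from a file, and `float()` is used instead of `int()` on the assumption that decimal values may appear.

```python
def column_averages(lines):
    """Return (names, averages); an average is None when a column has no numbers."""
    rows = [line.split() for line in lines if line.strip()]
    names, data = rows[0], rows[1:]
    sums = [0.0] * len(names)
    counts = [0] * len(names)
    for row in data:
        for col, value in enumerate(row):
            if value != "NA":
                sums[col] += float(value)
                counts[col] += 1
    averages = [s / c if c else None for s, c in zip(sums, counts)]
    return names, averages


def format_as_two_rows(names, averages):
    """Put the names on the first line and the averages (or NA) on the second."""
    cells = ["NA" if avg is None else str(avg) for avg in averages]
    return " ".join(names) + "\n" + " ".join(cells)


# The sample input from the question.
sample = ["Joe Bob Sam", "1 2 NA", "2 4 NA", "3 NA NA", "1 1 NA"]
names, averages = column_averages(sample)
report = format_as_two_rows(names, averages)
print(report)
```

Keeping the averaging separate from the formatting means the layout can be changed without touching the arithmetic.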

Question 6

@Robert: The code you included in your edit is misindented: the for loop is outside of the with block, so the file is closed before the for loop runs. Updated my code to show what I mean.
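The effect can be reproduced in a few lines. This Python 3 sketch uses `io.StringIO` as a stand-in for the real file (an assumption made so that it runs without `achip.txt`); a file object returned by `open` is closed in the same way when its with block ends.

```python
import io

source = io.StringIO("Joe Bob Sam\n1 2 NA\n")  # stand-in for the input file

with source as f:
    header = f.readline().split()   # fine: the file is still open here

# The with block has ended, so f is closed; iterating over it now fails.
try:
    for line in f:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(header, error_message)
```

Indenting the for loop so that it sits inside the with block keeps the file open while the rows are read.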

Question 7

@Robert: I also see that the code I wrote (before you included the example) is wrong, as I misinterpreted you. Fixed.

Question 8

Still not working, Roger. Now when I have a column like Joe 2 NA 1, the final value should be 1.5, but it is output as NA.

Question 9

@Robert: I now use 0.0 instead of 0 for sums (so floating point is used), and I get Joe 1.75, Bob 2.333.., Sam NA for the input sample you gave in the question. These values match what I figure out by hand.
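The reason 0.0 matters is that in Python 2 the `/` operator truncates when both operands are integers. The sketch below is written for Python 3, where `/` always gives a float, so it uses `//` to imitate the Python 2 integer behaviour; the variable names are illustrative only.

```python
values = [1, 2, 3, 1]          # Joe's column from the sample input

int_total = 0                  # integer accumulator, as with sums = [0]
float_total = 0.0              # float accumulator, as with sums = [0.0]
for v in values:
    int_total += v
    float_total += v

# // reproduces what Python 2's / does with two integers: the fraction is dropped.
truncated = int_total // len(values)
exact = float_total / len(values)
print(truncated, exact)
```

With an integer accumulator the average of Joe's column comes out as 1; with a float accumulator it is 1.75.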

Question 10

Using numpy:

import numpy as np
with open('achip.txt') as f:
    names = f.readline().split()
    arr = np.genfromtxt(f)
print(arr)
# [[ 1.  2. NaN]
#  [ 2.  4. NaN]
#  [ 3. NaN NaN]
#  [ 1.  1. NaN]]
print(names)
# ['Joe', 'Bob', 'Sam']
print(np.ma.mean(np.ma.masked_invalid(arr), axis=0))
# [1.75 2.33333333333 --]

Question 11

Using your original code, I would add one loop and edit the print statement:

with open(r'C:\achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)
    numRowsPerColumn = [0] * len(columns) # this figures out the number of columns
    for line in f:
        # Skip empty lines since I was getting that error before
        if not line.strip():
            continue
        values = line.split(" ")
        ### This removes any '' elements caused by having two spaces like
        ### in the last line of your example chip file above.
        ### A new list is built because popping from a list while
        ### enumerating it skips the element after each pop.
        cleaned = []
        for v in values:
            if v.strip():
                cleaned.append(v)
        values = cleaned
        ### (End of Addition)
        for i in xrange(len(values)):
            try: # this is the whole strings to math numbers things
                sums[i] += float(values[i])
                numRowsPerColumn[i] += 1
            except ValueError:
                continue
with open('c://chipdone.txt', 'w') as ouf:
    for i in xrange(len(columns)):
        if numRowsPerColumn[i] == 0:
            print>>ouf, columns[i], 'NA' # Just add the extra parts
        else:
            print>>ouf, columns[i], sums[i] / numRowsPerColumn[i]

This solution also gives the same result in Roger's format, not your intended format.

Question 12

The solution below is cleaner and has fewer lines of code:

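import pandas as pd
# read the file into a DataFrame using read_csv;
# the raw string keeps the backslash in the regex intact
df = pd.read_csv("c:/achip.txt", sep=r"\s+")
# compute the average of each column (NA values are skipped)
avg = df.mean()
# save computed average to output file, writing NA for all-NA columns
avg.to_csv("c:/chipdone.txt", na_rep="NA")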

The key to the simplicity of this solution is the way the input text file is read into a DataFrame. Pandas read_csv allows you to use regular expressions for specifying the sep/delimiter argument. In this case, we used the "\s+" regex pattern to take care of having one or more spaces between columns.

Once the data is in a DataFrame, computing the average and saving to a file can all be done with straightforward pandas functions.
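To show what the "\s+" pattern does on its own, here is a sketch using only the standard-library `re` module. It is not how pandas parses the file internally; the sample line is made up and only demonstrates the pattern.

```python
import re

line = "1   1\tNA\n"   # mixed runs of spaces and a tab between fields

# \s+ matches one or more whitespace characters, so any run acts as one separator.
fields = re.split(r"\s+", line.strip())
print(fields)
```

The line is stripped first so that the trailing newline does not produce an empty field at the end.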

Accepted answer: Roger Pate, 2010-09-24 15:06:57Z (the code and follow-up comments under Questions 5 through 9 above)
