7
\$\begingroup\$

This is my first program in Python and first time I'm writing in dynamically typed language. I have substituted real detectPeaks function with a simple placeholer. I'm aware of several problems with this program.

I still don't quite know how to 'live with' Python scoping, hence the qFixfunction. I didn't bother to find a better name since I want to rewrite whole program.

Doing anything outside of any function seems unclean to me. It seems that it is ok in Python. Am I right? # It did allow me to have global cmdArgs object witch makes logic connected to cmd arguments clean.

I'm also concerned with naming functions and variables and structure of the program. The program does not have to be very flexible I would get away with hardcoding most of the things but prefer not unless it would complicate code significantly.

__version__ = dirt3
import sys, os.path, argparse, glob
import numpy as np
def detectPeaks(in_list):
 in_list = [s.replace(',', '.') for s in in_list]
 in_list.pop() # remove "\r\n"
 X = np.array(map(float, in_list[:2]))
 Y = np.array(map(float, in_list[2:]))
 XminIndex = np.argmin(X)
 YminIndex = np.argmin(Y)
 val = X[XminIndex] + Y[YminIndex]
 return (XminIndex, YminIndex, val)
def processFileLine(line):
 No, time, A, B = line.split('\t')[0], line.split('\t')[1], line.split("\t")[2::2], line.split("\t")[3::2]
 if cmdArgs.A:
 retA = detectPeaks(A)
 ansA = No + '\t' + time + '\t' + str(retA[0]) + '\t' + str(retA[1]) + '\t' + str(retA[2]) + '\n'
 else:
 ansA = "" #to get read of Unbound local var err
 if cmdArgs.B:
 retB = detectPeaks(B)
 ansB = No + '\t' + time + '\t' + str(retB[0]) + '\t' + str(retB[1]) + '\t' + str(retB[2]) + '\n'
 else:
 ansB = ""
 return ansA + ansB
def mkOutFilePath(outDirPath, inFilePath):
 inFilename = os.path.basename(inFilePath)
 return os.path.join(outDirPath, inFilename + ".peak")
def qFix(inFilePath):
 if inFilePath == '-':
 return sys.stdin
 else:
 return open(inFilePath, "r")
def processFile(inFilePath):
 '''
 if inFilePath == '-':
 inFile = sys.stdin
 else:
 inFIle = open(inFilePath, "r")
 '''
 inFile = qFix(inFilePath)
 if cmdArgs.outFiles == '-':
 outFile = sys.stdout
 elif os.path.isfile(cmdArgs.outFiles):
 outFile = open(cmdArgs.outFiles, "a")
 else:
 outFile = open(mkOutFilePath(cmdArgs.outFiles, inFilePath), "w")
 for line in inFile.readlines():
 if line[0] == '#':
 outFile.write(line)
 elif line[0] == 'N' and line[1] == 'r' :
 outFile.write("Nr\tTime\tX\tY\tValue\n")
 else:
 outFile.write(processFileLine(line))
def main():
 if cmdArgs.inFiles == '-':
 processFile('-')
 else:
 if os.path.isfile(cmdArgs.inFiles):
 filesList = [cmdArgs.inFiles]
 else:
 if cmdArgs.recuresive:
 filesList = [y for x in os.walk(cmdArgs.inFiles) for y in glob.glob(os.path.join(x[0], '*.dat'))]
 else:
 filesList = glob.glob(os.path.join(cmdArgs.inFiles, '*.dat'))
 for filePath in filesList:
 processFile(filePath)
def checkPath(path):
 if (path != '-') and not os.path.exists(path):
 exit(2)
 return path
if __name__ == "__main__":
 cmdArgsParser = argparse.ArgumentParser()
 cmdArgsParser.add_argument('inFiles', nargs='?', default='-', type=checkPath, help="Path to input file(s).")
 cmdArgsParser.add_argument('outFiles', nargs='?', default='-', type=checkPath, help="Path to output file(s).")
 cmdArgsParser.add_argument("-d", "--delete", action="store_true", help="Delete input files after procesing.")
 cmdArgsParser.add_argument("-r", "--recuresive", action="store_true", help="Operate recuresively.")
 cmdArgsParser.add_argument("-A", action="store_true", help="Use data from A set.")
 cmdArgsParser.add_argument("-B", action="store_true", help="Use data from B set.")
 cmdArgsParser.add_argument("-V", "--version", action="version", version=__version__)
 cmdArgs = cmdArgsParser.parse_args()
 main()

Files read by the program have comments that are to be copied to the output. Example file(simplified, there is lots of data in real ones):

# Comments
Nr: Time 1A 1B 2A 2B 3A 3B 4A 4B
1 2015 0,10 0,10 0,10 0,10 0,10 0,10 0,10 0,10
2 2015 0,10 0,10 0,10 0,10 0,10 0,10 0,10 0,10
3 2015 0,10 0,10 0,10 0,10 0,10 0,10 0,10 0,10
4 2015 0,10 0,10 0,10 -0,10 0,10 0,10 0,10 0,10
5 2015 0,10 0,10 0,10 0,10 0,10 0,10 0,10 0,10
6 2015 0,10 0,10 0,10 0,15 0,10 0,10 0,10 0,10
7 2015 0,10 0,20 0,30 0,20 0,10 0,10 0,10 0,10
8 2015 -0,10 0,10 0,10 0,10 -0,10 0,10 0,10 0,10

Output for sample file:

# Commests
Nr Time X Y Value
1 2015 0 0 0.2
2 2015 0 0 0.2
3 2015 0 0 0.2
4 2015 0 0 0.2
5 2015 0 0 0.2
6 2015 0 0 0.2
7 2015 0 0 0.2
8 2015 0 0 -0.2
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Jul 15, 2015 at 11:49
\$\endgroup\$

2 Answers 2

2
\$\begingroup\$

Some suggestions:

  1. Follow the pep8 style guide.
  2. In detectPeaks, you should slice in_list initially, rather than popping. So in_list[:-1].
  3. In detectPeaks, you should convert to float in the initial list comprehension.
  4. In detectPeaks, you should convert to a single numpy array, then slice that to get X and Y.
  5. In detectPeaks, you don't need to wrap the return in ( )
  6. In processFileLine, don't put multiple operations on a single line like that.
  7. In processFileLine, you should do the splitting once, then get the items from that.
  8. In processFileLine, you should use string replacement. So, for example, ansA = '{}\t{}\t{}\t{}\t{}\n'.format(No, time, *retA). Or better yet you can define a string ansform = '{}\t{}\t{}\t{}\t{}\n' at the beginning, then apply the format to it in each case, so ansA = 'ansform.format(No, time, *retA) and ansB = 'ansform.format(No, time, *retB).
  9. Use with to automatically open and close files when you are done with them.
  10. When looping over a file object, you don't need readlines. Just do, for example, for line in inFile. That will automatically loop over all the lines in the file.
  11. In processFile, you should test for a slice of a string. So elif line[:2] == 'Nr':
  12. In the loop of processFile, it would be better to define a string in the if...elif...else block, and then after the block write that string.
  13. Never, ever, under any circumstances call exit. If you need to exit, either allow the program to exit normally or throw an exception. In your case, raise an exception.
  14. You should put the argument parsing in a function.
  15. In main, you should return at the end of the if cmdArgs.inFiles == '-': block, which will allow you to avoid the else case. Or better yet, do if cmdArgs.inFiles == '-' or os.path.isfile(cmdArgs.inFiles):, since filesList = [cmdArgs.inFiles] will work in both cases.
  16. In main, you should do elif cmdArgs.recuresive (which is miss-spelled, by the way).
  17. In main, the list comprehension should be a generator expression. This will allow you to avoid having to store all the filenames.
answered Jul 17, 2015 at 7:53
\$\endgroup\$
3
\$\begingroup\$

Spelling counts

The usability of your programme suffers if a flag is named recuresive, delete the second e.

Repeat again

You repeated a function in the docs of another function, why?

Main should take args

Main should take the command line flags as args and those should appear only in main, as it performs IO and other functions logic.

Automatically closing files

Use with to automatically close files. Not the manual open. open may leave the file open if an exception occurs.

answered Jul 15, 2015 at 12:31
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.