1
\$\begingroup\$

I'm a student working as a research assistant, and I wrote this script to automate density functional theory HPC tasks with SLURM. When a calculation completes, the script checks the log file for the total force. If the force is above the desired threshold, it generates a new input file, file.in.new, with the relaxed atomic positions from the log; that file is then passed to another script in the automation process. Please point out any issues with formatting, syntax, etc., and let me know if there's anything I could do to simplify things.

Example of use: python generate.py file.log file.in

from sys import argv
def arg_parse(argv):
    try:
        log_file=argv[1]
        input_file=argv[2]
        override=False
    except IndexError as e:
        raise SystemExit
    if len(argv)==4 and argv[3].strip("-")=='o':
        override=True
    scan_and_write(log_file, input_file, override)
#---------------------------------------------------------------------
def scan_and_write(log_file, input_file, override):
    with open(log_file, 'r+') as log:
        total_force=[float(line.split()[3]) for line in log if line.rfind('Total force =') != -1][-1]
        tolerance(total_force, override, input_file)
        log.seek(0)
        total_cycles=sum([1 for line in log if line.rfind('ATOMIC_POSITIONS (crystal)') != -1])
        log.seek(0)
        index=[int(line.split("=")[1]) for line in log if ("number of atoms/cell" in line)][0]
        log.seek(0)
        for line in log:
            if line.rfind('ATOMIC_POSITIONS (crystal)') != -1:
                atomic_positions=[log.readline().split() for i in range(index)]
    new_input=open(input_file.replace('.in', '.in.new'), "w+")
    fmt = '{:2} {:12.9f} {:12.9f} {:12.9f}\n'
    with open(input_file, 'r+') as old_input:
        for line in old_input:
            if len(line.split()) != 4 and not line[0].isnumeric():
                new_input.write(line)
            if ('ATOMIC_POSITIONS') in line:
                for position in atomic_positions:
                    new_input.write(fmt.format(position[0],*[float(xred) for xred in position[1:4]]))
#---------------------------------------------------------------------
def tolerance(force, override, file_name):
    print('A total force of {} was achieved in the last SCF cycle'.format(force))
    if (force < 0.001 and not override):
        print("Relaxation sufficient, total force = %s...terminating" %force)
        raise SystemExit
    if (force < 0.001 and override):
        print("Relaxation sufficient...total force = %s\n\nOverriding threshold"\
              " limit and generating %s" %(force, file_name.replace('.in', '.in.new')))
#---------------------------------------------------------------------
if __name__ == "__main__":
    arg_parse(argv)
asked Sep 23, 2020 at 17:52
\$\endgroup\$
3
  • 1
    \$\begingroup\$ It would be very helpful to provide example input and expected output. For example, the code reads through log_file four times. It looks like log_file can contain multiple lines with "ATOMIC_POSITIONS". If so, each occurrence overwrites the previous atomic_positions, which seems incorrect. \$\endgroup\$ Commented Sep 23, 2020 at 20:43
  • \$\begingroup\$ I'll be sure and edit the post for clarity. The log files contain 100 iterations of SCF calculations, each producing a set of new atomic positions, but I only want the last set of coordinates. In some cases it may only run 75, 50, or 20 cycles depending on how computationally intensive it is, but I only want the most recent set. \$\endgroup\$ Commented Sep 23, 2020 at 20:59
  • 1
    \$\begingroup\$ To reinforce a point already made, your parsing strategy (re-read the file for each piece of information you need) is unusual. I suspect there are much easier ways to do it, but it's not easy to give suggestions based on what we know. If you want better feedback, provide an example to show the file's data format. \$\endgroup\$ Commented Sep 24, 2020 at 5:30
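
To illustrate the single-pass idea raised in the comments: the sketch below reads the log once and keeps only the most recent total force and position block. It assumes the log contains the three marker strings the question's code searches for, and that the atom count appears before the first position block. The function name `parse_log` and the sample log text are made up for illustration, since no real log was posted.

```python
from itertools import islice


def parse_log(lines):
    """Collect the atom count, last total force, and last position block in one pass."""
    n_atoms = None
    total_force = None
    positions = []
    lines = iter(lines)  # a single shared iterator, so islice below consumes from it
    for line in lines:
        if "number of atoms/cell" in line:
            n_atoms = int(line.split("=")[1])
        elif "Total force =" in line:
            # The value is the fourth whitespace-separated token.
            total_force = float(line.split()[3])
        elif "ATOMIC_POSITIONS (crystal)" in line:
            if n_atoms is None:
                raise ValueError("position block found before the atom count")
            # Take the next n_atoms lines; a later block replaces an earlier one.
            positions = [row.split() for row in islice(lines, n_atoms)]
    return n_atoms, total_force, positions


# Hypothetical log excerpt with two relaxation steps (not real program output).
sample = """\
     number of atoms/cell      =            2
     Total force =     0.020000     Total SCF correction =     0.000100
ATOMIC_POSITIONS (crystal)
Si   0.000000000   0.000000000   0.000000000
Si   0.250000000   0.250000000   0.250000000
     Total force =     0.000500     Total SCF correction =     0.000010
ATOMIC_POSITIONS (crystal)
Si   0.001000000   0.000000000   0.000000000
Si   0.251000000   0.250000000   0.250000000
"""

n_atoms, total_force, positions = parse_log(sample.splitlines())
print(n_atoms, total_force, positions)
```

An open file object can be passed in place of `sample.splitlines()`, which removes the need for the repeated `log.seek(0)` calls.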

1 Answer

2
\$\begingroup\$

One tip: use the argparse module.

import argparse
 
parser = argparse.ArgumentParser()
parser.add_argument("--log_file", dest="log_file", type=str, required=True, help="Add some help text here")
parser.add_argument("--input_file", dest="input_file", type=str, required=True, help="Add some help text here")
args = parser.parse_args()
# show the values, will reach this point only if the two parameters were provided
print(f"log_file: {args.log_file}")
print(f"input_file: {args.input_file}")

Then you call your script like this:

python3 generate.py --log_file test.log --input_file test.txt

Parameter order is free.
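
If you would rather keep the calling convention from the question (two positional file names plus an optional override switch), argparse can do that as well. This is only a sketch: the helper name `build_parser`, the long option `--override`, and the help strings are my own choices, not something taken from your script.

```python
import argparse


def build_parser():
    """Build a parser that mirrors 'python generate.py file.log file.in [-o]'."""
    parser = argparse.ArgumentParser(
        description="Generate a new input file from the last relaxation step in a log."
    )
    parser.add_argument("log_file", help="log file to scan")
    parser.add_argument("input_file", help="input file used as a template")
    parser.add_argument(
        "-o", "--override",
        action="store_true",  # False unless the flag is given
        help="write the new input even if the force is below the threshold",
    )
    return parser


# Passing a list here stands in for the real command line (sys.argv[1:]).
args = build_parser().parse_args(["file.log", "file.in", "-o"])
print(args.log_file, args.input_file, args.override)
```

In the real script you would call `parse_args()` with no arguments so that it reads the command line; a missing file name then produces a usage message instead of a silent exit.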

I have to admit I don't understand much about the purpose since I don't know about your input files. If they are CSV then you might consider using the csv module. Then you should be able to simplify some statements like this one:

total_force=[float(line.split()[3]) for line in log if line.rfind('Total force =') != -1][-1]
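
Even for plain text, a statement like that can be made easier to read by moving it into a small function and testing with `in` rather than `rfind(...) != -1`. A possible version is sketched below; the name `last_total_force` and the sample lines are hypothetical. Unlike the one-liner, it returns None when no matching line exists instead of raising an IndexError.

```python
def last_total_force(lines):
    """Return the last 'Total force' value found in lines, or None if there is none."""
    force = None
    for line in lines:
        if "Total force =" in line:
            # The value is the fourth whitespace-separated token on the line.
            force = float(line.split()[3])
    return force


# Made-up lines, for illustration only.
example_lines = [
    "     Total force =     0.020000     Total SCF correction =     0.000100",
    "some unrelated line",
    "     Total force =     0.000500     Total SCF correction =     0.000010",
]
print(last_total_force(example_lines))
```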

Something is lacking in your script, badly: comments. They will help you too, especially when you go back to review code you wrote a few months ago. By then you will probably have forgotten the details and will have to reanalyze your own code.

answered Sep 23, 2020 at 19:22
\$\endgroup\$
1
  • \$\begingroup\$ The argparse tip is super helpful, thank you! The logs and inputs are all just formatted plain text, we're using programs like Quantum Espresso and Abinit. \$\endgroup\$ Commented Sep 23, 2020 at 20:01
