Quite similar to my earlier review this script parses a frequency calculation and reformats the found values to either fit into one row for easier importing into a spreadsheet software or a formatted table.
The whole project including sample calculation files for testing can be found on github. I checked the script with http://www.shellcheck.net/# and there were no warnings left. I am looking for some input, whether I could simplify some of the code or make it more efficient.
#!/bin/bash
# Script can be used to parse one (or more) frequency calculation(s)
# of the quantum chemical software suite Gaussian09
# Intitialise scriptoptions
errCount=0
printlevel=0
isOptimisation=0
isFreqCalc=0
ignorePrintlevelSwitch=0
# Initialise all variables that are of interest
declare filename functional temperature pressure
declare electronicEnergy zeroPointEnergy thermalCorrEnergy thermalCorrEnthalpy thermalCorrGibbs
declare -a contributionNames thermalE heatCapacity entropy
declare routeSection
# Errors, Info and Warnings
fatal ()
{
echo "ERROR : " "$@"
exit 1
}
message ()
{
echo "INFO : " "$@"
}
warn ()
{
echo "WARNING: " "$@"
}
# Parse the commands that have been passed to Gaussian09
getRouteSec ()
{
# The route section is echoed in the log file, but it might spread over various lines
# options might be cut off in the middle. It always starts with # folowed by a space
# or the various verbosity levels NPT (case insensitive). The route section is
# terminated by a line of dahes. The script will stop reading the file if encountered.
local line appendline
local addline=0
while read -r line; do
if [[ $line =~ ^[[:space:]]*#[nNpPtT]?[[:space:]] || "$addline" == "1" ]]; then
[[ $line =~ ^[[:space:]]*[-]+[[:space:]]*$ ]] && break
appendline="$appendline$line"
addline=1
fi
done < "$filename"
routeSection="$appendline"
}
# Test the route section if it is an optimisation or frequency calculation (or both)
testRouteSec ()
{
local testRouteSection="1ドル"
if [[ $testRouteSection =~ ([oO][pP][tT][^[:space:]]*)([[:space:]]|$) ]]; then
warn "This appears to be an optimisation."
warn "Found '${BASH_REMATCH[1]}' in the route section."
warn "The script is not intended for creating a summary of an optimisation."
isOptimisation=1
fi
if [[ $testRouteSection =~ ([Ff][Rr][Ee][Qq][^[:space:]]*)([[:space:]]|$) ]]; then
message "Found '${BASH_REMATCH[1]}' in the route section."
isFreqCalc=1
fi
}
# Print the route section in human readable form
printRouteSec ()
{
message "Found route section:"
fold -w80 -c -s < <(echo "$routeSection")
echo "----"
}
getElecEnergy ()
{
# The last value is the only of concern. Since the script is intended for single
# point calculations, it is expected that the energy is printed early in the file.
# It will not fail if it is an optimisation (warning printed earlier).
local -r readWholeLine=$(grep -e 'SCF Done' "$filename" | tail -n 1)
# Gaussian output has following format, trap important information:
# Method, electronic Energy
# Example taken from BP86/cc-pVTZ for water (H2O):
# SCF Done: E(RB-P86) = -76.4006006969 A.U. after 10 cycles
local pattern="E\((.+)\) =[[:space:]]+([-]?[0-9]+\.[0-9]+)[[:space:]]+A\.U\..+ cycles"
if [[ $readWholeLine =~ $pattern ]]; then
functional="${BASH_REMATCH[1]}"
electronicEnergy="${BASH_REMATCH[2]}"
else
return 1
fi
}
# Get the desired information from the thermochemistry block
# parse the lines according to keywords and trap the values.
findTempPress ()
{
local readWholeLine="1ドル"
local pattern
pattern="^Temperature[[:space:]]+ ([0-9]+\.[0-9]+)[[:space:]]+Kelvin\.[[:space:]]+Pressure[[:space:]]+ ([0-9]+\.[0-9]+)[[:space:]]+Atm\.$"
if [[ $readWholeLine =~ $pattern ]]; then
temperature="${BASH_REMATCH[1]}"
pressure="${BASH_REMATCH[2]}"
else
return 1
fi
}
findZeroPointEnergy ()
{
local readWholeLine="1ドル"
local pattern
pattern="Zero-point correction=[[:space:]]+([-]?[0-9]+\.[0-9]+)"
if [[ $readWholeLine =~ $pattern ]]; then
zeroPointEnergy="${BASH_REMATCH[1]}"
else
return 1
fi
}
findThermalCorrEnergy ()
{
local readWholeLine="1ドル"
local pattern
pattern="Thermal correction to Energy=[[:space:]]+([-]?[0-9]+\.[0-9]+)"
if [[ $readWholeLine =~ $pattern ]]; then
thermalCorrEnergy="${BASH_REMATCH[1]}"
else
return 1
fi
}
findThermalCorrEnthalpy ()
{
local readWholeLine="1ドル"
local pattern
pattern="Thermal correction to Enthalpy=[[:space:]]+([-]?[0-9]+\.[0-9]+)"
if [[ $readWholeLine =~ $pattern ]]; then
thermalCorrEnthalpy="${BASH_REMATCH[1]}"
else
return 1
fi
}
findThermalCorrGibbs ()
{
local readWholeLine="1ドル"
local pattern
pattern="Thermal correction to Gibbs Free Energy=[[:space:]]+([-]?[0-9]+\.[0-9]+)"
if [[ $readWholeLine =~ $pattern ]]; then
thermalCorrGibbs="${BASH_REMATCH[1]}"
else
return 1
fi
}
# In the entropy block the given table needs to be transposed.
# Heat capacity and the break up of the internal energy are usually not that important,
# but they come as a freebie.
getEntropy ()
{
local index=0
while read -r line; do
if [[ $line =~ ^[[:space:]]*E[[:space:]]{1}\(Thermal\)[[:space:]]+CV[[:space:]]+S[[:space:]]*$ ]]; then
continue
fi
if [[ "$line" =~ ^[[:space:]]*KCal/Mol[[:space:]]+Cal/Mol-Kelvin[[:space:]]+Cal/Mol-Kelvin[[:space:]]*$ ]]; then
continue
fi
numpattern="[-]?[0-9]+\.[0-9]+"
pattern="^[[:space:]]*([a-zA-Z]+)[[:space:]]+($numpattern)[[:space:]]+($numpattern)[[:space:]]+($numpattern)[[:space:]]*+$"
if [[ $line =~ $pattern ]]; then
contributionNames[$index]=${BASH_REMATCH[1]:0:3}
thermalE[$index]=${BASH_REMATCH[2]}
heatCapacity[$index]=${BASH_REMATCH[3]}
entropy[$index]=${BASH_REMATCH[4]}
(( index++ ))
fi
done < <(grep -A6 -e 'E (Thermal)[[:space:]]\+CV[[:space:]]\+S' "$filename")
}
# If requested print the transposed table of entropies, heat capacities and the break up of the int. energy.
printEntropy ()
{
local index
printf "%-15s : " "Contrib."
for (( index=0; index < ${#contributionNames[@]}; index++ )); do
printf "%-10s " "${contributionNames[$index]}"
done
printf "%-15s\n" "Unit"
printf "%-11s %4s: " "thermal en." "(U)"
for (( index=0; index < ${#thermalE[@]}; index++ )); do
printf "%+10.3f " "${thermalE[$index]}"
done
printf "%-15s\n" "kcal/mol"
printf "%-11s %4s: " "heat cap." "(Cv)"
for (( index=0; index < ${#heatCapacity[@]}; index++ )); do
printf "%+10.3f " "${heatCapacity[$index]}"
done
printf "%-15s\n" "cal/(mol K)"
printf "%-11s %4s: " "entropy" "(S)"
for (( index=0; index < ${#entropy[@]}; index++ )); do
printf "%+10.3f " "${entropy[$index]}"
done
printf "%-15s\n" "cal/(mol K)"
}
# Grep should be faster than parsing every line of the outputfile with =~
getThermochemistryLines ()
{
grep -e 'Temperature.*Pressure' -e 'Zero-point correction' \
-e 'Thermal correction to Energy' -e 'Thermal correction to Enthalpy' \
-e 'Thermal correction to Gibbs Free Energy' \
"$filename"
}
# Parse the thermochemistry output
getThermochemistry ()
{
while read -r line; do
findTempPress "$line" && continue
findZeroPointEnergy "$line" && continue
findThermalCorrEnergy "$line" && continue
findThermalCorrEnthalpy "$line" && continue
findThermalCorrGibbs "$line" && continue
done < <(getThermochemistryLines "$filename")
}
getAllEnergies ()
{
getElecEnergy || fatal "Unable to find electronic energy."
(( isFreqCalc == 1 )) && getThermochemistry
(( isFreqCalc == 1 )) && getEntropy
}
# If only one line of output is requested for easier importing
printAllEnergiesInline ()
{
local fs="1ドル"
local index
local header=("Method" "T (K)" "P (atm)" "E(SCF) (au/p)" \
"E(ZPE) (au/p)" "U(corr) (au/p)" "H(corr) (au/p)" "G(corr) (au/p)" \
"S (cal/[mol K])" "Cv (cal/[mol K])")
local values=("$functional" "$temperature" "$pressure" "$electronicEnergy" \
"$zeroPointEnergy" "$thermalCorrEnergy" "$thermalCorrEnthalpy" "$thermalCorrGibbs"\
"${entropy[0]}" "${heatCapacity[0]}")
local printHeader printValues
for (( index=0; index < ${#header[@]}; index++ )); do
if [ ${#values[$index]} -lt ${#header[$index]} ]; then
printf -v printHeader "%s%-*s%s" "$printHeader" ${#header[$index]} "${header[$index]}" "$fs"
printf -v printValues "%s%*s%s" "$printValues" ${#header[$index]} "${values[$index]}" "$fs"
else
printf -v printHeader "%s%-*s%s" "$printHeader" ${#values[$index]} "${header[$index]}" "$fs"
printf -v printValues "%s%*s%s" "$printValues" ${#values[$index]} "${values[$index]}" "$fs"
fi
done
message "File: $filename"
echo "$printHeader"
echo "$printValues"
}
# Print a table (e.g. for archiving)
printAllEnergiesTable ()
{
printf "%-25s %8s: %-20s %-20s\n" "calculation details" "" "$functional" "$filename"
printf "%-25s %8s: %20.3f %-20s\n" "temperature" "(T)" "$temperature" "K"
printf "%-25s %8s: %20.5f %-20s\n" "pressure" "(p)" "$pressure" "atm"
printf "%-25s %8s: %+20.10f %-20s\n" "electr. en." "(E)" "$electronicEnergy" "hartree"
printf "%-25s %8s: %+20.6f %-20s\n" "zero-point corr." "(ZPE)" "$zeroPointEnergy" "hartree/particle"
printf "%-25s %8s: %+20.6f %-20s\n" "thermal corr." "(U)" "$thermalCorrEnergy" "hartree/particle"
printf "%-25s %8s: %+20.6f %-20s\n" "ther. corr. enthalpy" "(H)" "$thermalCorrEnthalpy" "hartree/particle"
printf "%-25s %8s: %+20.6f %-20s\n" "ther. corr. Gibbs en." "(G)" "$thermalCorrGibbs" "hartree/particle"
printf "%-25s %8s: %+20.3f %-20s\n" "entropy (total)" "(S tot)" "${entropy[0]}" "cal/(mol K)"
printf "%-25s %8s: %+20.3f %-20s\n" "heat capacity (total)" "(Cv t)" "${heatCapacity[0]}" "cal/(mol K)"
}
printSummary ()
{
case $printlevel in
0) printAllEnergiesInline " ";;
c) printAllEnergiesInline ", ";;
1) printAllEnergiesTable ;;
2) printRouteSec
printAllEnergiesTable ;;
3) printRouteSec
printAllEnergiesTable
printf "%s\nDetails of the composition\n" "----"
printEntropy ;;
*) fatal "Unrecognised printlevel: $printlevel"
esac
# All variable should be unset in case another file is processed
unset filename functional temperature pressure
unset electronicEnergy zeroPointEnergy thermalCorrEnergy thermalCorrEnthalpy thermalCorrGibbs
unset contributionNames thermalE heatCapacity entropy
unset routeSection
}
analyseLog ()
{
getRouteSec
testRouteSec "$routeSection"
getAllEnergies
if [[ $isFreqCalc == "1" ]]; then
printSummary
unset isFreqCalc isOptimisation
else
(( isOptimisation == 1 )) && local append=" (last value)"
message "$filename"
message "Electronic energy$append: $electronicEnergy hartree"
unset isOptimisation electronicEnergy routeSection
fi
}
# Start main script
# Evaluate options
while getopts :vcV:hu options ; do
case $options in
v) [[ $ignorePrintlevelSwitch == 1 ]] || ((printlevel++)) ;;
c) printlevel="c" ;;
V) if [[ $OPTARG =~ ^[0-9]{1}$ ]]; then
printlevel="$OPTARG"
ignorePrintlevelSwitch=1
else
fatal "Invalid argument: $OPTARG"
fi ;;
# The following will be substituted by a proper help
h) message "Usage: 0ドル [options] filenames(s)"; exit 0 ;;
# The following will be substituted by a proper usage guidance
u) message "Usage: 0ドル [options] filenames(s)"; exit 0 ;;
\?) warn "Invalid option: -$OPTARG." ;;
:) fatal "Option -$OPTARG requires an argument." ;;
esac
done
shift $((OPTIND-1))
# Check if filename is specified
if [[ $# == 0 ]]; then
fatal "No output file specified. Nothing to do."
fi
# Assume all other commandline arguments are filenames
while [ ! -z "1ドル" ]; do
filename="1ドル"
if [ ! -e "$filename" ]; then
warn "Specified logfile '$filename' does not exist. Continue."
((errCount++))
else
analyseLog "$filename"
fi
shift
[[ ! -z "1ドル" && $printlevel -gt 0 ]] && echo "===="
done
# Issue an error if files have not been found.
if [ "$errCount" -gt 0 ]; then
fatal "There have been one or more errors reading the specified files."
fi
1 Answer 1
Use a for-each loop when the index is not needed
In this loop you don't really need the loop index:
for (( index=0; index < ${#contributionNames[@]}; index++ )); do printf "%-10s " "${contributionNames[$index]}" done
Since you need just the elements, a for-each loop is more natural:
for name in "${contributionNames[@]}"; do
printf "%-10s " "$name"
done
Do the same for the other counting loops too where possible.
Use here-string instead of redirecting echo
Instead of this:
fold -w80 -c -s < <(echo "$routeSection")
Better to use a here-string to avoid an additional process creation:
fold -w80 -c -s <<< "$routeSection"
Use modern (( ... ))
consistently
Since you used (( ... ))
a lot, I don't understand why you didn't use it here:
if [ ${#values[$index]} -lt ${#header[$index]} ]; then
-lt
is old-fashioned, you could use naturally <
in a (( ... ))
.
A consistent writing style improves readability.
Simple scripts evolving into complex programs
This script is getting really complex. Bash is great for simple things, but when things start to get complex, it often becomes a wrestle. I would recommend Python as a scripting language for complex tasks. It has great unit testing facilities too.
-
\$\begingroup\$ Thank you as always for your comments. Bash is currently the only thing I know well enough to get me started, but I have considered learning a bit more. Since you are recommending python, do you also have a recommendation for a starter guide or tutorial? \$\endgroup\$Martin - マーチン– Martin - マーチン2016年06月13日 10:14:26 +00:00Commented Jun 13, 2016 at 10:14
-
1\$\begingroup\$ Absolutely, I recommend the official tutorial, it's great, and a good way to start out on the right foot, following good pythonic practices docs.python.org/3/tutorial \$\endgroup\$janos– janos2016年06月13日 13:55:31 +00:00Commented Jun 13, 2016 at 13:55
awk
well. \$\endgroup\$