I am testing an application that I wrote and want to compare the solution my algorithm produces against a Monte Carlo solution. I use the hard disk a lot, and I was wondering whether there is an approach that writes data to files much less often, since the disk I/O is really slowing the process down.
The solutions are computed on the nodes of a cluster and examined using this script (which runs on a node). Parameter $1 is an output file that the program wrote.
file=$1
script=/home/hefke/ov_paper/scripts
mv $file.out $file.out.old
grep "Overlapscore:" $file.monte > $file.grepped
awk '/./{print $2}' $file.grepped > $file.overlap
echo "$script/std_dev.sh $file.overlap > $file.out"
$script/std_dev.sh $file.overlap > $file.out
cat $file.analy >> $file.out
echo "DONE" >> $file.out
Here is the script that collects the data on the main node. The .analy and .monte files are my output files.
echo "Processing outputfiles for the mc_stdev_of_ov"
script=/home/hefke/ov_paper/scripts
curdir=`pwd`
folder=filedata
for file in `ls -1 $curdir/temp_output/$folder/*.analy| sed 's/\(.*\)\..*/\1/'|uniq`
do
echo $file
$script/submitter.sh $curdir "processonefile.sh $file.out"
done
echo "$file.out now contains what stdtev spat out."
cat $curdir/temp_output/$folder/*.out >> $curdir/temp_output/tmp.out
awk -f keys.awk $curdir/temp_output/tmp.out >> table.out
cat table.out
How can I optimize this procedure for speed?
2 Answers
You don't need to store intermediate results in files between commands. Instead, just redirect the output of one command straight into the next:
$script/std_dev.sh < <(grep "Overlapscore:" $file.monte | awk '/./{print $2}') > $file.out
The Bash Guide has an excellent article about I/O.
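The `< <(...)` form above is process substitution, which bash supports but plain sh and csh do not. An ordinary pipe gives the same result without intermediate files and works in any POSIX shell. Below is a minimal sketch of that idea. The `std_dev` function is only a hypothetical stand-in for `std_dev.sh` (whose contents are not shown in the question), and the sample data and the `Overlapscore: <value>` line layout are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: filter the scores and compute statistics in one pipeline,
# without writing .grepped or .overlap files.

# Hypothetical stand-in for std_dev.sh: reads one number per line on stdin,
# prints the count, mean and population standard deviation.
std_dev() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n == 0) { print "no data"; exit 1 }
             mean = sum / n
             var = sumsq / n - mean * mean
             if (var < 0) var = 0   # guard against rounding error
             printf "n=%d mean=%.3f stddev=%.3f\n", n, mean, sqrt(var)
         }'
}

# Made-up sample input in the assumed "Overlapscore: <value>" layout.
workdir=$(mktemp -d)
file="$workdir/sample"
cat > "$file.monte" <<'EOF'
Iteration 1
Overlapscore: 2
Iteration 2
Overlapscore: 4
Overlapscore: 6
EOF

# awk does both the matching and the column extraction, so grep is not needed.
result=$(awk '/Overlapscore:/ { print $2 }' "$file.monte" | std_dev)
echo "$result"

rm -rf "$workdir"
```

In the real script the last stage would presumably be `$script/std_dev.sh > $file.out` rather than the stand-in function, provided that script can read its data from standard input.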
There's only one place where you write to tmp.out, and awk can take more than one file, so you can simplify those lines similarly:
awk -f keys.awk $curdir/temp_output/$folder/*.out
There's no need to redirect to table.out and cat it afterwards.
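To illustrate the point about awk accepting several files, here is a small sketch. The inline awk program is only a stand-in for `keys.awk`, whose real contents are not shown in the question, and the two `.out` files are made-up sample data.

```shell
#!/bin/sh
# Sketch: awk reads every file named on its command line, in order,
# so no concatenated tmp.out is required.

workdir=$(mktemp -d)
printf 'mean 4.0\nDONE\n' > "$workdir/a.out"
printf 'mean 5.5\nDONE\n' > "$workdir/b.out"

# Stand-in for keys.awk: print the file name and the value of each
# "mean" line. FILENAME is set by awk to the file currently being read.
table=$(cd "$workdir" && awk '$1 == "mean" { print FILENAME, $2 }' *.out)
echo "$table"

rm -rf "$workdir"
```

Because the output goes straight to the terminal (or into a variable, as here), neither tmp.out nor table.out has to be written unless the table is needed later.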
You shouldn't use ls in scripts; you can simply loop over a glob:
for file in $curdir/temp_output/$folder/*.analy
file="${file%.*}" # Remove extension
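A self-contained sketch of the same loop, runnable in any POSIX shell, is shown below. The directory and file names are invented for the demonstration. Quoting the variable keeps names containing spaces intact, which the `ls | sed` version does not.

```shell
#!/bin/sh
# Sketch: iterate over a glob and strip the extension with parameter
# expansion instead of piping ls through sed.

workdir=$(mktemp -d)
touch "$workdir/run1.analy" "$workdir/run2 copy.analy"

names=""
for path in "$workdir"/*.analy
do
    [ -e "$path" ] || continue   # the glob matched nothing
    base=${path%.*}              # remove the extension
    base=${base##*/}             # remove the directory part (demo only)
    names="$names[$base]"
done
echo "$names"

rm -rf "$workdir"
```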
When I use script/std_dev.sh < <(grep "Overlapscore:" $file.monte | awk '/./{print $2}') > $file.out, it tells me: Missing name for redirect. – tarrasch, Mar 14, 2012 at 13:28
l0b0, you sir are a genius. As a matter of fact, I am not :(. I am running csh. Thank you very much for your answer anyway :) – tarrasch, Mar 15, 2012 at 6:44
Not related to the question any more, but is there a way to group commands with () in csh, as in bash? – tarrasch, Mar 15, 2012 at 8:03
This isn't related to the question, but please don't mind that I'm using an "answer" just to comment: it seems I can't comment, maybe because I don't have enough points yet.
Tarrasch, if you still use csh as your shell, please do not write scripts in it.
Please read: http://www.faqs.org/faqs/unix-faq/shell/csh-whynot/
Use sh, bash (or even ksh) instead. It is better still to stick to sh only, because that is what all Unix systems rely on (rc scripts, for example, are based on it).