Optimize bash script that concatenates output

Question 1

I am trying to optimize a script that loops through a folder, extracts the part of each file name that precedes the date together with the file's header line, and writes both to another file, separated by a delimiter. I feel the script is robust, but I want to refactor it. If there is a better way, please tell me.

#!/bin/bash
# script variables
FOLDER=path/to/folder
LOG_FILE=path/to/logfile.csv
# Getting the pattern and header of files from FOLDER
for file in `ls $FOLDER/*.csv`
do
 echo -n $(basename "$file") 2>&1 | sed -r 's/(.*)_[0-9]{8}_[0-9][0-9]-[0-9][0-9].[0-9][0-9].csv/\1/' | tee -a $LOG_FILE
 echo -n "|" | tee -a $LOG_FILE
 cat $file | head -1 | tee -a $LOG_FILE
done #> $LOG_FILE

Answer 1

do ... done is a compound command; all of its subcommands share its file descriptors, so teeing the loop has the same effect as teeing each subcommand.
The two consecutive echo invocations can be combined.
cat $file is a dreaded UUOC; head can read the file directly.
The basename invocation can be avoided by changing directory to $FOLDER.
ls is unnecessary: the shell has already expanded the *.csv glob.

Summing up,

 chdir "$FOLDER"
 for file in *.csv; do
     echo -n "$file" "|" | sed -r ...
     head -1 "$file"
 done | tee $LOG_FILE

does the same job.
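A runnable sketch of this refactoring, under a few assumptions: cd is used in place of chdir, tee gets -a so the log keeps being appended to, printf replaces echo -n, and the elided sed expression is filled in with the pattern from the question (dots escaped, match anchored at the end). The temporary directory, the sample file names and the log name are hypothetical.

```shell
#!/bin/bash
# Sketch only: demo_dir, the sample CSV files and log.txt are made up for illustration.
demo_dir=$(mktemp -d)
printf 'id,name\n1,a\n' > "$demo_dir/sales_20141230_09-27.05.csv"
printf 'sku,qty\n7,3\n' > "$demo_dir/stock_20141231_20-56.06.csv"
log_file="$demo_dir/log.txt"   # not *.csv, so the glob below never picks it up

(
  cd "$demo_dir" || exit 1
  for file in *.csv; do
    # Strip the _YYYYMMDD_HH-MM.SS.csv suffix, then emit the delimiter.
    printf '%s|' "$(sed -E 's/_[0-9]{8}_[0-9]{2}-[0-9]{2}\.[0-9]{2}\.csv$//' <<<"$file")"
    # First line of the file, read directly by head (no cat).
    head -n 1 "$file"
  done
) | tee -a "$log_file"
```

Running the loop in a subshell keeps the cd from leaking into the rest of the script, and a single tee on the whole loop replaces the three per-command tees.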

Comment 1

Thanks vnp. It is very compact and does the same job. Just one modification: chdir should be cd.

Comment 2

Don't forget to keep the -a argument to tee, assuming you want to append to the log instead of overwriting it.
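To illustrate the difference, here is a small sketch that writes to a hypothetical temporary log with and without -a; the variable names are made up for the example.

```shell
#!/bin/bash
# Sketch only: the temporary log path and the messages are hypothetical.
log=$(mktemp)

echo "first run"  | tee "$log" > /dev/null      # truncates the log, then writes
echo "second run" | tee "$log" > /dev/null      # truncates again: "first run" is lost
overwritten=$(cat "$log")

echo "third run"  | tee -a "$log" > /dev/null   # appends after "second run"
appended=$(cat "$log")

printf 'without -a:\n%s\n\nwith -a:\n%s\n' "$overwritten" "$appended"
```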

Comment 3

Also, I had to google UUOC (though the first hit was right). For others: it means "useless use of cat".

Comment 4

You can also lose the sed with a [[ $file =~ (.*)_[0-9]{8}_[0-9][0-9]-[0-9][0-9]\.[0-9][0-9]\.csv ]] && echo -n "${BASH_REMATCH[1]}"
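A minimal sketch of that approach, assuming the file names follow the name_YYYYMMDD_HH-MM.SS.csv shape from the question. The function name strip_date_suffix and the sample names are hypothetical; keeping the regex in a variable avoids quoting surprises inside [[ ]].

```shell
#!/bin/bash
# Sketch only: strip_date_suffix and the sample file names are hypothetical.
strip_date_suffix() {
  local re='^(.*)_[0-9]{8}_[0-9]{2}-[0-9]{2}\.[0-9]{2}\.csv$'
  if [[ $1 =~ $re ]]; then
    printf '%s' "${BASH_REMATCH[1]}"   # text captured by the first group
  else
    printf '%s' "$1"                   # no date suffix: pass the name through unchanged
  fi
}

prefix=$(strip_date_suffix "sales_eu_20141230_09-27.05.csv")
echo "$prefix"
```

Because the match happens inside the shell, no sed process is started per file.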

Answer 2

That sed pattern seems overly specific for the part of the file name you want to drop (unless there are other _#_#-#.#.csv file name endings that you do want to keep).

If you just want to drop everything from the second-to-last _ in the file name onwards, then you can use

awk -F _ -v OFS=_ 'NF-=2' <<<"$file"

or

echo "$file" | awk -F _ -v OFS=_ 'NF-=2'
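The same trimming can also be sketched without awk, using bash parameter expansion. This is a swapped-in technique, not part of the answer above, and the sample file name is hypothetical. One difference worth noting: a name with fewer than two underscores is left unchanged here.

```shell
#!/bin/bash
# Sketch only: the sample file name is hypothetical.
file="sales_eu_20141230_09-27.05.csv"

# ${var%pattern} removes the shortest matching suffix; _*_* must span two
# underscores, so everything from the second-to-last _ onwards is dropped.
prefix=${file%_*_*}
echo "$prefix"
```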

Alternatively, I think one of these might do the whole job in a single command.

awk --re-interval '{len=split(FILENAME, f, "/"); sub(/_[0-9]{8}_[0-9][0-9]-[0-9][0-9]\.[0-9][0-9]\.csv$/, "", f[len]); printf "%s|", f[len]; print; nextfile}' /path/to/folder/*.csv

Or if the regex is over-specified (as per my comment on the OP) then perhaps:

awk '{len=split(FILENAME, f, "/"); sub(/_[^_]+_[^_]+\.csv$/, "", f[len]); printf "%s|", f[len]; print; nextfile}' /path/to/folder/*.csv

Answer 1: vnp, 2014-12-30 09:27:05Z

Comments on Answer 1: manny (2014-12-30 10:13), Brian Minton (2014-12-30 13:39 and 13:40), iruvar (2014-12-30 15:32)

Answer 2: Etan Reisner, 2014-12-31 20:56:06Z
