I am trying to optimize a script that loops through a folder, extracts the part of each file name before the date along with the file's header line, and writes both to a different file, separated by a delimiter. I feel the script is robust, but I want to refactor it. If there is a better way, please tell me.
#!/bin/bash
# script variables
FOLDER=path/to/folder
LOG_FILE=path/to/logfile.csv
# Getting the pattern and header of files from FOLDER
for file in `ls $FOLDER/*.csv`
do
echo -n $(basename "$file") 2>&1 | sed -r 's/(.*)_[0-9]{8}_[0-9][0-9]-[0-9][0-9].[0-9][0-9].csv/\1/' | tee -a $LOG_FILE
echo -n "|" | tee -a $LOG_FILE
cat $file | head -1 | tee -a $LOG_FILE
done #> $LOG_FILE
- `do ... done` is a compound command; every subcommand shares the file descriptors, so `tee`ing the loop has the same effect as `tee`ing each subcommand.
- Two subsequent invocations of `echo` can be combined together.
- `cat $file` is a dreaded UUOC.
- A `basename` invocation can be avoided by changing directory to `$FOLDER`.
- `ls` is absolutely unnecessary. The shell already globbed the `*.csv`.
Summing up,
cd "$FOLDER" || exit
for file in *.csv; do
echo -n "$file" "|" | sed -r ...
head -1 "$file"
done | tee "$LOG_FILE"
does the same job.
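For anyone who wants something to paste and try, here is a minimal sketch that combines the points above. It is not the exact code from this answer: the `list_headers` function name is hypothetical, the `sed` expression is the one from the question (with the dots escaped and `-E` in place of `-r`), and `printf` stands in for `echo -n`. The `cd` runs in a subshell so that a relative `LOG_FILE` path still resolves from the directory the script was started in.

```shell
#!/bin/bash
# Sketch only: list_headers is a hypothetical helper name, not from the post.
# Prints "<name-before-date>|<first line>" for every CSV in directory $1.
list_headers() {
    local dir=$1 file
    (
        # Subshell: the caller's working directory is left untouched.
        cd "$dir" || exit 1
        for file in *.csv; do
            [ -e "$file" ] || continue   # the glob matched nothing
            # Strip the date/time suffix using the pattern from the question.
            printf '%s|' "$(printf '%s\n' "$file" |
                sed -E 's/(.*)_[0-9]{8}_[0-9]{2}-[0-9]{2}\.[0-9]{2}\.csv$/\1/')"
            head -n 1 "$file"
        done
    )
}

# Usage (placeholders as in the question); -a appends instead of overwriting:
# list_headers "$FOLDER" | tee -a "$LOG_FILE"
```

Whether to pass `-a` to `tee` depends on whether the log should accumulate across runs.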
- Thanks vnp. It is very compact and does the same job. Just one modification: chdir should be cd. – manny, Dec 30, 2014 at 10:13
- Don't forget to leave in the `-a` argument to `tee`, assuming you want to append to the log instead of overwriting it. – Brian Minton, Dec 30, 2014 at 13:39
- And I had to google UUOC (though the first hit was right). For others: it means "useless use of cat". – Brian Minton, Dec 30, 2014 at 13:40
- You can also lose the `sed` with a `[[ $file =~ (.*)_[0-9]{8}_[0-9][0-9]-[0-9][0-9]\.[0-9][0-9]\.csv ]] && echo -n "${BASH_REMATCH[1]}"` – iruvar, Dec 30, 2014 at 15:32
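The `BASH_REMATCH` suggestion in the last comment could be wrapped up roughly like this. The `name_prefix` function name is hypothetical, and the fallback branch is an assumption on my part: the one-liner in the comment prints nothing when the pattern does not match, whereas the original `sed` passes the name through unchanged, so this sketch mimics `sed`.

```shell
#!/bin/bash
# Sketch only: name_prefix is a hypothetical helper name.
# Prints the part of a file name before the date suffix; if the name does
# not match the pattern, it is printed unchanged (as sed would do).
name_prefix() {
    # Keeping the regex in a variable avoids quoting surprises inside [[ ]].
    local re='^(.*)_[0-9]{8}_[0-9]{2}-[0-9]{2}\.[0-9]{2}\.csv$'
    if [[ $1 =~ $re ]]; then
        printf '%s' "${BASH_REMATCH[1]}"
    else
        printf '%s' "$1"
    fi
}
```

Inside the loop this would replace the `echo ... | sed ...` pipeline, for example `printf '%s|' "$(name_prefix "$file")"`.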
That sed pattern seems overly specific for the part of the file name you want to drop (unless there are other _#_#-#.#.csv file name endings that you do want to keep).
If you just want to drop everything from the second-to-last _ in the file name onward, then you can use
awk -F _ -v OFS=_ 'NF-=2' <<<"$file"
or
echo "$file" | awk -F _ -v OFS=_ 'NF-=2'
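If spawning `awk` per file is a concern, the same trimming can probably be done with shell parameter expansion alone. This is a swapped-in technique, not the `awk` approach above, and the `trim_last_two_fields` name is hypothetical. One difference worth knowing: a name with fewer than two underscores is returned unchanged here.

```shell
#!/bin/bash
# Sketch only: trim_last_two_fields is a hypothetical helper name.
# ${1%_*_*} removes the shortest suffix that contains two underscores,
# i.e. everything from the second-to-last "_" to the end of the name.
trim_last_two_fields() {
    printf '%s\n' "${1%_*_*}"
}

trim_last_two_fields 'sales_report_20141230_10-15.30.csv'   # prints: sales_report
```

The sample file name is made up to match the pattern in the question.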
Alternatively I think one of these might do what is desired as well in a single command.
awk --re-interval '{len=split(FILENAME, f, "/"); sub(/_[0-9]{8}_[0-9][0-9]-[0-9][0-9]\.[0-9][0-9]\.csv$/, "", f[len]); printf "%s|", f[len]; print; nextfile}' /path/to/folder/*.csv
Or if the regex is over-specified (as per my comment on the OP) then perhaps:
awk '{len=split(FILENAME, f, "/"); sub(/_[^_]+_[^_]+\.csv$/, "", f[len]); printf "%s|", f[len]; print; nextfile}' /path/to/folder/*.csv