
I am trying to optimize my script. It loops through a folder, extracts the part of each file name before the date along with the file's header line, and writes both to a different file separated by a delimiter. I feel the script is robust, but I want to refactor it. If there is a better way, please tell me.

#!/bin/bash
# script variables
FOLDER=path/to/folder
LOG_FILE=path/to/logfile.csv
# Getting the pattern and header of files from FOLDER
for file in `ls $FOLDER/*.csv`
do
 echo -n $(basename "$file") 2>&1 | sed -r 's/(.*)_[0-9]{8}_[0-9][0-9]-[0-9][0-9].[0-9][0-9].csv/\1/' | tee -a $LOG_FILE
 echo -n "|" | tee -a $LOG_FILE
 cat $file | head -1 | tee -a $LOG_FILE
done #> $LOG_FILE
asked Dec 30, 2014 at 8:23

2 Answers

  • do ... done is a compound command, and every command inside it shares its file descriptors, so teeing the whole loop has the same effect as teeing each command.

  • Two consecutive invocations of echo can be combined into one.

  • cat $file is a dreaded UUOC (useless use of cat): head can read the file directly.

  • A basename invocation can be avoided by changing directory to $FOLDER.

  • ls is absolutely unnecessary: the shell has already expanded the *.csv glob. Parsing the output of ls also breaks on file names that contain whitespace.

Summing up,

 cd "$FOLDER" || exit 1
 for file in *.csv; do
   echo -n "$file|" | sed -r ...
   head -n 1 "$file"
 done | tee -a "$LOG_FILE"

does the same job.
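A minimal runnable sketch of that loop, assuming bash and a sed that accepts -E. The function name extract_headers is hypothetical, and the elided sed expression is filled in with the question's own regex (dots escaped so they only match literal dots). The cd happens in a subshell so that a relative log path is still resolved from the caller's directory by tee.

```shell
#!/bin/bash
# Sketch: append "<name-prefix>|<header line>" for every CSV in a folder.
# extract_headers is a hypothetical helper name, not part of the answer.
extract_headers() {
  local folder=$1 log_file=$2
  (
    # Subshell: this cd does not affect the caller, so tee below
    # still resolves a relative log_file from the original directory.
    cd "$folder" || exit 1
    shopt -s nullglob   # no *.csv files -> loop body never runs
    for file in *.csv; do
      # The question's date/time suffix regex, with the dots escaped.
      prefix=$(sed -E 's/(.*)_[0-9]{8}_[0-9][0-9]-[0-9][0-9]\.[0-9][0-9]\.csv$/\1/' <<<"$file")
      printf '%s|' "$prefix"
      head -n 1 -- "$file"   # header line only; no cat needed
    done
  ) | tee -a "$log_file"    # one tee for the whole compound command
}
```

Usage would be something like extract_headers path/to/folder path/to/logfile.csv; keep the log file outside the scanned folder, otherwise a .csv log would be picked up by the glob on the next run.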

answered Dec 30, 2014 at 9:27
  • Thanks vnp. It is very compact and does the same job. Just one modification: chdir should be cd. Commented Dec 30, 2014 at 10:13
  • Don't forget to leave in the -a argument to tee, assuming you want to append to the log instead of overwriting it. Commented Dec 30, 2014 at 13:39
  • And I had to google UUOC (though the first hit was right). For others: it means "useless use of cat". Commented Dec 30, 2014 at 13:40
  • You can also lose the sed with a [[ $file =~ (.*)_[0-9]{8}_[0-9][0-9]-[0-9][0-9]\.[0-9][0-9]\.csv ]] && echo -n "${BASH_REMATCH[1]}" Commented Dec 30, 2014 at 15:32
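A sketch of the whole loop built on that sed-free idea, assuming bash 3.2 or later. log_prefixes is a hypothetical name. Storing the regex in a variable avoids quoting surprises inside [[ ]], and stripping the directory with ${file##*/} replaces basename. Unlike the sed version, files whose names do not match the pattern are skipped rather than passed through unchanged.

```shell
#!/bin/bash
# Sketch: print "<prefix>|<header>" using bash's own regex matching.
# log_prefixes is a hypothetical helper name.
log_prefixes() {
  local folder=$1 file name
  # Anchored version of the comment's regex; group 1 is the prefix.
  local re='^(.*)_[0-9]{8}_[0-9][0-9]-[0-9][0-9]\.[0-9][0-9]\.csv$'
  for file in "$folder"/*.csv; do
    name=${file##*/}   # drop the directory part without calling basename
    # Non-matching names (including an unexpanded "*.csv") are skipped.
    if [[ $name =~ $re ]]; then
      printf '%s|' "${BASH_REMATCH[1]}"
      head -n 1 -- "$file"
    fi
  done
}
```

Piping the call through tee -a "$LOG_FILE" gives the same logging behaviour as the question's script.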

That sed pattern seems overly specific for the part of the file name you want to drop (unless there are other _#_#-#.#.csv file name endings that you do want to keep).

If you just want to drop everything from the second-to-last _ in the file name onward, you can use

awk -F _ -v OFS=_ 'NF-=2' <<<"$file"

or

echo "$file" | awk -F _ -v OFS=_ 'NF-=2'
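The same "drop from the second-to-last _" rule can also be expressed without starting awk at all, using bash parameter expansion. This is a different technique from the awk one-liners: ${var%pattern} removes the shortest suffix matching pattern, and the shortest suffix matching _*_* begins at the second-to-last underscore. strip_date_suffix is a hypothetical name.

```shell
#!/bin/bash
# Sketch: strip "_<date>_<time>.csv" via parameter expansion (no awk/sed).
# strip_date_suffix is a hypothetical helper name.
strip_date_suffix() {
  local file=${1##*/}          # remove any leading directory part
  # %_*_* removes the shortest suffix containing two underscores,
  # i.e. everything from the second-to-last "_" onward.
  # Names with fewer than two underscores are left unchanged.
  printf '%s\n' "${file%_*_*}"
}
```

For example, strip_date_suffix my_report_20150101_12-00.30.csv prints my_report.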

Alternatively, I think one of these might do the whole job in a single command.

awk --re-interval '{len=split(FILENAME, f, "/"); sub(/_[0-9]{8}_[0-9][0-9]-[0-9][0-9]\.[0-9][0-9]\.csv$/, "", f[len]); printf "%s|", f[len]; print; nextfile}' /path/to/folder/*.csv

Or if the regex is over-specified (as per my comment on the question), then perhaps:

awk '{len=split(FILENAME, f, "/"); sub(/_[^_]+_[^_]+\.csv$/, "", f[len]); printf "%s|", f[len]; print; nextfile}' /path/to/folder/*.csv
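A sketch of the single-awk approach wrapped in a function, assuming only a POSIX awk. As a portability substitution, it selects each file's first line with FNR == 1 instead of nextfile, which is not available in every awk; the trade-off is that awk still reads every line of every file. The file name is passed to printf as an argument rather than as the format string, so a % in a file name cannot corrupt the output. headers_with_awk is a hypothetical name.

```shell
#!/bin/bash
# Sketch: one awk process prints "<prefix>|<header>" for each CSV.
# headers_with_awk is a hypothetical wrapper; $1 is the folder.
headers_with_awk() {
  awk 'FNR == 1 {
    name = FILENAME
    sub(/.*\//, "", name)                 # strip the directory part
    sub(/_[^_]+_[^_]+\.csv$/, "", name)   # drop "_<date>_<time>.csv"
    printf "%s|%s\n", name, $0            # name is data, not a format
  }' "$1"/*.csv
}
```

If the folder contains no .csv files, the glob stays unexpanded and awk reports that it cannot open the file, so callers may want to check for matches first.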
answered Dec 31, 2014 at 20:56