I have files in a store folder that I want to trim to a particular length by removing bytes from the beginning. My app uses a temp folder for file manipulation. All files from previous operations must already have been removed from temp, so I use that folder for temporarily storing the trimmed files. I also log the errors.
#!/bin/bash
set -euo pipefail
data_service='/path/to/data-service'
store="${data_service}/data/store"
temp="${data_service}/data/temp"
log="${data_service}/logs/log.txt"
max_size=$(( 200000 * 24 ))
{
    # Check whether there are any files in 'temp'. It should be empty.
    if [ "$( ls -A "${temp}" )" ]
    then
        echo "$( date +"%Y-%m-%d %H:%M:%S" ) [ERROR] Temp folder is not empty!" 1>&2
        exit 1
    fi
    # Loop over all the files in 'store'
    for file_path in "${store}/"*
    do
        # Trim files larger than 'max_size' from 'store' into 'temp'
        if [ "$( wc -c < "${file_path}" )" -gt "${max_size}" ]
        then
            file_name="$( basename "${file_path}" )"
            tail -c "${max_size}" "${file_path}" > "${temp}/${file_name}"
        fi
    done
    # Move all the truncated files back to 'store'
    if [ "$( ls -A "${temp}" )" ]
    then
        mv "${temp}/"* "${store}/"
    fi
} 2>> "${log}"
Any potential problems or ways to improve the code?
Because these are my first steps in writing in bash, the specific things I googled for this code were:
Multiplication and assigning the result to a variable:
max_size=$(( 200000 * 24 ))
Check if there are files in a folder:
if [ "$( ls -A "$temp" )" ]; then .... fi
Get the date and time in a formatted string:
$( date +"%Y-%m-%d %H:%M:%S" )
Get the byte size of a file:
"$( wc -c < "$file_path" )"
Copy the last N bytes of a file:
tail -c "$max_size" "$file_path" > "$temp/$file_name"
Group commands and redirect STDERR to a log file:
{ ... } 2>> log-file
Echo to STDERR of the group:
echo 'Error message' 1>&2
1 Answer
Some suggestions
Get the byte size of a file
Instead of:
"$( wc -c < "$file_path" )"
use stat:
stat -c %s "$file_path"
It is better to query the file system for the size than to count the bytes in the files, some of which can be pretty large.
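One caveat: `stat -c %s` is the GNU coreutils spelling, while BSD and macOS `stat` use `-f %z` instead. Below is a minimal sketch of a helper that tries both and falls back to `wc -c`. The function name `file_size` is hypothetical and not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper (not from the original script): print a file's size in bytes.
file_size() {
    local path="$1"
    local size
    # GNU coreutils stat
    stat -c %s -- "$path" 2>/dev/null && return 0
    # BSD / macOS stat
    stat -f %z -- "$path" 2>/dev/null && return 0
    # POSIX fallback; the arithmetic expansion strips any padding wc may print
    size="$( wc -c < "$path" )" || return 1
    echo "$(( size ))"
}
```

In the loop, the size test would then read `if [ "$( file_size "${file_path}" )" -gt "${max_size}" ]`.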
Redirect or copy console output to a file
Instead of enclosing the whole block in braces, I would set up file redirection at the start of the script, e.g.:
#!/bin/bash
LOG_FILE="/tmp/test.log"
exec > >(tee -a "${LOG_FILE}") 2>&1
# this should show on console AND in the log file
echo "Hello world $(date)"
based on this example
This is more flexible, because once it's set up you don't have to bother about appending your commands, and you can disable or redefine the behavior on the fly if desired.
And the benefit is that since we are redirecting both stdout and stderr, any errors will show up in the log file as well.
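To make the "on the fly" part concrete, here is a minimal sketch that redirects only stderr to the log (as the original script does) and restores it later. The function names `start_logging` and `stop_logging` are hypothetical, and the sketch assumes file descriptor 3 is not used for anything else.

```bash
#!/bin/bash
# Hypothetical helpers; fd 3 is assumed to be otherwise unused.
start_logging() {
    exec 3>&2        # keep a copy of the original stderr on fd 3
    exec 2>> "$1"    # from now on, stderr is appended to the log file
}

stop_logging() {
    exec 2>&3 3>&-   # restore the original stderr and close the saved copy
}
```

Calling `start_logging "${log}"` near the top of the script replaces the `{ ... } 2>> "${log}"` wrapper, and `stop_logging` switches error output back to the console.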
Misc
I think parsing the output of ls to count files in the directory is a bit primitive. Let me think but hopefully someone will come up with a better way.
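One possible alternative is to let the shell expand a glob into an array and count the entries. This is only a sketch: the function name `dir_is_empty` is hypothetical, and a directory that does not exist is also reported as empty.

```bash
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) if the directory has no entries.
dir_is_empty() {
    local dir="$1"
    # Run in a subshell so the shopt changes do not leak into the caller.
    (
        # nullglob: an unmatched glob expands to nothing
        # dotglob: '*' also matches hidden files (but never '.' or '..')
        shopt -s nullglob dotglob
        entries=( "$dir"/* )
        (( ${#entries[@]} == 0 ))
    )
}
```

The first check in the script would then become `if ! dir_is_empty "${temp}"; then ... fi`.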
Good idea to use stat. I made a very naive benchmark of 'wc' vs 'stat' and found virtually the same performance. It is possible that 'wc --bytes' uses the filesystem stats internally instead of counting the bytes.
time for file_name in * ; do file_size="$( stat --format=%s "${file_name}" )"; done
Miroslav Popov, Sep 14, 2020 at 6:42