I haven't used Unix, etc. for 40 years, so I am a bit rusty. I need to back up the databases on my website(s) once a week and then retrieve them automatically from my development machine. So this is step one and is working fine as a crontab job on Apache. It emails me a message with status/error.
- I am creating a folder with the date as name and keeping 14 backups.
- I am not using a loop for the databases, because each one has a different user/pwd combination.
- I don't see it as dangerous to have the usernames and passwords here. If someone has root access, they can access the database anyway.
- The file size is returned as well.
- The start time and finish time are returned, so I can calculate how long the job took.
How can I make it more generic, or faster, or safer? Am I catching all the errors?
#!/usr/bin/bash
TIMESTAMP=$(date +"%F")
BACKUP_DIR=backup
MYSQLDUMP=/usr/bin/mysqldump
STAT=/usr/bin/stat
EMAIL=***
KEEP="+14"
MSG=tmp/backup_msg
echo "Date: $(date)" > $MSG
echo "Hostname: $(hostname)" >> $MSG
echo " " >> $MSG
echo " " >> $MSG
mkdir -p "$BACKUP_DIR/$TIMESTAMP" 2>> $MSG
DB=xxx_directory
BKP=$BACKUP_DIR/$TIMESTAMP/db_biz_directory.sql
echo "Backing up '$DB' to '$BKP'" >> $MSG
$MYSQLDUMP -u username -ppassword $DB > $BKP 2>> $MSG
SIZE=$($STAT -c%s $BKP) 2>> $MSG
echo "Filesize: $SIZE" >> $MSG
echo " " >> $MSG
DB=xxx_software
BKP=$BACKUP_DIR/$TIMESTAMP/db_swr_software.sql
echo "Backing up '$DB' to '$BKP'" >> $MSG
$MYSQLDUMP -u username -ppassword $DB > $BKP 2>> $MSG
SIZE=$($STAT -c%s $BKP) 2>> $MSG
echo "Filesize: $SIZE" >> $MSG
echo " " >> $MSG
echo "Removing excess backups" >> $MSG
find $BACKUP_DIR* -mtime $KEEP -exec rm {} \; >> $MSG 2>> $MSG
echo "Finished: $(date)" >> $MSG
mail -s "MySQL Backup script has run" "$EMAIL" <$MSG
rm -f $MSG
5 Answers
Overall
This is totally decent code and not too difficult to maintain. Your variable names make sense and you've made some effort at not repeating yourself, other than the lack of a loop, which you own up to in your question. There are definitely some ways to make this better, but you'll have to decide how much effort it is worth.
High-level suggestions
- Two major things stick out here: the use of plain-text passwords and the repeated code that leads to. Let's tackle the passwords first. MySQL provides a more secure way to store your passwords (`mysql_config_editor`). If your MySQL isn't that new, you can always go back to the `~/.my.cnf` style of configuration and create a config file specific to each database. Either way, your username and password are stored in a place that will work for running the `mysql` client interactively as well as for `mysqldump`.
- Once your passwords no longer force the code for each database to be different, you can build the loop you mentioned. This will be handy later if you ever add more databases. I'd recommend making this into a shell function so your code can be more self-explanatory and reusable.
- Another function that would be wise to build is one for appending to your `$MSG` file.
- You asked about other error conditions that you can check. The biggest one that comes to mind is filling up the disk. It is a good idea to check that you haven't filled up the drive and ended up with truncated backups.
- Another thing to check is that your `mysqldump` is giving you a reasonably sized file. If it is less than 1 kB you probably have a problem to dig into. You are outputting this information, but it would be good to highlight when there is an issue rather than expecting the person reading the email to pay attention to all of these details. Humans get tired and aren't great at being so vigilant for many years. And you're already half-way there, since you're using `stat` to get the file size.
- If you're really keen on making this fault-tolerant, there's a chance that you're not able to make the directory. There's no point in continuing if that fails. You can do something like `mkdir ... || { echo "mkdir failed" >> "$MSG"; exit 1; }`.
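Putting the first two ideas together (per-database config files plus a shell function), a sketch of the loop might look like this. The config-file naming scheme and the `backup_db` helper are illustrative assumptions, not part of the original script; `MYSQLDUMP` is made overridable purely so the sketch can be exercised without a live server:

```shell
#!/usr/bin/env bash
# Sketch only: assumes one my.cnf-style credentials file per database,
# e.g. ~/.my_xxx_directory.cnf with a [client] section holding user/password.
MYSQLDUMP=${MYSQLDUMP:-/usr/bin/mysqldump}   # overridable for testing

backup_db() {
    local db=$1 bkp=$2
    echo "Backing up '$db' to '$bkp'"
    # --defaults-extra-file must come before any other option
    "$MYSQLDUMP" --defaults-extra-file="$HOME/.my_${db}.cnf" "$db" > "$bkp"
}

# usage: one call per database, or a loop if the names line up:
# for db in xxx_directory xxx_software; do
#     backup_db "$db" "$BACKUP_DIR/$TIMESTAMP/${db}.sql"
# done
```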
Best practices suggestions
- It is a good practice to quote your variable substitutions.
- `echo " "` looks a bit weird to me: you can get rid of the space in the quotes and still get a blank line in your `$MSG` file.
- If you ever wanted to run your script in parallel, you would have conflicts over the `$MSG` file. Using `mktemp` would give you a unique name for each process.
- Check out shellcheck. It can also be run locally.
- I did look at the page for `mysql_config_editor`. It doesn't seem to be installed on my server; I checked in `/bin` and `/usr/bin`. The term `login_path` confused me and I couldn't figure out how to have different passwords for the databases. I assume that I can use it to mean profile? Also, I was not clear if this applied to just the command line or to all accesses to the databases, including from PHP. – Rohit Gupta, Aug 15, 2022
- If you create files in `my.cnf` format, you can refer to that config file on the `mysqldump` command line by using `--defaults-extra-file=` and naming the files similarly to your database names. – chicks, Aug 15, 2022
While you can get away with using this script to dump small databases on a low-traffic site, you should be aware that this is not a good backup technique for safeguarding data in a production system. The main issue is that `mysqldump`, by default, does not work atomically. Data that are written to MySQL while `mysqldump` is running may or may not be captured in the dump, so the output may contain data that don't represent a logically coherent state of the application. To capture the state of the database at an instant in time, you need to do one of the following:
- `mysqldump --single-transaction`, if you are only using transactional tables such as InnoDB.
- `mysqldump --lock-tables`, if your database contains some non-transactional tables. Locking the database could cause disruption to your applications.
- `mylvmbackup`, which is a Perl script that performs an atomic backup using the following strategy, such that there is no noticeable downtime:
  - Lock the database and flush all outstanding write operations.
  - Create an LVM snapshot of the filesystem on which the database resides. This is a quick operation.
  - Unlock the database.
  - Launch a read-only MySQL server on the snapshot.
  - Run `mysqldump` against the read-only MySQL server.
  - Delete the snapshot.
If you're doing this backup as part of your job as a system administrator, I'd recommend using a tool like `mylvmbackup` instead of building your own. You're usually better off not developing and maintaining your own code when you have the option of using something that is already widely used, configurable, and documented.
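For the common InnoDB-only case, the first option boils down to adding one flag. A hedged sketch (credentials handling elided; `atomic_dump` is a made-up helper name, and `MYSQLDUMP` is overridable so the command can be exercised without a live server):

```shell
#!/usr/bin/env bash
# Sketch: consistent-snapshot dump, valid only for InnoDB-only databases.
MYSQLDUMP=${MYSQLDUMP:-/usr/bin/mysqldump}

atomic_dump() {
    local db=$1 bkp=$2
    # --single-transaction starts a transaction and dumps a consistent
    # snapshot without locking the tables
    "$MYSQLDUMP" --single-transaction "$db" > "$bkp"
}
```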
- It is a very low-traffic site. The only write activity is the update of file download counts. However, I will have a look. – Rohit Gupta, Aug 15, 2022
There are a couple of bad practices here. First, as a general rule, you should avoid using CAPS for variable names in shell scripts. By convention, global environment variables are capitalized, so making your own local variables upper case can lead to naming collisions and bugs that are very hard to debug. It is much better to use lower-case variable names for internal variables.
Next, you should always quote your variables. In this case, since you control the values, it is less of a danger than it could be, but you still should quote them. See this excellent Q&A over at Unix & Linux: Security implications of forgetting to quote a variable in bash/POSIX shells.
In your specific case, this `find` command is fragile:
find $BACKUP_DIR* -mtime $KEEP -exec rm {} \; >> $MSG 2>> $MSG
You haven't told us what this command is supposed to do, so I am guessing that you want to search through all directories whose names start with the value of `$BACKUP_DIR`. If that's really what you mean, you might want to change the name of the variable to `backup_dir_prefix` or something to make that clear.
If instead you just want to search in the directory `$BACKUP_DIR`, then remove the `*`. The shell will expand the wildcard (`*`) before calling `find`, which means that the unquoted `$BACKUP_DIR*` will match `backupNoThisDirectoryEver` or anything else starting with the value of `$BACKUP_DIR`. Plus, this will fail if you ever change the variable to a value containing a space or wildcard characters.
Next, if you are running this on Linux, or more generally using GNU `find`, you can use the `-delete` option. If not, you can at least end the command with `+` instead of `\;` to delete multiple files with the same command for efficiency. From `man find`:

    -exec command {} +
        This variant of the -exec action runs the specified command on the
        selected files, but the command line is built by appending each
        selected file name at the end; the total number of invocations of
        the command will be much less than the number of matched files. [...]
Finally, since you're using `bash`, you can use `&>` to redirect both stderr and stdout. Putting all of this together gives you:
find "$backup_dir"* -mtime "$keep" -delete &>> "$MSG"
Or, if you don't want to look in other directories:
find "$backup_dir" -mtime "$keep" -delete &>> "$MSG"
And if this is not GNU `find`:
find "$backup_dir" -mtime "$keep" -exec rm -r {} + &>> "$MSG"
Also note that this command will return `"$backup_dir"` itself as a result:
$ find backup/
backup/
backup/file.txt
If the `backup` directory itself matches the `-mtime` you set, then your `find` will also delete the entire directory. You could avoid that with `-mindepth 1` if you are using GNU `find`.
- I will take the advice. I hadn't tested the find exhaustively, because I intend to delete the backups as they are transferred to my machine. – Rohit Gupta, Aug 15, 2022
- @RohitGupta what is the objective of using `find $BACKUP_DIR*` instead of `find $BACKUP_DIR`? The Right Way® will depend on what you want that to do. – terdon, Aug 15, 2022
- The script has morphed since I started, and the find should have been `find $BACKUP_DIR/*`. The original intention was to delete files and directories older than 14 days. At that time, I was going to back up every day. Now I am doing it once a week, and I still want to keep 14 sets of data. So I need to revisit this. – Rohit Gupta, Aug 16, 2022
- @RohitGupta you did not use `$BACKUP_DIR/*`; that is another, different thing. You used `$BACKUP_DIR*`, which means "look inside any directories whose name starts with the value of the variable `$BACKUP_DIR`". If that isn't what you want, if you don't have something like `backup1`, `backup2` etc. but instead want to search through all directories inside `$BACKUP_DIR`, then you want `find $BACKUP_DIR -name ...`; there is no need for the `*`, and the `*` will make you look inside other directories you don't care about. That's why I am asking why you have the `*`: to understand what you want to do. – terdon, Aug 16, 2022
- @RohitGupta `find path/to/dir` will find everything in all subdirectories of `path/to/dir`; you don't need `path/to/dir/*`. And `path/to/dir*` (without the final `/`) means "look inside `path/to/dir` and also `path/to/directMessages` and anything else that starts with `path/to/dir`". So just don't use the `*` and you should be OK. For more questions and help on using the `find` command, I suggest you ask over at Unix & Linux. – terdon, Aug 16, 2022
This is not a code style suggestion, but a specific tip for mysqldump.
According to this documentation, if the `--comments` option is enabled (hint: it is enabled by default), `mysqldump` produces a SQL comment at the end of the dump in the following form:
-- Dump completed on DATE
Given this, a small improvement could be to tail the last line of the dump file and parse/check the output for this comment.
A very simple check would be to "grep count" for the expected content, with the expected value being 1:
$ tail -n1 "$BKP" | grep -c "Dump completed"
1
From here it should be easy to operationalise the check by warning/handling the unexpected condition (that is, the "grep count" value is 0).
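One way to operationalise the check, as a sketch (`dump_ok` is a made-up helper name):

```shell
#!/usr/bin/env bash
# Sketch: verify a dump file ends with mysqldump's completion comment
# (present when --comments is on, which is the default).
dump_ok() {
    tail -n1 "$1" | grep -q "Dump completed"
}

# usage:
# dump_ok "$BKP" || echo "WARNING: dump of '$DB' looks truncated" >> "$MSG"
```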
- Nice idea! When it's enough to know whether a text input contains something or not, you don't need the count or any output from grep; you can use the exit code to decide success or failure. For example: `... | grep -q "Dump completed on" || fatal "Dump failed, see logs"` – janos, Aug 16, 2022
- The line is there, so it's a possible enhancement. – Rohit Gupta, Aug 16, 2022
- That's a nice tip @janos! I wasn't aware of that behaviour with `grep -q`; it will come in handy. – j13k, Aug 17, 2022
(In addition to what other reviewers already pointed out.)
Redirecting everything to `$MSG`

The script is full of redirections to `$MSG`, which is noisy when reading and was probably painful when writing.
Assuming you wouldn't mind redirecting everything, you could wrap all the code that produces output and error messages, either by grouping it in `{ ... }` or by using a function, for example:
main() {
echo "Date: $(date)"
echo "Hostname: $(hostname)"
# ...
mkdir -p "$BACKUP_DIR/$TIMESTAMP"
# ...
"$MYSQLDUMP" -u username -ppassword "$DB" > "$BKP"
# ...
}
main &> "$MSG"
Making the script safer
You can make Bash scripts safer in general by setting some flags that help you catch common mistakes early, notably:
set -euo pipefail
This does a bunch of helpful things:
- `-e` makes the script abort on the first failing command, which can help in many ways:
  - avoid disasters by failing fast
  - highlight that something unexpected happened
  - make the failure easy to see, at the end of the script output
  - encourage implementing solid error handling
- `-u` makes it an error to reference an undefined variable; together with `-e`, this again can help avoid disasters
- `-o pipefail` makes the exit code of a pipeline success only when all commands exit with success (you can read more about it in `man bash`; search for "pipefail")
- I hadn't known that functions existed in shell scripts. I will definitely do this. I have six websites, which is why I was looking to make it more generic. I am going to leave a slightly altered version of this one in place; it's working, and I will refine it for the next one. Is it possible to place the function in an include file? – Rohit Gupta, Aug 16, 2022
- You can include other bash scripts using `. path/to/script.sh` (the `.` builtin). – janos, Aug 17, 2022
- `GETSIZE="/usr/bin/stat -c%s"`, then `SIZE=$($GETSIZE $BKP 2>>$MSG)`. (The `2>>` won't work the way you want if you put it outside the `$()`.)