I have a typical Django application with a single MySQL database. The purpose of this shell script is to back up the MySQL database every midnight via crontab and to check that everything went right.
The server has the mysql
and mysqldump
executables available. The database connection settings are stored with the help of django-dotenv, i.e. in a .env config file with entries of the form:
<key>='<value>'
The script has three template variables:
- <project_root> - the root directory of the Django project
- <project_name> - the path from the root directory to the .env file, which is also the project name (they match for the standard Django project layout)
- <dump_root> - the path where dumps are stored
Note: not all of them end with a slash.
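For context, a hypothetical .env file following that structure might look like this (the keys are the ones the script reads; the values are placeholders):

```shell
# Hypothetical .env consumed by django-dotenv; all values are made up.
DB_NAME='mydb'
DB_USER='mydb_user'
DB_PASSWORD='s3cret'
DB_HOST='localhost'
```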
For now, I'm doing these things:
- Checking the existence of a file with DB connection settings
- Extracting DB connection settings and trying to connect to the database
- Retrieving a list of databases from a MySQL server and checking if the needed database exists
- Trying to create a directory for dump files
- Dumping the database in a specified location
Then come the steps that verify the dump is not corrupted:
- Create a temporary database for testing purposes
- Load dump into test database
- Compare tables from source database and test database (this is a single step to ensure database integrity after restoring from backup)
- Delete temporary database
#!/usr/bin/env bash
# -*- encoding: utf-8 -*-
echo "[$(date -I)] dumping db"
PROJECT_ROOT=<project_root>
DOT_ENV=$PROJECT_ROOT/<project_name>/.env
#: Ensure existence of file with db connection settings
if [ -f "$DOT_ENV" ]; then
    echo "[+] '.env' exists"
else
    echo "[-] '.env' does not exist"
    exit 1
fi
#: Extract database connection information from '.env' file
DB_NAME=$( awk -F '=' '/DB_NAME/ {print 2ドル}' "$DOT_ENV" | sed -e "s/'//g" )
DB_USER=$( awk -F '=' '/DB_USER/ {print 2ドル}' "$DOT_ENV" | sed -e "s/'//g" )
DB_HOST=$( awk -F '=' '/DB_HOST/ {print 2ドル}' "$DOT_ENV" | sed -e "s/'//g" )
DB_PASSWORD=$( awk -F '=' '/DB_PASSWORD/ {print 2ドル}' "$DOT_ENV" | sed -e "s/'//g" )
if mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" -e"quit" 2>/dev/null; then
    echo "[+] db connection established with following credentials:
name: $DB_NAME
user: $DB_USER
pswd: $DB_PASSWORD
host: $DB_HOST"
else
    echo "[-] db connection failed. Verify credentials."
    exit 1
fi
#: Ensure the DB exists
#: Make an array from the whitespace-delimited string of databases
DATABASES_LIST=(
    $( mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" -e"SHOW DATABASES" 2>/dev/null )
)
#: Perform check
if [[ " ${DATABASES_LIST[@]} " =~ " ${DB_NAME} " ]] ; then
    echo "[+] db '$DB_NAME' exists"
else
    echo "[-] db '$DB_NAME' does not exist"
    exit 1
fi
#: Create directory for dump files
DUMP_ROOT=<dump_root>
if mkdir -p "$DUMP_ROOT" 2>/dev/null; then
    echo "[+] '$DUMP_ROOT' exists"
else
    echo "[-] '$DUMP_ROOT' could not be created"
    exit 1
fi
#: Make database dump
DUMP_SQL_PATH=$DUMP_ROOT/dump-$(date -I).sql
if mysqldump -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" "$DB_NAME" > "$DUMP_SQL_PATH" \
        2>/dev/null; then
    echo "[+] '$DUMP_SQL_PATH' created"
else
    echo "[-] '$DUMP_SQL_PATH' was not created"
    exit 1
fi
#: Create test database
DB_NAME_TEST="${DB_NAME}_dump_test"
if mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" \
        -e"CREATE DATABASE $DB_NAME_TEST" 2>/dev/null; then
    echo "[+] db '$DB_NAME_TEST' created"
else
    echo "[-] db '$DB_NAME_TEST' was not created"
    exit 1
fi
#: Load dump into test database
if mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" "$DB_NAME_TEST" < "$DUMP_SQL_PATH" \
        2>/dev/null; then
    echo "[+] '$DUMP_SQL_PATH' loaded into '$DB_NAME_TEST'"
else
    echo "[-] '$DUMP_SQL_PATH' could not be loaded into '$DB_NAME_TEST'"
    exit 1
fi
#: Retrieve and compare table lists from both databases
#: Get an array from the MySQL output
DB_TABLES_SOURCE=($(
    mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" "$DB_NAME" -e"SHOW TABLES;" \
        2>/dev/null
))
#: Slice off the first element, as it is just a header line describing the command
DB_TABLES_SOURCE=("${DB_TABLES_SOURCE[@]:1}")
#: Convert the array back to a string for string comparison
DB_TABLES_SOURCE="${DB_TABLES_SOURCE[@]}"
DB_TABLES_TEST=($(
    mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" "$DB_NAME_TEST" -e"SHOW TABLES;" \
        2>/dev/null
))
DB_TABLES_TEST=("${DB_TABLES_TEST[@]:1}")
DB_TABLES_TEST="${DB_TABLES_TEST[@]}"
if [ "$DB_TABLES_SOURCE" == "$DB_TABLES_TEST" ]; then
echo "[+] dump file verified"
else
echo "[-] dump file corrupted"
fi
#: Drop test database
if mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" \
        -e"DROP DATABASE $DB_NAME_TEST;" 2>/dev/null; then
    echo "[+] db '$DB_NAME_TEST' deleted"
else
    echo "[-] db '$DB_NAME_TEST' was not deleted"
    exit 1
fi
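For reference, the midnight crontab run mentioned at the top could be wired up like this (the script path and log path are placeholders, and cron's restricted PATH may need the mysql/mysqldump directories added):

```shell
# Hypothetical crontab entry: run the backup at 00:00 every day and
# append both stdout and stderr to a log file. Paths are placeholders.
0 0 * * * /bin/bash <project_root>/backup_db.sh >> /var/log/db_backup.log 2>&1
```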
First of all, this is bash scripting, which I'm not very familiar with, so I'll be glad to hear about anything done badly, including code style.
If you don't like the syntax highlighting, you can check out the gist version (same as here but probably easier to read).
1 Answer
I admit I don't understand the need to stand up the dump and compare it with the production database, especially if you are not going to retain the dumped database as a hot standby or similar. The mysqldump
tool gives you significant flexibility in how you can log/handle errors that may come up during the dump process, as well as various performance, consistency, and table-selection options.
What value are you REALLY gaining from just matching a list of tables? For a large database you could be consuming a lot of resources and time on this step. If that is your case, maybe perform this test only at some sampling interval as a sanity check.
If you find that your daily backup is THAT critical to your system recovery plans, then I would suggest your overall architecture might need revisiting. Why not have a read replica of the master that you could dump every X minutes/hours (depending on dump time) without impacting performance on production (also giving you point-in-time recovery capabilities)? Or, better yet, why not begin thinking about cloud-based solutions like Amazon RDS that can basically abstract all of this logic and effort away from you, allowing you to use a single command to take a database snapshot, restore from a snapshot, etc.?
The technology in this space has evolved pretty far past daily database dumps for use cases where you need high availability and/or more robust recovery capabilities.
That being said, I think you have put together a very well thought-out script. It is simple, well documented, and clear in presentation. I think it could serve you well for basic dump operations. I would just encourage you to think about whether some of the commentary above is applicable to your use case.
Some more specific thoughts:
- Should you be using 1 as your exit status for every failure scenario? Does whatever is calling this script need more granular exit status information?
- I really like that you are pulling credentials from the environment outside this script, though I wonder about the fragility of the mechanism (searching through a file). Should that functionality be encapsulated and exposed elsewhere in the system (in actual ENV variables, perhaps)? Ideally, a DB dump script should not have to care about how to go out and read the settings it needs from some application configuration file.
- Don't echo/log out your credentials!
- Why connect to the database at all before doing the dump? This is really an unnecessary step. If you try to execute the dump and, for example, the specified database doesn't exist, mysqldump will fail. You should handle the mysqldump exit status (and possibly log the detailed error messages written to stderr) if taking this approach. The same applies to the write path as well: mysqldump will fail if the path is not writable or the file already exists (depending on configuration).
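To illustrate that last point, here is a minimal sketch (not the original script) of relying on mysqldump's own exit status and capturing its stderr for logging; the function name, variables, and the choice of exit code 2 are all placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: skip the pre-flight connection check and rely on
# mysqldump's exit status, capturing stderr for a detailed log line.
# DB_USER, DB_PASSWORD, DB_HOST, DB_NAME are assumed to be set already.

dump_db() {
    local out_path=1ドル err_msg
    # 2>&1 before > : stderr goes to the capture, stdout to the dump file
    if err_msg=$( mysqldump -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" \
                      "$DB_NAME" 2>&1 > "$out_path" ); then
        echo "[+] '$out_path' created"
    else
        echo "[-] mysqldump failed (exit $?): $err_msg" >&2
        rm -f "$out_path"   # do not leave a truncated dump behind
        return 2            # distinct status: the dump step failed
    fi
}
```

A missing database, bad credentials, or an unwritable path all surface through this one exit-status check, so the separate connection and existence checks become optional.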