I have a typical Django application with a single MySQL database. The purpose of this shell script is to back up the MySQL database every midnight via crontab and to check that everything went right.
The server has the mysql
and mysqldump
executables available. The database connection settings are stored with the help of django-dotenv, i.e. in a .env config file with entries of the form:
<key>='<value>'
The script has three template variables:
- <project_root> - the root directory of the Django project
- <project_name> - the path from the root directory to the .env file, which is also the project name (they match for the standard Django project layout)
- <dump_root> - the path where dumps are stored
Note: not all of them end with a slash.
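For context, a hypothetical .env file following that structure might look like this (the keys are the ones the script reads; the values are placeholders):

```shell
# Hypothetical .env consumed by django-dotenv; all values are made up.
DB_NAME='mydb'
DB_USER='mydb_user'
DB_PASSWORD='s3cret'
DB_HOST='localhost'
```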
For now, I'm doing these things:
- Checking the existence of a file with DB connection settings
- Extracting DB connection settings and trying to connect to the database
- Retrieving a list of databases from a MySQL server and checking if the needed database exists
- Trying to create a directory for dump files
- Dumping the database in a specified location
Then come the steps that verify the dump is not corrupted:
- Create a temporary database for testing purposes
- Load dump into test database
- Compare tables from source database and test database (this is a single step to ensure database integrity after restoring from backup)
- Delete temporary database
#!/usr/bin/env bash
# -*- encoding: utf-8 -*-
echo "[$(date -I)] dumping db"
PROJECT_ROOT=<project_root>
DOT_ENV=$PROJECT_ROOT/<project_name>/.env
#: Ensure existence of file with db connection settings
if [ -f "$DOT_ENV" ]; then
    echo "[+] '.env' exists"
else
    echo "[-] '.env' does not exist"
    exit 1
fi
#: Extract database connection information from '.env' file
DB_NAME=$( awk -F '=' '/DB_NAME/ {print 2ドル}' "$DOT_ENV" | sed -e "s/'//g" )
DB_USER=$( awk -F '=' '/DB_USER/ {print 2ドル}' "$DOT_ENV" | sed -e "s/'//g" )
DB_HOST=$( awk -F '=' '/DB_HOST/ {print 2ドル}' "$DOT_ENV" | sed -e "s/'//g" )
DB_PASSWORD=$( awk -F '=' '/DB_PASSWORD/ {print 2ドル}' "$DOT_ENV" | sed -e "s/'//g" )
if mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" -e"quit" 2>/dev/null; then
    echo "[+] db connection established with following credentials:
name: $DB_NAME
user: $DB_USER
pswd: $DB_PASSWORD
host: $DB_HOST"
else
    echo "[-] db connection failed. Verify credentials."
    exit 1
fi
#: Ensure the DB exists
#: Make an array from the whitespace-delimited string of databases
DATABASES_LIST=(
    $( mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" -e"SHOW DATABASES" 2>/dev/null )
)
#: Perform check
if [[ " ${DATABASES_LIST[@]} " =~ " ${DB_NAME} " ]] ; then
    echo "[+] db '$DB_NAME' exists"
else
    echo "[-] db '$DB_NAME' does not exist"
    exit 1
fi
#: Create directory for dump files
DUMP_ROOT=<dump_root>
if mkdir -p "$DUMP_ROOT" 2>/dev/null; then
    echo "[+] '$DUMP_ROOT' exists"
else
    echo "[-] '$DUMP_ROOT' could not be created"
    exit 1
fi
#: Make database dump
DUMP_SQL_PATH=$DUMP_ROOT/dump-$(date -I).sql
if mysqldump -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" "$DB_NAME" > "$DUMP_SQL_PATH" \
        2>/dev/null; then
    echo "[+] '$DUMP_SQL_PATH' created"
else
    echo "[-] '$DUMP_SQL_PATH' was not created"
    exit 1
fi
#: Create test database
DB_NAME_TEST="${DB_NAME}_dump_test"
if mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" \
        -e"CREATE DATABASE $DB_NAME_TEST" 2>/dev/null; then
    echo "[+] db '$DB_NAME_TEST' created"
else
    echo "[-] db '$DB_NAME_TEST' was not created"
    exit 1
fi
#: Load dump into test database
if mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" "$DB_NAME_TEST" < "$DUMP_SQL_PATH" \
        2>/dev/null; then
    echo "[+] '$DUMP_SQL_PATH' loaded into '$DB_NAME_TEST'"
else
    echo "[-] '$DUMP_SQL_PATH' could not be loaded into '$DB_NAME_TEST'"
    exit 1
fi
#: Retrieve and compare table lists from both databases
#: Get an array from the MySQL output
DB_TABLES_SOURCE=($(
    mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" "$DB_NAME" -e"SHOW TABLES;" \
        2>/dev/null
))
#: Slice off the first element, as it is just a header line describing the command
DB_TABLES_SOURCE=("${DB_TABLES_SOURCE[@]:1}")
#: Convert the array back to a string for string comparison
DB_TABLES_SOURCE="${DB_TABLES_SOURCE[@]}"
DB_TABLES_TEST=($(
    mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" "$DB_NAME_TEST" -e"SHOW TABLES;" \
        2>/dev/null
))
DB_TABLES_TEST=("${DB_TABLES_TEST[@]:1}")
DB_TABLES_TEST="${DB_TABLES_TEST[@]}"
if [ "$DB_TABLES_SOURCE" == "$DB_TABLES_TEST" ]; then
echo "[+] dump file verified"
else
echo "[-] dump file corrupted"
fi
#: Drop test database
if mysql -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" \
        -e"DROP DATABASE $DB_NAME_TEST;" 2>/dev/null; then
    echo "[+] db '$DB_NAME_TEST' deleted"
else
    echo "[-] db '$DB_NAME_TEST' was not deleted"
    exit 1
fi
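For reference, the midnight crontab run mentioned at the top could be wired up like this (the script path and log path are placeholders, and cron's restricted PATH may need the mysql/mysqldump directories added):

```shell
# Hypothetical crontab entry: run the backup at 00:00 every day and
# append both stdout and stderr to a log file. Paths are placeholders.
0 0 * * * /bin/bash <project_root>/backup_db.sh >> /var/log/db_backup.log 2>&1
```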
First of all, this is bash scripting, which I'm not very familiar with, so I'll be glad to hear about anything done badly, including code style.
If you don't like the syntax highlighting, you can check out the gist version (same as here but probably easier to read).
1 Answer
I admit I don't understand the need to stand up the dump and compare it with the production database, especially if you are not going to retain the dumped database as a hot standby or similar. The mysqldump
tool gives you significant flexibility in how you can log/handle errors that may come up during the dump process, as well as various performance, consistency, and table-selection options.
What value are you REALLY gaining from just matching a list of tables? For a large database you could be consuming a lot of resources and time on this step. If that is your case, maybe perform this test only at some sampling interval as a sanity check.
If you find that your daily backup is THAT critical to your system recovery plans, then I would suggest your overall architecture might need revisiting. Why not have a read replica of the master that you could dump every X minutes/hours (depending on dump time) without impacting performance on production (also giving you point-in-time recovery capabilities)? Or, better yet, why not begin thinking about cloud-based solutions like Amazon RDS that can basically abstract all of this logic and effort away from you, allowing you to use a single command to take a database snapshot, restore from a snapshot, etc.?
The technology in this space has evolved pretty far past daily database dumps for use cases where you need high availability and/or more robust recovery capabilities.
That being said, I think you have put together a very well thought-out script. It is simple, well documented, and clear in presentation. I think it could serve you well for basic dump operations. I would just encourage you to think about whether some of the commentary above is applicable to your use case.
Some more specific thoughts:
- Should you be using 1 as your exit status for every failure scenario? Does whatever is calling this script need more granular exit status information?
- I really like that you are pulling credentials from the environment outside this script, though I wonder about the fragility of the mechanism (searching through a file). Should that functionality be encapsulated and exposed elsewhere in the system (in actual ENV variables, perhaps)? Ideally, a DB dump script should not have to care about how to go out and read the settings it needs from some application configuration file.
- Don't echo/log out your credentials!
- Why connect to the database at all before doing the dump? This is really an unnecessary step. If you try to execute the dump and, for example, the specified database doesn't exist, mysqldump will fail. You should handle the mysqldump exit status (and possibly log the detailed error messages written to stderr) if taking this approach. The same applies to the write path as well: mysqldump will fail if the path is not writable or the file already exists (depending on configuration).
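To illustrate that last point, here is a minimal sketch (not the original script) of relying on mysqldump's own exit status and capturing its stderr for logging; the function name, variables, and the choice of exit code 2 are all placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: skip the pre-flight connection check and rely on
# mysqldump's exit status, capturing stderr for a detailed log line.
# DB_USER, DB_PASSWORD, DB_HOST, DB_NAME are assumed to be set already.

dump_db() {
    local out_path=1ドル err_msg
    # 2>&1 before > : stderr goes to the capture, stdout to the dump file
    if err_msg=$( mysqldump -u "$DB_USER" -p"$DB_PASSWORD" -h "$DB_HOST" \
                      "$DB_NAME" 2>&1 > "$out_path" ); then
        echo "[+] '$out_path' created"
    else
        echo "[-] mysqldump failed (exit $?): $err_msg" >&2
        rm -f "$out_path"   # do not leave a truncated dump behind
        return 2            # distinct status: the dump step failed
    fi
}
```

A missing database, bad credentials, or an unwritable path all surface through this one exit-status check, so the separate connection and existence checks become optional.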