Scott Watermasysk
SQLite BackUp to S3
I recently moved HowIVSCode to HatchBox. As part of their setup, they provide a shared folder for each application persisted across deployments.
However, at this time, there is no option to back up that data.
Side Note: Digital Ocean provides server backups, which would likely work, but I would rather my backups exist outside the network managing my servers.
What I ended up doing was writing a script that does the following:
- Loops through all the SQLite files in a given directory
- Uses the SQLite
.backup
command to perform a backup safely - Gzip the file
- Uses GPG to encrypt the backup
- Send the backup to a locked down bucket on S3 via
curl
(so no aws cli dependency) - Cleans up when done
On S3, I have the bucket configured to delete any files older than 31 days. This should keep costs in check, and you should configure this to your needs.
Before the script, I want to give a big shout-out to Paweł Urbanek and his guide for doing this with PostgreSQL + Heroku. I have been running a similar setup for a couple of years now, and knowing my data is safe outside of Heroku is excellent. I also want to shout out this Chris Parson's gist, which paved the way for sending the data to S3 without needing to install the ASW CLI.
The script uses five ENV variables (although you can hard code your values at the top)
The one BACKUP_S3_DB_PASSPHRASE
must be saved somewhere you will remember. This is the passphrase used by GPG. The only thing worse than losing your database is having a backup you cannot decrypt. 😁
Here is a gist of the script.
#!/bin/bash
set -e
s3_key=$BACKUP_S3_KEY
s3_secret=$BACKUP_S3_SECRET
bucket=$BACKUP_S3_BUCKET
backup_db_passphrase=$BACKUP_S3_DB_PASSPHRASE
data_directory=$SQLITE_DATABASE_DIRECTORY
# ensure each backup has the same date key
date_key=$(date '+%Y-%m-%d-%H-%M-%S')
function backupToS3()
{
database=$1
database_file_name=$(basename -- "$database")
database_name="${database_file_name%.*}"
backup_file_name="/tmp/$database_name-backup-$date_key.sqlite3"
gpg_backup_file_name="$database_name-$date_key.gpg"
sqlite3 "$database" ".backup $backup_file_name"
gzip "$backup_file_name"
gpg --yes --batch --passphrase="$backup_db_passphrase" --output "/tmp/$gpg_backup_file_name" -c "$backup_file_name.gz"
date=$(date +"%a, %d %b %Y %T %z")
content_type='application/tar+gzip'
string="PUT\n\n$content_type\n$date\n/$bucket/$gpg_backup_file_name"
signature=$(echo -en "${string}" | openssl sha1 -hmac "${s3_secret}" -binary | base64)
curl -X PUT -T "/tmp/$gpg_backup_file_name" \
-H "Host: $bucket.s3.amazonaws.com" \
-H "Date: $date" \
-H "Content-Type: $content_type" \
-H "Authorization: AWS ${s3_key}:$signature" \
"https://$bucket.s3.amazonaws.com/$gpg_backup_file_name"
rm "$backup_file_name.gz"
rm "/tmp/$gpg_backup_file_name"
}
for file in "$data_directory"/*.sqlite3; do
backupToS3 "$file"
done
Quick Summary of the script
- Lines 4-8 - grab the ENV variables
- Line 10 - grab a date we can use to append to the file name and avoid collisions
- Line 12, declare a function
backuToS3
we will use at the end to iterate over each database in the directory - Lines 14-17 - extract the database file name. A significant benefit to SQLite is there is no harm in having many databases for individual tasks. For HowIVSCode, I use LiteStack, which creates separate databases for Data, Cache, and Queue.
- Lines 22-24 - backup, zip, and encrypt
- Lines 26-35 - send the file to AW3. If you have the AWS CLI installed, you could probably replace that with
aws s3 cp "/tmp/${gpg_backup_file_name}" "s3://$bucket/$gpg_backup_file_name"
- Lines 37-38 - clean up the tmp files
- Lines 41-43 - loop through any .sqlite3 files in the directory.