PostgreSQL Backup script (python)

Perhaps the following will be of use to others. In a nutshell, it’s a python script which asks PostgreSQL for its list of databases, backs each one up with pg_dump, and clears out dump files older than a couple of days as it goes (much like the MySQL python dumping script I wrote some time ago). I’ve written it for Windows, but it should work on Linux too (just change the paths in the BACKUP_DIR and dumper variables). No doubt it could be made cleverer, but for now… let’s just stick with something simple.

#!python

from time import gmtime, strftime
import subprocess
import os
import glob
import time

# change these as appropriate for your platform/environment :
USER = "postgres"
PASS = "postgres"
HOST = "localhost"

BACKUP_DIR = "e:\\postgresql_backups\\"
dumper = """ "c:\\program files\\postgresql\\8.1\\bin\\pg_dump" -U %s -Z 9 -f %s -F c %s  """                   

def log(string):
    print time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime()) + ": " + str(string)

# Change the value in brackets to keep more/fewer files. time.time() returns seconds since 1970...
# currently set to 2 days ago from when this script starts to run.

x_days_ago = time.time() - ( 60 * 60 * 24 * 2 )

os.putenv('PGPASSWORD', PASS)

# Ask PostgreSQL for the list of databases, skipping template0 (which refuses connections).
database_list = subprocess.Popen('echo "select datname from pg_database where datallowconn" | psql -t -U %s -h %s template1' % (USER,HOST) , shell=True, stdout=subprocess.PIPE).stdout.readlines()

# Delete old backup files first.
for database_name in database_list :
    database_name = database_name.strip()
    if database_name == '':
        continue

    glob_list = glob.glob(BACKUP_DIR + database_name + '*' + '.pgdump')
    for file in glob_list:
        file_info = os.stat(file)
        if file_info.st_ctime < x_days_ago:
            log("Unlink: %s" % file)
            os.unlink(file)
        else:
            log("Keeping : %s" % file)

log("Backup files older than %s deleted." % time.strftime('%c', time.gmtime(x_days_ago)))

# Now perform the backup.
for database_name in database_list :
    database_name = database_name.strip()
    if database_name == '':
        continue

    log("dump started for %s" % database_name)
    thetime = str(strftime("%Y-%m-%d-%H-%M")) 
    file_name = database_name + '_' + thetime + ".sql.pgdump"
    #Run the pg_dump command to the right directory
    command = dumper % (USER,  BACKUP_DIR + file_name, database_name)
    log(command)
    subprocess.call(command,shell = True)
    log("%s dump finished" % database_name)

log("Backup job complete.")

That’s all folks.

Automated snapshot backup of an Amazon EBS volume

I found the following Python script online, but it didn’t really work too well :

http://aws-musings.com/manage-ebs-snapshots-with-a-python-script/

EBS – Elastic Block Store …

I had to easy_install boto to get it to work.

I’m not sure the Debian python-boto package in Lenny is up to date.

Anyway, $server now has :

from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

from datetime import datetime
import sys

# Substitute your access key and secret key here
aws_access_key = 'MY_AWS_ACCESS_KEY'
aws_secret_key = 'MY_AWS_SECRET_KEY'
# Change to your region/endpoint...
region = RegionInfo(endpoint='eu-west-1.ec2.amazonaws.com', name='eu-west-1')

if len(sys.argv) < 3:
    print "Usage: python manage_snapshots.py volume_id number_of_snapshots_to_keep description"     
    print "volume id and number of snapshots to keep are required. description is optional"
    sys.exit(1) 

vol_id = sys.argv[1] 
keep = int(sys.argv[2]) 
conn = EC2Connection(aws_access_key, aws_secret_key, region=region) 
volumes = conn.get_all_volumes([vol_id]) 
print "%s" % repr(volumes) 
volume = volumes[0] 
description = 'Created by manage_snapshots.py at ' + datetime.today().isoformat(' ') 
if len(sys.argv) > 3:
    description = sys.argv[3]

if volume.create_snapshot(description):
    print 'Snapshot created with description: ' + description

snapshots = volume.snapshots()

def date_compare(snap1, snap2):
    if snap1.start_time < snap2.start_time:
        return -1
    elif snap1.start_time == snap2.start_time:
        return 0
    return 1

snapshots.sort(date_compare)
delta = len(snapshots) - keep
for i in range(delta):
    print 'Deleting snapshot ' + snapshots[i].description
    snapshots[i].delete()

And then plonk something like the following in /etc/cron.daily/backup_ebs :

for volume in vol-xxxx vol-yyyyy vol-zzzz
do
	/path/to/above/python/script.py $volume 7 "Backup of $volume on $(date +%F-%H:%M)"
done

Which keeps 7 backups for each volume with a time/date stamp in each description.
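
If you want to double-check what the rotation has left behind, something like the following (same keys and region as above; the volume id is just a placeholder) will list the surviving snapshots:

from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(endpoint='eu-west-1.ec2.amazonaws.com', name='eu-west-1')
conn = EC2Connection('MY_AWS_ACCESS_KEY', 'MY_AWS_SECRET_KEY', region=region)

# owner='self' restricts the listing to snapshots belonging to this account
for snap in conn.get_all_snapshots(owner='self'):
    if snap.volume_id == 'vol-xxxx':
        print snap.id, snap.start_time, snap.description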