PostgreSQL Backup script (python)

Perhaps the following will be of use to others. In a nutshell, it’s a Python script which queries PostgreSQL for its list of databases (via psql) and dumps each one individually with pg_dump, much like the MySQL dumping script I wrote some time ago. It also deletes dump files older than a couple of days, so the backup directory doesn’t grow without bound. I’ve written it for Windows, but it should work on Linux too (just change the paths in the BACKUP_DIR and dumper variables).

#!python

from time import gmtime, strftime
import subprocess
import os
import glob
import time

# change these as appropriate for your platform/environment :
USER = "postgres"
PASS = "postgres"
HOST = "localhost"

BACKUP_DIR = "e:\\postgresql_backups\\"
dumper = """ "c:\\program files\\postgresql\\8.1\\bin\\pg_dump" -U %s -Z 9 -f %s -F c %s  """                   

def log(string):
    print time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime()) + ": " + str(string)

# Change the value in brackets to keep more/fewer files. time.time() returns seconds since 1970...
# currently set to 2 days ago from when this script starts to run.

x_days_ago = time.time() - ( 60 * 60 * 24 * 2 )

os.putenv('PGPASSWORD', PASS)

# Ask PostgreSQL (via psql) for the list of databases to back up.
database_list = subprocess.Popen('echo "select datname from pg_database" | psql -t -U %s -h %s template1' % (USER, HOST), shell=True, stdout=subprocess.PIPE).stdout.readlines()

# Delete old backup files first.
for database_name in database_list :
    database_name = database_name.strip()
    if database_name == '':
        continue

    glob_list = glob.glob(BACKUP_DIR + database_name + '*' + '.pgdump')
    for file in glob_list:
        file_info = os.stat(file)
        if file_info.st_ctime < x_days_ago:
            log("Unlink: %s" % file)
            os.unlink(file)
        else:
            log("Keeping : %s" % file)

log("Backup files older than %s deleted." % time.strftime('%c', time.gmtime(x_days_ago)))

# Now perform the backup.
for database_name in database_list :
    # Strip the trailing newline left by readlines() and skip blanks (the strip in the
    # deletion loop above only changed its own loop variable, not the list itself).
    database_name = database_name.strip()
    if database_name == '':
        continue

    log("dump started for %s" % database_name)
    thetime = str(strftime("%Y-%m-%d-%H-%M"))
    file_name = database_name + '_' + thetime + ".sql.pgdump"
    #Run the pg_dump command to the right directory
    command = dumper % (USER,  BACKUP_DIR + file_name, database_name)
    log(command)
    subprocess.call(command,shell = True)
    log("%s dump finished" % database_name)

log("Backup job complete.")

That’s all folks.

Automated snapshot backup of an Amazon EBS volume

I found the following Python script online, but it didn’t really work too well :

http://aws-musings.com/manage-ebs-snapshots-with-a-python-script/

EBS – Elastic Block Store …

I had to easy_install boto to get it to work.

I’m not sure the Debian python-boto package in Lenny is up to date.

Anyway, $server now has :

from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

from datetime import datetime
import sys

# Substitute your access key and secret key here
aws_access_key = 'MY_AWS_ACCESS_KEY'
aws_secret_key = 'MY_AWS_SECRET_KEY'
# Change to your region/endpoint...
region = RegionInfo(endpoint='eu-west-1.ec2.amazonaws.com', name='eu-west-1')

if len(sys.argv) < 3:
    print "Usage: python manage_snapshots.py volume_id number_of_snapshots_to_keep description"     
    print "volume id and number of snapshots to keep are required. description is optional"
    sys.exit(1) 

vol_id = sys.argv[1] 
keep = int(sys.argv[2]) 
conn = EC2Connection(aws_access_key, aws_secret_key, region=region) 
volumes = conn.get_all_volumes([vol_id]) 
print "%s" % repr(volumes) 
volume = volumes[0] 
description = 'Created by manage_snapshots.py at ' + datetime.today().isoformat(' ') 
if len(sys.argv) > 3:
    description = sys.argv[3]

if volume.create_snapshot(description):
    print 'Snapshot created with description: ' + description

snapshots = volume.snapshots()
snapshot = snapshots[0]

def date_compare(snap1, snap2):
    if snap1.start_time < snap2.start_time:
        return -1
    elif snap1.start_time == snap2.start_time:
        return 0
    return 1

snapshots.sort(date_compare)
delta = len(snapshots) - keep
for i in range(delta):
    print 'Deleting snapshot ' + snapshots[i].description
    snapshots[i].delete()

And then plonk something like the following in /etc/cron.daily/backup_ebs :

for volume in vol-xxxx vol-yyyyy vol-zzzz
do
	/path/to/above/python/script.py $volume 7 "Backup of $volume on $(date +%F-%H:%M)"
done

This keeps 7 snapshots per volume, with a date/time stamp in each snapshot’s description.
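
If you want to sanity-check what’s left after a run, something along these lines (using the same boto EC2Connection as above; the keys, region and volume id are placeholders) will list a volume’s remaining snapshots, oldest first:

from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(endpoint='eu-west-1.ec2.amazonaws.com', name='eu-west-1')
conn = EC2Connection('MY_AWS_ACCESS_KEY', 'MY_AWS_SECRET_KEY', region=region)

volume = conn.get_all_volumes(['vol-xxxx'])[0]
# Oldest first - the same ordering the deletion loop above relies on.
for snap in sorted(volume.snapshots(), key=lambda s: s.start_time):
    print snap.id, snap.start_time, snap.description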

Logging … and how not to do it.

One thing that really annoys me is when I come to look at the log file and I see something like :

blah blah did blah blah
blah foo blah random comment
fish blah some data
which spans many lines or does it?

This is bad: I’ve got absolutely no idea where the messages came from (so I have to grep around the code base), and no idea WHEN they were written. At best I can look at the timestamps on the log file itself and work out a rough timeframe (assuming logrotate is in use, so there’s at least a definite ‘must be after X’ point).

What’s far better from a maintenance point of view :

2010/07/29 09:33 filewhatever.py:355 blah blah blah did blah blah

2010/07/29 09:34 filewhatever.py:355 blah blah blah did blah blah

2010/07/29 09:35 filewhatever.py:355 data received from x is {{{hello world…. }}}

Changes are :

  1. Date and time stamps (in python: datetime.datetime.now())
  2. Recording where the message came from (see the ‘inspect’ python module – inspect.stack()[1][1] for the calling file, and inspect.stack()[1][2] for the line number, or debug_backtrace() in PHP) – there’s a rough sketch of this after the list.
  3. Wrapping any interesting output (e.g. from a remote service) in obvious delimiters (e.g. {{{ and }}} )  – without e.g. timestamps or some other common line prefix, I’ve no way of knowing what’s from where, especially if the output spreads over many lines.
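
To make that concrete, here’s a rough sketch of the sort of hand-rolled log() helper I mean (purely for illustration – the real answer is point 3 of the next list):

import datetime
import inspect

def log(message):
    # inspect.stack()[1] is the caller's frame: [1] is its file name, [2] its line number.
    caller = inspect.stack()[1]
    stamp = datetime.datetime.now().strftime('%Y/%m/%d %H:%M')
    print "%s %s:%d %s" % (stamp, caller[1], caller[2], message)

def log_data(label, blob):
    # Wrap interesting (possibly multi-line) output in obvious delimiters.
    log("data received from %s is {{{%s}}}" % (label, blob))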

Other good ideas :

  1. Different severities of log message (classic: debug, info, error type annotation with appropriate filtering).
  2. Make sure logrotate is in use, or a simple shell script via cron, to stop the log file growing too large and causing problems.
  3. Stop writing your own logging mechanisms and use the ones provided by the platform (e.g. Python has a logging module built in which does all of the above and more – see the example below).
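
Something along these lines (a minimal sketch – the file name and format string are just illustrative) gets you timestamps, file/line numbers, severities and rotation more or less for free:

import logging
import logging.handlers

logger = logging.getLogger('myapp')
logger.setLevel(logging.DEBUG)

# Rotate at ~1MB, keeping 5 old files - no separate logrotate config needed.
handler = logging.handlers.RotatingFileHandler('/var/log/myapp.log',
                                               maxBytes=1024 * 1024, backupCount=5)
handler.setFormatter(logging.Formatter(
    '%(asctime)s %(filename)s:%(lineno)d %(levelname)s %(message)s'))
logger.addHandler(handler)

logger.debug('blah blah did blah blah')
logger.info('data received from x is {{{%s}}}', 'hello world....')
logger.error('something went wrong')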

EOR – EndOfRant

wapiti – web application vulnerability scanner (super quick review/intro)

Today, I finally looked at Wapiti, which is a web application vulnerability scanner. It operates on a black box basis (i.e. it doesn’t see the underlying PHP/ASP/Java source code), and effectively tries to ‘break’ any forms on a page.

In order to get it to do anything useful, you’ll probably need to provide it with a cookie file to use. Unfortunately, I couldn’t originally get the provided ‘getcookie.py’ file to work, as the application in question just posted the login form details to an empty action (i.e. <form action='' method='post'>). After a bit of hacking I fixed this, but it took some time.

Installation is relatively easy – download the .zip file, extract it and change directory into it (e.g. cd wapiti-2.0.X).

Anyway, given we have “webapp” installed at http://orange/webapp, and we wish to test it, we might do something like the following :

  1. cd src/net
  2. python getcookie.py ~/cookie.txt http://orange/webapp/login.php
  3. Enter username/password etc as required to complete the login form
  4. The script exits; check the contents of ~/cookie.txt – it will look something like :

#LWP-Cookies-2.0
Set-Cookie3: PHPSESSID=3d20841af5de43c718732d80e5d78fe3; path="/"; domain="orange"; path_spec; expires="2010-01-04 22:42:47Z"; version=0
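
Incidentally, that’s the standard LWP cookie format which Python’s cookielib writes, so if getcookie.py misbehaves you can knock the file up yourself. A rough sketch (the login URL and form field names are hypothetical):

import cookielib
import urllib
import urllib2

jar = cookielib.LWPCookieJar('/home/david/cookie.txt')
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
# POST the login form so the session cookie ends up in the jar.
opener.open('http://orange/webapp/login.php',
            urllib.urlencode({'username': 'david', 'password': 'secret'}))
jar.save(ignore_discard=True)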

Now we can use wapiti to test any urls ‘behind’ the login screen (as it were) :

wapiti http://orange/webapp/search.php --cookie ~/cookie.txt -v 2 -o ~/report -x http://orange/webapp/logout.php

(We need to exclude the logout page, else our session will get destroyed when wapiti spiders that page…)

Depending on how good the application is, you may see output like :

Found permanent XSS in http://orange/webapp/search.php
attacked by http://orange/webapp/search.php?area=on&client_id=on&county=on with fields county=crzbl79tqr&status=x57cjl7m14&website=vk59qqbgmp&name=<script>alert('11byq04xd1')</script>&client_id=on&region=on

and similar for the other vulnerabilities.

If I point my web browser at file:///home/david/report I’ll see a nice HTML report listing the vulnerabilities and so on – similar to the below…

(screenshot: report output)

Wapiti appears to detect:

  • SQL Injection holes
  • Cross Site Scripting (XSS) holes
  • File inclusion (local/remote)
  • Command execution vulnerabilities
  • and others

I’m a bit annoyed I’ve only found this tool now – but also glad I’ve finally found it. I’ve been looking for something that can pick up XSS holes for ages (SQL Injection stuff I could already test using SQLMap, and ensuring I only ever used prepared statements).

Update (July 2011) – the cookie file format has changed to XML:

<?xml version="1.0" encoding="UTF-8"?>
<cookies>
  <domain name="uk">
    <domain name="co">
      <domain name="palepurple">
        <domain name="david">
          <cookie name="PHPSESSID" path="/" value="vmabdv5giph334aq33vb0add67" version="0"/>
          <cookie name="globdisc" path="/" value="yes" version="0"/>
        </domain>
      </domain>
    </domain>
  </domain>
</cookies>