Rate limiting http traffic (mod_evasive and iptables)

A customer has a relatively busy web site, which contains lots of juicy information (business names, addresses, email addresses, phone numbers etc.). Currently there is nothing in place to stop people spidering it – unless someone explicitly looks at the log files and does something.

Blocking annoying people who spider the site is easy enough –

iptables -I INPUT -s 80.x.x.x -j REJECT

However, I’d obviously rather automate this if possible – and ideally without having to change the PHP code (as each request would need to perform some sort of DB lookup to check whether it’s part of a spidering attempt).

So, my first idea was to manipulate an existing rule I have to limit SSH connection attempts, giving something like :

iptables -I INPUT -p tcp --dport 80 -i eth0 -m state --state NEW -m recent --set
iptables -I INPUT -p tcp --dport 80 -i eth0 -m state --state NEW -m recent --update --seconds 60 --hitcount 40 -j LOG --log-prefix "http spidering?" --log-ip-options --log-tcp-options --log-tcp-sequence --log-level 4

Annoyingly however, even though these are the first rules in the iptables output – and they should therefore work, they don’t – i.e. I’m not seeing anything being logged, when doing e.g. the following on a remote server :

while [ true ] ; do
wget -q -O - http://server.xyz/index.php
done

So, I’m still trying to avoid making changes to the code base – although doing so would produce the best user experience (namely we could display a captcha or something and if someone really can browse that quickly they’d not encounter any problems).

And I’ve just found mod_evasive, which claims to provide DoS and DDoS protection. Thankfully Jason Litka has packaged it – so I have no problems from an installation point of view 🙂 (yum install mod_evasive)

Installation on Debian doesn’t result in a config file – but it’s not difficult to create (see /usr/share/doc/mod_evasive). However, it’s not a shiny, sunny ending – mod_evasive appears to be “tripped” by people requesting images – and in my case the client has about 10-20 images per page; so it’s difficult to differentiate between a normal user loading a page and someone running httrack on the website and only requesting the “php page”. If only mod_evasive took a regexp to ignore/match… and I can’t seem to find any way of fixing this.

So application logic it is :-/ Perhaps caching in APC may be the way forward ….
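As a rough idea of what that application logic might look like, here is a minimal sketch (in Python rather than PHP, purely for illustration) of a sliding-window counter that only counts page requests and ignores images. The limits mirror the iptables attempt above (40 hits in 60 seconds); the class name, the list of static extensions and the in-memory store are all assumptions on my part, and a real deployment would need to keep the counters somewhere shared between requests, such as APC.

```python
import time
from collections import defaultdict, deque

# Assumed list of "static" extensions that should not count towards the limit.
STATIC_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".ico")


class PageRateLimiter:
    """Sliding window: at most max_hits page requests per window seconds per IP."""

    def __init__(self, max_hits=40, window=60.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> timestamps of recent page hits

    def allow(self, ip, path):
        """Return True if the request should be served normally."""
        # Images, CSS and so on are never counted, so a normal page view
        # with 10-20 images costs exactly one hit.
        if path.lower().split("?", 1)[0].endswith(STATIC_EXTENSIONS):
            return True
        now = self.clock()
        recent = self.hits[ip]
        # Discard hits that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # over the limit: show a captcha, log it, etc.
        recent.append(now)
        return True
```

When allow() returns False the application could show a captcha rather than block outright, so a genuinely fast human isn't locked out.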

Twitter Weekly Updates for 2010-01-31

  • RT @glynmoody Facebook rewrites PHP runtime – http://bit.ly/ahwWiq to be released as open source #facebook #php #
  • Poop. Spoke too soon. Snow tap being turned off. F'ing weather god. Curse you. #
  • Decent snow. 4/10 perhaps. B61 #uksnow … Just keep up for an hour or two and perhaps i can sledge/snow fight. #
  • Looks like we had a token amount of snow last night. Looks cold too :-/ 14 miles here I run. Still, cybrosis ppdcast episode to listen to:) #
  • Our car has depreciated £1000 for every year (7) we've owned it. Perhaps it'll soon put on value when as it turns into a mobile pool? #
  • Today I did a total of 105 pushups thanks to the Hundred Pushups iPhone app. (Week 3, Day 3, Level 3) #100Pushups #
  • RT @stuherbert PHP 5.3 adoption: some numbers and talking points http://bit.ly/djJRos (please RT) #
  • Dogs have a very inefficient protocol for communication. Guessing lots of packet loss as they've been retransmitting for ages now. Woof woof #
  • The last apple in the shop should be avoided; keys are always in the last pocket you check. #lessonoftheday #
  • Today I did a total of 97 situps thanks to the 200 Situps iPhone app. (Week 2, Day 1, Level 3) #200Situps #
  • Twitterific appears to have won. Goodbye tweetdeck. #
  • Nice Run – roads (a38 etc) were almost empty, shame I'd have to get up at 5am to experience it more often :-/ #
  • 100 pushup thing is now hard; couldn't do last rep without two stops :-/ #100pushups weak puny arms get bigger! #
  • Today I did a total of 100 pushups thanks to the Hundred Pushups iPhone app. (Week 3, Day 2, Level 3) #100Pushups #
  • Today I did a total of 92 situps thanks to the 200 Situps iPhone app. (Week 1, Day 3, Level 3) #200Situps #
  • is giving tweetdeck a whirl… as a change from twitterific #
  • Shocked to receive apple MacBook he ordered online yesterday afternoon this morning. Win! #
  • Today I did a total of 80 pushups thanks to the Hundred Pushups iPhone app. (Week 3, Day 1, Level 3) #100Pushups #
  • Worringly I seem to like coffee (with chocolate biscuits) I wish there was no junk food in this house. I'd best help 'dispose' of it….. #
  • Today I did a total of 88 situps thanks to the 200 Situps iPhone app. (Week 1, Day 2, Level 3) #200Situps #

Verified by Visa …. what rubbish

On Wednesday I was trying to buy train tickets for an upcoming trip to London.

So, I book the tickets, and get to the point of being asked for my card details … tap tap tap … kapow … Up comes the Verified by Visa payment screen (in a stupid iframe [how do I know this isn’t a phishing site?]). Well, it displays my ‘username’ correctly – a terrifically hard to guess one of MRDAVIDGOODWIN… I enter my details and it keeps declining them. Hmm.. Fine… perhaps I’ve incorrectly stored the password – “oooh look – reset password…” *click* – “You want me to enter my date of birth… is that the ONLY security check you’re going to do? WTF??? ”

Grr.. Why do they bother….

See also http://www.lightbluetouchpaper.org/2010/01/26/how-online-card-security-fails/

Random PHP project progress

Random php development musing

Initially when we founded Pale Purple all our new PHP development used a combination of Propel, Smarty and some inhouse glue. Over time we seem to have drifted towards the Zend Framework, but I’ve never been particularly happy with Zend_Db or Zend_View. Why the Zend Framework? Well, it has loads of useful components (Cache, Form, Routing, Mail etc) and it’s near enough an industry standard from what we see – and obviously I’d rather build on the shoulders of others than spend time developing an in-house framework no one else will ever use.

For one customer, we’re currently working on the next iteration of their code base – which incorporates a number of significant changes. When we inherited the code base from the previous developers we spent a long time patching various SQL Injection holes (casting to ints), moving over to use PDO’s prepared statements and trying to keep on top of the customer’s new functionality requests and support issues. There’s still a lot of horrible code to refactor, plenty of security holes (although none public facing) and we know we’re moving in the right direction – hopefully patching and duct tape will soon be a thing of the past as it will develop some form of architecture and look like someone has thought about design and long term maintenance.

I’ve started to properly do Test First Development – at least from a support perspective – as too often we’d find we would patch a bug, only for it to reappear in a few weeks/months time. This has been especially useful with the SOAP interface the application exposes. The tests run every 5 minutes, and we all get emailed when stuff breaks – it took all of 30 minutes to set up and put in place – then it was just a case of actually writing the unit tests themselves (the tests take minutes to write; finding/fixing any bugs they pinpoint takes somewhat longer :-/ ). I’ve also abused Simpletest’s web testing ‘stuff’ to act as an availability checker of the live site (i.e. hit a few remote URLs, and check that we don’t get error messages back and do see expected strings).
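The availability check boils down to something like the sketch below. This is Python for illustration, not Simpletest's API; the function names and the list of error markers are my own assumptions.

```python
import urllib.request

# Assumed markers that suggest a PHP error has leaked into the page.
ERROR_MARKERS = ("Fatal error", "Warning:", "Parse error", "Notice:")


def check_body(body, expected, error_markers=ERROR_MARKERS):
    """Return a list of problems found in a page body (empty list means OK)."""
    problems = []
    for marker in error_markers:
        if marker in body:
            problems.append("unexpected error text: %r" % marker)
    for text in expected:
        if text not in body:
            problems.append("missing expected text: %r" % text)
    return problems


def check_url(url, expected, timeout=10):
    """Fetch url and run check_body on it; a failed request is also a problem."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return ["request failed: %s" % exc]
    return check_body(body, expected)
```

Run from cron, anything check_url() returns for a given URL can simply be emailed to the team.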

The original code base had no ‘model’ like layer (or MVC ‘compliance’) – files containing HTML, CSS, SQL, Javascript and PHP were the norm – we’ve added Propel to the project as the ‘model’ layer – which took a few hours; and then when reverse engineering the database we found a few oddities (tables without primary keys and so on) – anyway, moving the functionality from a handful of legacy objects across into the Propel ones seems to be well underway, and I for one will be glad to see the end of :

$x = new Foo(5);

Accompanied with code that does the equivalent of :

class Foo {
    public function __construct($id = false) {
        if ($id != false) {
            // select * from foo where id = 5
            // populate $this; don't bother checking for the edge case where $id isn't valid
        }
        else {
            // insert into foo values ('');
            // populate $this->id; leaving all other fields as empty strings...
        }
    }

    public function setBaz($nv) { // repeat for all table fields
        $this->baz = $nv;
        global $db;
        $db->query('update foo set baz = "' . $nv . '" where id = ' . $this->id);
    }
}
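For contrast, here is a rough sketch of the shape we're moving towards: loading is explicit and can fail, nothing is written until save() is called, and values are passed as bound parameters rather than concatenated into the SQL. It's Python with sqlite3 purely for illustration and is not Propel's API; the class and method names are made up.

```python
import sqlite3


class Foo:
    """Hypothetical model object for a 'foo' table with columns id and baz."""

    def __init__(self, baz="", id=None):
        self.id = id
        self.baz = baz  # setting a field touches nothing in the database

    @classmethod
    def find(cls, db, id):
        """Return the Foo with this id, or None if there is no such row."""
        row = db.execute("SELECT id, baz FROM foo WHERE id = ?", (id,)).fetchone()
        return cls(baz=row[1], id=row[0]) if row else None

    def save(self, db):
        """Insert or update in one go, using bound parameters throughout."""
        if self.id is None:
            cursor = db.execute("INSERT INTO foo (baz) VALUES (?)", (self.baz,))
            self.id = cursor.lastrowid
        else:
            db.execute("UPDATE foo SET baz = ? WHERE id = ?", (self.baz, self.id))
        db.commit()
```

Creating an object no longer inserts a blank row as a side effect, and changing ten fields costs one UPDATE rather than ten.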

Finally, we have a meaningful directory structure – where some things aren’t exposed in the document root. Hopefully soon a front controller and some decent routing. At the moment a huge amount of code is just sat in the ‘public’ directory due to its nature. We hope to fix this in time, and move to using Zend Controller – once we get Smarty integrated with it.

Propel has added some nice new features since we last used it (effectively v1.2); it was a toss up between it and Doctrine (as obviously the ZF is moving in that direction) – but we already had knowledge/experience with Propel and it seemed the easier option.

I’m hoping that with time we’ll be able to get up to at least 60% test coverage of the code base – at that point we should be able to refactor the code far easier and with less fear. At the moment I doubt the unit tests cover more than 5-10% – although maybe it’s time I pointed xdebug or whatever at it to generate some meaningful stats.

My final task is to get some decent performance measurements out of the code base – so we can track any performance regressions. I’m fairly confident that moving to Propel will result in a speedup as duplicate object hydrations will be eliminated thanks to its instance pool, however having hard figures and nice graphs to point at would be ideal. So far I’ve knocked up my own script around ‘ab’ which stores some figures to a .csv file and uses ezComponents to generate a graph file. This seems to be a crap solution, but I can’t think of or find anything better. Any suggestions dear Internet? Perhaps I should integrate changeset/revision IDs in my benchmarking too. Suggestions here would be exceedingly appreciated.
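To make the revision idea concrete, this is roughly how the figures could be pulled out of ab's summary and tagged with a revision before going into the .csv. It's a Python sketch, not my actual script; the function names and the column order are assumptions.

```python
import csv
import re

# Lines as printed in ab's summary, e.g.
#   Requests per second:    7.23 [#/sec] (mean)
#   Time per request:       138.313 [ms] (mean)
RPS_RE = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)
TPR_RE = re.compile(r"^Time per request:\s+([\d.]+) \[ms\] \(mean\)$", re.MULTILINE)


def parse_ab(output):
    """Extract requests/sec and mean time per request (ms) from ab's output."""
    rps = RPS_RE.search(output)
    tpr = TPR_RE.search(output)
    if not rps or not tpr:
        raise ValueError("could not find ab summary lines")
    return float(rps.group(1)), float(tpr.group(1))


def record(csv_path, revision, url, output):
    """Append one benchmark result, tagged with the revision, to a CSV file."""
    rps, tpr = parse_ab(output)
    with open(csv_path, "a", newline="") as handle:
        csv.writer(handle).writerow([revision, url, rps, tpr])
```

With the revision in the first column, a regression can be traced back to the changeset that introduced it.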

There, I should have ticked all necessary boxes wrt development practices now. Now to work on finding a contract PHP developer….

The PHP Security Journey begins…

Here are the slides from the PHPWM talk I gave last week: PHPWM Presentation – The Security Journey Begins; thanks to DeanC on #phpwm for reminding me to upload them 🙂

The presentation focusses on security issues in web applications – specifically, PHP – although obviously other web facing languages face the same problems. It’s a very condensed version of what I normally give as a two day PHP security training course – so there are bits missing, and many things aren’t explained fully… and obviously the demonstration after the slides is missing 🙂

(250kb, PDF file… I think)

Twitter Weekly Updates for 2010-01-24

  • Bubble blowing fail day. Do mixtures have a BBE date? Rowan seems happy with one bubble in 10 goes. #
  • 820 days uptime is sufficient; time for a long overdue reboot I think. #linux #
  • There's still snow outside tesco. Strange redditch. #
  • Today I did a total of 77 situps thanks to the 200 Situps iPhone app. (Week 1, Day 1, Level 3) #200Situps #
  • RT @Ade_B OMG I didnt realise they were making a new A Team Movie http://bit.ly/7iCLiL via @purityale w00t #
  • Wonder why everyone wishes they'd stayed in bed today?. Today was quite good for me…. #
  • Today I did a total of 81 pushups thanks to the Hundred Pushups iPhone app. (Week 2, Day 3, Level 3) #100Pushups #
  • Wake up little bunnies! #
  • RT @loudmouthman Well when you put it like that http://www.life-stylefitness.com/Exercise%20or%20Death.jpg #
  • This side of heaven is right next door to hell. #
  • Enjoying Thunderbird 3 – faster, better UI; 3.0.1 is now out – http://lwn.net/Articles/370465/ #email #floss #thunderbird #
  • Met office once again fail. There's no snow here. #uksnow b61 #
  • The @scottsigler iphone app looks cool (chainsaw and kitten juggling eh?). It's free, gives easy access to great audiobooks +more #podcast #
  • Today I did a total of 74 pushups thanks to the Hundred Pushups iPhone app. (Week 2, Day 2, Level 3) #100Pushups #
  • Yawn. #
  • RT @rowangoodwin This time 2 years ago I was preparing to make my grand entrance! #
  • Wish my iPhone had a fingerprint/ facial/retinal recognition, instead of asking me for a password all the time. It has a camera afterall. #
  • http://www.predictablyirrational.com/?p=704 – Google autocomplete rocks. See also http://autocompleteme.com #
  • trying to find a decent twitter username for $customer; it's like domain name squatting all over again. #
  • RT @evilneuro another reason not to use Internet Explorer, ever: http://bit.ly/6xbH5z – switch to chrome? #
  • iPhone voice recognition is getting worse. "phone Katherine Goodwin" != "phone kathryn reeve" *sigh* need aliasing or shortcuts #
  • Did aliens help plot the location of Woolworths? http://bengoldacre.posterous.com/did-aliens-play-a-role-in-woolworths #
  • Today I did a total of 63 pushups thanks to the Hundred Pushups iPhone app. (Week 2, Day 1, Level 3) #100Pushups #
  • Hmm. Heavy snow for weds; heavy rain for thurs. Fun times ahead. #

Twitter Weekly Updates for 2010-01-17

  • Fantastic Mr Fox looks pretty good; Rowan seems to approve too 🙂 #
  • 13.26 miles, 1hour and 45 mins or thereabouts. Icy roads. #
  • Lets try http://favoriterun.com/299771 #
  • Hotel chocolate is very nice; I lack self control and gorge myself on a packet at a time … And subsequently feel yucky. no common sense. #
  • Joined the weirdos in Bromsgrove by walking a childless push chair. Next up wearing womens clothes?shouting at people? X rd to avoid me! #
  • Installing Quickbooks is not fun. Pain. License keys. Pain. Updates etc. Payroll still to do :-/ #
  • Prezzo's Sticky toffee pudding is very nice; shame about the sugar rush afterwards. #
  • Today I did a total of 61 pushups thanks to the Hundred Pushups iPhone app. (Week 1, Day 3, Level 3) #100Pushups #
  • We're trainees and we're making tracks, wheels to the rails… Clackety clack! #chuggington #
  • Virtualbox OSE seems much better and quicker than vmware server. Bye bye vmware, your version 2 web ui will not be missed. #
  • Windows 7 looks pretty similar to vista to me. Wondering what the fuss is about? #
  • #uksnow b61 (bromsgrove) 1/10; 1" on ground. Roads appear ungritted. #timeforachange #
  • Bromsgrove roads appear ungritted. Traffic moving on a38; stourbridge rd worst. #
  • Today I did a total of 58 pushups thanks to the Hundred Pushups iPhone app. (Week 1, Day 2, Level 3) #100Pushups #
  • Grr. Snow. You've outlived your welcome. B61 #uksnow #
  • RT @guardiantech Microsoft Office disappears from virtual shelves as i4i's injunction bites http://bit.ly/5QDPcY #
  • http://www.myconfinedspace.com/2010/01/10/dog-pony #
  • Cold weather is awesome – week old bread hasn't gone mouldy 🙂 #
  • Finally stuck #varnish infront of some #plone / #zope sites. Performance++. Should have done this ages ago. 7req/sec -> 300req/sec etc. #
  • Today I did a total of 48 pushups thanks to the Hundred Pushups iPhone app. (Week 1, Day 1, Level 3) #100Pushups #
  • Interesting talk with asda cashier re muppets panic buying recently (15 loaves of bread anyone?). No Fresh milk the only affected thing 2day #

100 press-ups (or push-ups if you’re American)

I thought I’d better exercise more than just my legs for once, and the 100 Pushups challenge caught my eye. In school I used to do 30 press ups each night; over the last few years there have been a few instances when my arms felt weak and puny.

So, seeing as I’m supposed to make a New Year’s resolution, I thought I’d start on this.

Day 0 : See how many I can do. Wasn’t sure whether I should do them all at once, or stop, breathe and then carry on… or what

Day 1 : Did something like : 10, 12, 7, 7, 12 with a one minute rest in-between them all.

If I’m allowed a 1 minute rest between each set, surely it’s pretty easy to get to 100? A random search on Twitter shows some people going somewhere beyond 100 too. Hmm… Do I stop when I get to 100, or when I get to 6 weeks? It does seem a bit easy if I’m pretty much half way there already (48).

Rowan found it funny watching me bounce up and down on the floor this morning anyway. Think I’ve sorted my breathing out too.

WordPress – I’m impressed with your upgrade procedure

Previously I used Drupal on this website, and it was always a pain to migrate from one version to the next – there were a number of hoops to jump through, things would often break, modules would need reinstalling and then after upgrading you’d find that some random bit of functionality no longer worked (e.g. posting comments on a blog entry, or being able to see anything if you were an anonymous user).

So, when I saw WordPress 2.9 was out, I wasn’t overly quick to migrate from 2.8. Unfortunately I couldn’t tell if the 2.8 branch was still being maintained, and then when I noticed 2.9.1 was out, I thought I might as well make the leap (besides, it’s best to avoid .0 releases 🙂 )

In my case, I run WordPress from SVN, as do many other similarly lazy people.

So, firstly to move to the 2.9 branch with Subversion :

svn switch http://svn.automattic.com/wordpress/branches/2.9 htdocs
svn update

Then, I visited my site – wow, it still worked. Logged in as the admin, and had a single button to click (‘Update database’). Milliseconds later, that was done, and it continues to work. That was easy. Where’s the broken stuff? Theme still seems to work, plugins are still working….

Varnish + Zope – Multiple zope instances behind a single varnish cache

I run multiple Zope instances on one server. Each Zope instance listens on a different port (localhost:100xx). Historically I’ve just used Apache as a front end which forwards requests to the Zope instance.

Unfortunately there are periods of the year when one site gets a deluge of requests (for example, when hosting a school site, if it snows overnight, all the parents will check the site in the morning at around 8am).

Zope is not particularly quick on its own – Apache’s “ab” reports that a dual core server with plenty of RAM can manage about 7-14 requests per second – which isn’t that many when you consider each page on a Plone site will have a large number of dependencies (css/js/png’s etc).

Varnish is a reverse HTTP proxy – meaning it sits in front of the real web server, caching content.

So, as I’m using Debian Lenny….

  1. apt-get install -t lenny-backports varnish
  2. Edit /etc/varnish/default.vcl
  3. Edit Apache virtual hosts to route requests through varnish (rather than directly to Zope)
  4. I didn’t need to change /etc/default/varnish.

In my case there are a number of Zope instances on the same server, but I only wanted to have one instance of varnish running. This is possible – but it requires me to look at the URL requested to determine which Zope instance to route through to.

So, for example, SiteA runs on a Zope instance on localhost:10021/sites/sitea. My original Apache configuration would contain something like :

<IfModule mod_rewrite.c>
   RewriteEngine on
   RewriteRule ^/(.*) http://127.0.0.1:10021/VirtualHostBase/http/www.sitea.com:80/sites/sitea/VirtualHostRoot/$1 [L,P]
 </IfModule>

To use varnish, I’ll firstly need to tell Varnish how to recognise requests for sitea (and other sites), so it can forward a cache miss to the right place, and then reconfigure Apache – so it sends requests into varnish and not directly to Zope.

So, firstly, in Varnish’s configuration (/etc/varnish/default.vcl), we need to define the different backend servers we want varnish to proxy / cache. In my case they’re on the same server –

backend zope1 {
.host = "127.0.0.1";
.port = "10021";
}
backend zope2 {
.host = "127.0.0.1";
.port = "10022";
}

Then, in the 'sub vcl_recv' section, use logic like :

if ( req.url ~ "/sites/sitea/VirtualHostRoot") {
   set req.backend = zope1;
}
if ( req.url ~ "/siteb/VirtualHostRoot") {
    set req.backend = zope2;
}

With the above in place, I can now just tell Apache to rewrite SiteA to :

RewriteRule ^/(.*) http://127.0.0.1:6081/VirtualHostBase/http/www.sitea.com:80/sites/sitea/VirtualHostRoot/$1 [L,P]

Instead….. and now we’ll find that our site is much quicker 🙂 (This assumes your varnish listens on localhost:6081).
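A quick way to confirm that requests really are being answered from the cache is to look at the X-Varnish response header: as far as I'm aware, it carries a single request ID on a miss and two IDs (the current request plus the one that originally populated the cache) on a hit. A small Python sketch of that check follows; the function names are mine, and it assumes the header isn't stripped before it reaches the client.

```python
import urllib.request


def is_cache_hit(headers):
    """Guess whether Varnish served a response from cache, given its headers."""
    # One ID means the object was fetched from the backend; two means a hit.
    ids = headers.get("X-Varnish", "").split()
    return len(ids) == 2


def fetch_is_hit(url, timeout=10):
    """Request url and report whether the response looks like a cache hit."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return is_cache_hit(response.headers)
```

Requesting the same page twice in quick succession should give a miss followed by a hit.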

There are a few additional snippets I found – in the vcl_fetch { … } block, I’ve told Varnish to always cache items for 30 seconds, and to also overwrite the default Server header given out by Apache etc, namely :

sub vcl_fetch {
    # ..... <snip> <snip>

    # force minimum ttl for objects
    if (obj.ttl < 30s) {
        set obj.ttl = 30s;
    }

    # ... <snip> <snip>

    unset obj.http.Server;
    set obj.http.Server = "Apache/2 Varnish";
    return (deliver);
}

I'm happy anyway. :)

Use 'varnishlog', 'varnishtop' and 'varnishhist' to monitor varnish.