PostgreSQL unbuffered queries and PHP (cursors)

From using MySQL, I’ve used the ‘unbuffered queries‘ feature a number of times. It’s where you don’t fetch the entire resultset into memory at once – which is necessary if you’re retrieving more data than you have memory available. If’s often also generally gets results/data back to you sooner.
Continue reading “PostgreSQL unbuffered queries and PHP (cursors)”

vagrant commands for the lazy and forgetful

Vagrant commands :

  • vagrant up  – aka “clocking on at work, time to boot up the VM”
  • vagrant status – aka “did I leave that VM running last night?”
  • vagrant halt – aka “time to go home or change project”
  • vagrant destroy – aka “VM is broken; kill it with fire and deploy a virgin VM to start again”
  • vagrant package default –output /path/to/output/file.box – aka “I like this VM, I think I’ll keep a copy around somewhere for future purposes”
  • vagrant init – aka “Get this project started …”

 

Obviously requires you have a reasonably good ‘Vagrantfile’ to start with…..

Checking PHP code for compatibility issues

One project I occassionally hack on is Xerte Toolkits.

Yesterday on the mailing list it came up that someone was trying to use XOT with PHP4.

After getting over some initial shock that people still use PHP4 (it was end-of-lifed in August 2008) I wondered how easy it would be to check the status of a code base to find how incompatible with PHP4 it now is.

My initial thought was to find a list of functions which had been added with PHP5 and then just grep the code for them, but it turns out there is a much nicer approach – PHP_CompatInfo

Installation was fairly straight forward – like :

pear channel-discover bartlett.laurent-laville.org
pear install bartlett/PHP_CompatInfo

Annoyingly the documentation seemed well hidden – but once I found it (http://php5.laurent-laville.org/compatinfo/manual/2.3/en/index.html#_documentation) it was pretty easy to use, and the ‘phpci’ command did all I needed –

Examples :

1. List global variables in use :

$ phpci print --reference PHP5 --report global -R . 
436 / 436 [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++>] 100.00%
BASE: /home/david/src/XOT/trunk

-------------------------------------------------------------------------------
PHP COMPAT INFO GLOBAL SUMMARY
-------------------------------------------------------------------------------
  GLOBAL                                                  VERSION         COUNT
-------------------------------------------------------------------------------
                                        $_GET             4.1.0               1
  data                                  $_GET             4.1.0               2
  debug                                 $_GET             4.1.0               2
  export                                $_GET             4.1.0               2
  file                                  $_GET             4.1.0               1
  firstname                             $_GET             4.1.0               1
....

2. Find all PHP5 functions in use :

$ phpci print  --report function -R . | grep 5.
436 / 436 [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++>] 100.00%
  spl_autoload_register                 SPL               5.1.2               1
  simplexml_load_file                   SimpleXML         5.0.0               1
  iconv_set_encoding                    iconv             4.0.5               1
  iconv_strlen                          iconv             5.0.0              10
  iconv_strpos                          iconv             5.0.0              38
  iconv_strrpos                         iconv             5.0.0               3
  iconv_substr                          iconv             5.0.0              33
  dirname                               standard          4.0.0              53
  fclose                                standard          4.0.0              51
  file_put_contents                     standard          5.0.0               6
  fopen                                 standard          4.0.0              55
  fread                                 standard          4.0.0              57
  fwrite                                standard          4.0.0              50
  htmlentities                          standard          5.2.3               1
  md5                                   standard          4.0.0               1
  scandir                               standard          5.0.0               1
  str_split                             standard          5.0.0               3
REQUIRED PHP 5.2.3 (MIN) 
Time: 0 seconds, Memory: 28.25Mb

and finally,

3. Class usage :

$ phpci print  --report class -R . | grep 5.
436 / 436 [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++>] 100.00%
  Exception                             SPL               5.1.0               1
  InvalidArgumentException              SPL               5.1.0               2
REQUIRED PHP 5.1.0 (MIN)

4. All class usage :

i.e. without me grep’ping the results.

$ phpci print  --report class -R . 
436 / 436 [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++>] 100.00%
BASE: /home/david/src/XOT/trunk

-------------------------------------------------------------------------------
PHP COMPAT INFO CLASS SUMMARY
-------------------------------------------------------------------------------
  CLASS                                 EXTENSION         VERSION         COUNT
-------------------------------------------------------------------------------
  Exception                             SPL               5.1.0               1
  InvalidArgumentException              SPL               5.1.0               2
  PHP_CompatInfo                                          4.0.0               1
  Snoopy                                                  4.0.0               2
  StdClass                                                4.0.0               2
  Xerte_Authentication_Abstract                           4.0.0               6
  Xerte_Authentication_Factory                            4.0.0               4
  Xerte_Authentication_Guest                              4.0.0               1
  Xerte_Authentication_Ldap                               4.0.0               1
  Xerte_Authentication_Moodle                             4.0.0               1
  Xerte_Authentication_Static                             4.0.0               1
  Xerte_Authetication_Db                                  4.0.0               1
  Zend_Exception                                          4.0.0               2
  Zend_Locale                                             4.0.0               7
  Zend_Locale_Data                                        4.0.0              19
  Zend_Locale_Data_Translation                            4.0.0               6
  Zend_Locale_Exception                                   4.0.0              28
  Zend_Locale_Format                                      4.0.0               3
  Zend_Locale_Math                                        4.0.0              14
  Zend_Locale_Math_Exception                              4.0.0               9
  Zend_Locale_Math_PhpMath                                4.0.0              11
  archive                                                 4.0.0               3
  bzip_file                                               4.0.0               1
  dUnzip2                                                 4.0.0               3
  gzip_file                                               4.0.0               1
  tar_file                                                4.0.0               3
  toolkits_session_handler                                4.0.0               1
  zip_file                                                4.0.0               2
-------------------------------------------------------------------------------
A TOTAL OF 28 CLASS(S) WERE FOUND
REQUIRED PHP 5.1.0 (MIN) 
-------------------------------------------------------------------------------
Time: 0 seconds, Memory: 27.50Mb
-------------------------------------------------------------------------------

Which answers my question(s) and so on.

Netbeans vs Vim … round 1 … fight

So, I think I’ve changed ‘editor’. Perhaps this is a bit like an engineer changing their calculator or something.

For the last 10 years, I’ve effectively only used ‘vim‘ for development of any PHP code I work on.

I felt I was best served using something like vim – where the interface was uncluttered, everything was a keypress away and I could literally fill my entire monitor with code. This was great if my day consisted of writing new code.

Unfortunately, this has rarely been the case for the last few years. I’ve increasingly found myself dipping in and out of projects – or needing to navigate through a complex set of dependencies to find methods/definitions/functions – thanks to  the likes of PSR0. Suffice to say, Vim doesn’t really help me do this.

Perhaps, I’ve finally learnt that ‘raw’ typing speed is not the only measure of productivity – navigation through the codebase, viewing inline documentation or having a debugger at my fingertips is also important.

So, last week, while working on one project, I eventually got fed up of juggling between terminals and fighting with tab completion that I re-installed netbeans – so, while I’m sure vim can probably do anything netbeans can – if you have the right plugin installed and super flexible fingers.

So, what have I gained/lost :

  • a 10 second slow down – that’s how long it takes to start Netbeans – even with an SSD and a desktop that should be quick enough (perhaps this is justification for a faster desktop?). Lesson learnt – do not close Netbeans.
  • indexing of the code – although this takes some time when importing an existing project – it does make it very quick to ‘find uses’ of function/class X
  • the ability to easily jump to a function/class definition (right click, find uses …)
  • a list of ‘todo’ items – from my code comments … this has led to me removing some old legacy cruft…
  • the ‘refactor’ functionality – i.e. rename function old_name() and change all instances of it to new_name()
  • code indentation seems to work better under netbeans
  • inline ‘warnings’ (i.e. unused variable, previously undefined variable[x], syntax errors etc) – so I don’t have to wait for them to appear in a browser or elsewhere.
  • inline documentation etc while typing (i.e. type in in_array( and it reminds me where the needle and haystack are)
  • with the right them, a nicer editor Window (mine is based on BrandonDark)

x – Fails with global variables on legacy projects though – in that netbeans doesn’t realise the variable has been sucked in through a earlier ‘require’ call.

I did briefly look at sublime a few weeks ago, but couldn’t see what the fuss was about – it didn’t seem to do very much – apart from have multiple tabs open for the various files I was editing.

SQL Injection with added magic_quotes assistance (the joys of legacy code maintenance)

 

Sometimes you really have to laugh (or shoot yourself) when you come across legacy code / the mess some other developer(s) left behind. (Names slightly changed to protect the innocent)

class RocketShip {

    function rahrah() {
        $sql = "insert into foo (rah,rahrah,...) 
            values ( '" . $this->escape_str($this->meh) . "', ...... )";
        mysqli_query($this->db_link, $sql) or 
            die("ERROR: " . mysqli_error($this->db_link));
        $this->id = mysqli_insert_id($this->db_link);
    }

    function escape_str($str)
    {
        if(get_magic_quotes_gpc())
           { $str = stripslashes($str);}
        //echo $str;
        //$clean = mysqli_real_escape_string($this->db_link,$str);
        //echo $clean;
       return $str;
    }
// ....
    function something_else() {
         mysqli_query($this->db_link, 
            sprintf("insert into fish(field1,field2) values('%s', '%s')", 
            $this->escape_str($this->field1), 
            $this->escape_str($this->field2));

    }
}

You’ve got to just love the :

  1. Lack of Error handling / logging.
  2. Functionality of the escape_str function which is only making matters worse (and could never have worked due to the variable names)
  3. Use of sprintf  and %s ….(obviously %d could be useful)
  4. Documentation?

Dare I uncomment the mysqi_real_escape_string and fix escape_str’s behaviour?

In other news, see this tweet – 84% of web apps are insecure; that’s a bit damning. But perhaps not surprising given code has a far longer lifespan than you expect….

 

Zend_Cache – automatic cache cleaning can be bad, mmkay?

$customer uses Zend_Cache in their codebase – and I noticed that every so often a page request would take ~5 seeconds (for no apparent reason), while normally they take < 1 second …

Some rummaging and profiling with xdebug showed that some requests looked like :

xdebug profiling output - note lots of zend_cache stuff
xdebug profiling output - note lots of zend_cache stuff

Note how there are 25,000 or so calls for various Zend_Cache_Backend_File thingys (fetch meta data, load contents, flock etc etc).

This alternative rendering might make it more clear – especially when compared with the image afterwards :

zend cache dominating the call map
zend cache dominating the call map

while a normal request should look more like :

a "normal" request - note Zend_Cache is not dominating
a "normal" request - note Zend_Cache is not dominating

Zend_Cache has a ‘automatic_cleaning_mode’ frontend parameter – which is by default set to 10 (i.e. 10% of all write requests to the cache result in it checking if there is anything to garbage collect/clean). Since we’re nearly always writing something to the cache, this results in 10% of requests triggering the cleaning logic.

See http://framework.zend.com/manual/en/zend.cache.frontends.html.

 

The cleaning is now run via a cron job something like :

$cache_instance->clean(Zend_Cache::CLEANING_MODE_OLD);

 

PHP Serialization & igbinary

Recently I’ve been trying to cache more and more stuff – mostly to speed things up. All was well, while I was storing relatively small numbers of data – because (as you’ll see below) my approach was a little flawed.

Random background – I use Zend_Cache, in a sort of wrapped up local ‘Cache’ object, because I’m lazy. This uses Zend_Cache_Backend_File for storage of data, and makes sure e.g. different sites (dev/demo/live) have their own unique storage location – and also that nothing goes wrong if e.g. a maintenance script is run by a different user account.

My naive approach was to do e.g.

$cached_data = $cache->load('lots_of_stuff');
if(!empty($cached_data)) {
   if(isset($cached_data[$key])) {
       return $value;
   }
}
else {
    // calculate $value
    $cached_data[$key] = $value;
    $cache->save($cached_data, $cache_key);
}
return $value;

The big problem with this is that the $cached_data array tends to grow quite large; and PHP spends too long unserializing/serializing. The easy solution for that is to use more than one cache key. Problem mostly solved.

However, if the site is performing a few thousand calculations, speed of [de]serialisation is still gong to be an issue – even if the data involved is in small packets. I’d already profiled the code with xdebug/kcachegrind and could see PHP was spending a significant amount of time performing serialisation – and then remembered a presentation I’d seen (http://ilia.ws/files/zendcon_2010_hidden_features.pdf – see slides 14/15/16 I think) at PHPBarcelona covering Igbinary (https://github.com/phadej/igbinary)

Once you install the extension –

phpize
./configure
make
cp igbinary.so /usr/lib/somewhere
#add .ini file to /etc/php5/conf.d/

You’ll have access to igbinary_serialize() and igbinary_unserialize() (I think ‘make install’ failed for me, hence the manual cp etc).

I did a random performance test based on this and it seems to be somewhat quicker than other options (json_encode/serialize) – this was using PHP 5.3.5 on a 64bit platform. Each approach used the same data structure (a somewhat nested array); the important things to realise are that igbinary is quickest and uses less disk space.

JSON (json_encode/json_decode):

  • JSON encoded in 2.18 seconds
  • JSON decoded in 9.83 seconds
  • serialized “String” size : 13993

Native PHP :

  • PHP serialized in 2.91 seconds
  • PHP unserialized in 6.43 seconds
  • serialized “String” size : 20769

Igbinary :

  • WIN igbinary serialized in 1.60 seconds
  • WIN igbinrary unserialized in 4.77 seconds
  • WIN serialized “String” Size : 4467

The performance testing bit is related to this Stackoverflow comment I made on what seemed a related post

Javascript Linting…

Suffice to say, my minions write a quantity of Javascript. And testing it isn’t all that easy. While rummaging the internet, I came across @NeilCrosby‘s FrontEndTestSuite which aims to automate e.g. w3c validator checks and so on – there will be more on that later I suspect once I get it working.

Anyway, the first part I wanted to do is to run a Javascript linter on things…. as I haven’t really come across these before.

I’ve found two approaches :

Install JavascriptLint

see http://www.javascriptlint.com/download.htm

Build instructions for the lazy – uncompress/extract the files from the archive, then :

  1. cd jsl-0.3.0/src
  2. make -f Makefile.ref
  3. cp Linux_All_DBG.OBJ/jsl /usr/local/bin

Usage looks a bit like :

jsl -process $file

As per the ‘help’ documentation, it returns different exit codes depending how things went (e.g. 0 – everything good; 1 – warnings etc).

Aside from the annoying compile step, this seemed the easiest to setup, and friendliest to use from the command line.

Mozilla’s Rhino & JSLint.js

This is the approach expected by Neil’s TestSuite above (more soon, perhaps).

Download the Mozilla Rhino thing – for me this is a simple ‘apt-get install rhino‘ YMMV.

  1. export CLASSPATH=/usr/share/java/js.jar
  2. java org.mozilla.javascript.tools.shell.Main /path/to/some/javascript.js

Again, this will give some sort of return error code if it can’t parse it – but it’s not yet running through jslint… which is what we really want.

Firstly, download JSLint.js via https://github.com/douglascrockford/JSLint/blob/master/fulljslint.js (click on the ‘raw’ button)…

  1. (Requires CLASSPATH thing from above)
  2. java org.mozilla.javascript.tools.shell.Main
  3. load(‘fulljslint.js’);
  4. to_test = readFile(‘/path/to/javascript/file/to/test.js’);
  5. result = JSLINT(to_test, null);

If ‘result’ is ‘false’ then you can inspect the errors via JSLINT.errors.

Next up, getting frontend-test-suite running, or something based upon it….

Late to the performance party

Everyone else probably already knows this, but $project is/was doing two queries on the MySQL database every time the end user typed in something to search on

  1. to get the data between a set range (SELECT x,y….. LIMIT n, OFFSET m or whatever) and
  2. another to get the total count of records (SELECT count(field) ….).

This is all very good, until there is sufficiently different logic in each query that when I deliberately set the offset in query #1 to 0 and limit very high and find that the of rows returned by both doesn’t match (this leads to broken paging for example)

Then I thought – surely everyone else doesn’t do a count query and then repeat it for the range of data they want back – there must be a better way… mustn’t there?

At which point I found:
http://forge.mysql.com/wiki/Top10SQLPerformanceTips
and
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_found-rows

See also the comment at the bottom of http://php.net/manual/en/pdostatement.rowcount.php which gives a good enough example (Search for SQL_CALC_FOUND_ROWS)

A few modifications later, run unit tests… they all pass…. all good.

I also found some interesting code like :

$total = sizeof($blah);
if($total == 0) { … }
elseif ($total != 0) { …. }
elseif ($something) { // WTF? }
else { // WTF? }

(The WTF comment were added by me… and I did check that I wasn’t just stupidly tired and not understanding what was going on).

The joys of software maintenance.

Logging … and how not to do it.

Grumpy man, back from battling with some legacy code, has a rant.

One thing that really annoys me is when I come to look at the log file and I see something like :

blah blah did blah blah
blah foo blah random comment
fish blah some data
which spans many lines or does it?

This is bad, as I’ve got absolutely no idea where the messages are from (so have to grep around a code base), and I’ve no idea WHEN they were made. At best I can look at timestamps on this file and figure out a timeframe (assuming logrotate is in use so there is a definite (must be after X timestamp)).

What’s far better from a maintenance point of view :

2010/07/29 09:33 filewhatever.py:355 blah blah blah did blah blah

2010/07/29 09:34 filewhatever.py:355 blah blah blah did blah blah

2010/07/29 09:35 filewhatever.py:355 data received from x is {{{hello world…. }}}

Changes are :

  1. Date and time stamps (in python: datetime.datetime.now())
  2. Recording where the message came from (see the ‘inspect’ python module – inspect.stack()[1][1] for calling file, and inspect.stack()[1][2] for the line number, or debug_backtrace() in PHP)
  3. Wrapping any interesting output (e.g. from a remote service) in obvious delimiters (e.g. {{{ and }}} )  – without e.g. timestamps or some other common line prefix, I’ve no way of knowing what’s from where, especially if the output spreads over many lines.

Other good ideas :

  1. Different severities of log message (classic: debug, info, error type annotation with appropriate filtering).
  2. Make sure logrotate is in use, or a simple shell script via cron, to stop the log file growing too large and causing problems.
  3. Stop writing your own logging mechanisms and use ones provided by the system (e.g. Python has a logger built in which does all of the above and more)

EOR – EndOfRant