Netbeans vs Vim … round 1 … fight

So, I think I’ve changed ‘editor’. Perhaps this is a bit like an engineer changing their calculator or something.

For the last 10 years, I’ve effectively only used ‘vim’ for development of any PHP code I work on.

I felt I was best served using something like vim – where the interface was uncluttered, everything was a keypress away and I could literally fill my entire monitor with code. This was great if my day consisted of writing new code.

Unfortunately, this has rarely been the case for the last few years. I’ve increasingly found myself dipping in and out of projects – or needing to navigate through a complex set of dependencies to find methods/definitions/functions – thanks to the likes of PSR-0. Suffice it to say, Vim doesn’t really help me do this.

Perhaps, I’ve finally learnt that ‘raw’ typing speed is not the only measure of productivity – navigation through the codebase, viewing inline documentation or having a debugger at my fingertips is also important.

So, last week, while working on one project, I eventually got so fed up with juggling between terminals and fighting with tab completion that I re-installed Netbeans. I’m sure vim can probably do anything Netbeans can – if you have the right plugins installed and super flexible fingers.

So, what have I gained/lost :

  • a 10 second slowdown – that’s how long it takes to start Netbeans – even with an SSD and a desktop that should be quick enough (perhaps this is justification for a faster desktop?). Lesson learnt – do not close Netbeans.
  • indexing of the code – although this takes some time when importing an existing project – it does make it very quick to ‘find uses’ of function/class X
  • the ability to easily jump to a function/class definition (right click, find uses …)
  • a list of ‘todo’ items – from my code comments … this has led to me removing some old legacy cruft…
  • the ‘refactor’ functionality – i.e. rename function old_name() and change all instances of it to new_name()
  • code indentation seems to work better under Netbeans
  • inline ‘warnings’ (i.e. unused variable, previously undefined variable[x], syntax errors etc) – so I don’t have to wait for them to appear in a browser or elsewhere.
  • inline documentation etc while typing (i.e. type in in_array( and it reminds me where the needle and haystack are)
  • with the right theme, a nicer editor window (mine is based on BrandonDark)

x – Fails with global variables on legacy projects though – in that Netbeans doesn’t realise the variable has been sucked in through an earlier ‘require’ call.

I did briefly look at Sublime a few weeks ago, but couldn’t see what the fuss was about – it didn’t seem to do very much, apart from having multiple tabs open for the various files I was editing.

A week of fire fighting (aka why you should <3 unit tests).

I feel like I’ve spent most of this week debugging some PHP code, and writing unit tests for it. Thankfully I think this firefighting exercise is nearly over.

Perhaps the alarm bells should have been going off a bit more in my head when it was implied that the code was being written in a quick-and-dirty manner (“as long as it works….”) and the customer kept adding in additional requirements – to the extent it is no longer clear where the end of the project is.

“It’s really simple – you just need to get a price back from two APIs”

then they added in :

“Some customers need to see the cheapest, some should only see some specific providers …”

and then :

“We want a global markup to be applied to all quotes”

and then :

“We also want a per customer markup”

and so on….

And the customer didn’t provide us with any verified test data (i.e. for a quote consisting of X it should cost A+B+C+D=Y).

The end result is an application which talks to two remote APIs to retrieve quotes. Users are asked at least 6 questions per quote (so there are a significant number of variations).

Experience made me slightly paranoid about editing the code base – I was worried I’d fix a bug in one pathway only to break another. On top of which, I initially didn’t really have any idea of whether it was broken or not – because I didn’t know what the correct (£) answers were.

Anyway, to start with, it was a bit like :

  • Deploy through some weird copy+pasting manner due to Windows file permissions
  • No unit test coverage
  • No logging
  • Apparently finished
  • Apparently working (but the customer kept asking for more changes)

Now:

  • Deployment through git; Stupid Windows file permissions fixed.
  • Merged in changes by third party graphics designer – should be much easier to track any changes he makes in the future
  • ~80% test code coverage. I’ve had to modify the ‘real’ scripts to accept a new parameter which makes them read a local XML file (rather than hitting the remote API/service), so that the tests are reproducible (and quick to run) – see the sketch after this list
  • Logging in place, so there’s some traceability
  • Better error handling
  • Calculations manually checked and confirmed.
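A minimal sketch of that fixture parameter – the function name, URL and XML handling here are all hypothetical :

function fetch_quotes(array $params, $fixture_file = null) {
    if ($fixture_file !== null) {
        // Unit tests pass a local XML file - fast and reproducible
        $xml = file_get_contents($fixture_file);
    } else {
        // The 'real' path still talks to the remote API
        $client = new Zend_Http_Client('https://api.example.com/quotes');
        $client->setParameterGet($params);
        $xml = $client->request(Zend_Http_Client::GET)->getBody();
    }
    return simplexml_load_string($xml);
}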

Interesting things I didn’t like about the code :

  • Premature attempt at dependency injection – the few functions there are/were have a $db variable being passed in – but often don’t use it.
  • There is significant duplication in the code base still.
  • The code didn’t (and still doesn’t really) support testing particularly well – I’m having to retrieve output through a Zend_Http_Client call (i.e. mimicking a browser) – which means I can’t get code test coverage stats easily.
  • In some places a variable was being set to 0 (e.g. $speed), with 0 then used to signal a special case (if $speed == 0). Overloading one variable with multiple meanings makes it difficult to follow what’s going on – and is a pain when the customer asks for slightly different behaviour. Really a separate parameter should have been used – see the sketch below.
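As a sketch of that last point (names made up) – the special case deserves its own parameter rather than hijacking 0 :

// Before: 0 secretly means "no speed restriction"
if ($speed == 0) {
    // special case hidden behind a magic value
}

// After: the second meaning is explicit, and $speed only ever means a speed
function calculate_quote($speed, $unrestricted_speed = false) {
    if ($unrestricted_speed) {
        // special case handled explicitly
    }
    // ...
}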

Zend_Cache – automatic cache cleaning can be bad, mmkay?

$customer uses Zend_Cache in their codebase – and I noticed that every so often a page request would take ~5 seconds (for no apparent reason), while normally they take < 1 second …

Some rummaging and profiling with xdebug showed that some requests looked like :

[Image: xdebug profiling output – note lots of zend_cache stuff]

Note how there are 25,000 or so calls for various Zend_Cache_Backend_File thingys (fetch meta data, load contents, flock etc etc).

This alternative rendering might make it more clear – especially when compared with the image afterwards :

[Image: zend cache dominating the call map]

while a normal request should look more like :

a "normal" request - note Zend_Cache is not dominating
a "normal" request - note Zend_Cache is not dominating

Zend_Cache has an ‘automatic_cleaning_factor’ frontend parameter – which by default is set to 10 (i.e. roughly 10% of write requests to the cache result in it checking if there is anything to garbage collect/clean). Since we’re nearly always writing something to the cache, this results in about 10% of requests triggering the cleaning logic.
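Turning the inline cleaning off is a one-line change where the frontend is created – a sketch, with illustrative lifetime/cache_dir values :

$cache = Zend_Cache::factory(
    'Core',
    'File',
    array(
        'lifetime' => 3600,
        'automatic_cleaning_factor' => 0, // never garbage collect during a request
    ),
    array('cache_dir' => '/var/cache/myapp')
);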

See http://framework.zend.com/manual/en/zend.cache.frontends.html.


The cleaning is now run via a cron job something like :

$cache_instance->clean(Zend_Cache::CLEANING_MODE_OLD);


PHP Serialization & igbinary

Recently I’ve been trying to cache more and more stuff – mostly to speed things up. All was well while I was storing relatively small amounts of data – because (as you’ll see below) my approach was a little flawed.

Random background – I use Zend_Cache, in a sort of wrapped up local ‘Cache’ object, because I’m lazy. This uses Zend_Cache_Backend_File for storage of data, and makes sure e.g. different sites (dev/demo/live) have their own unique storage location – and also that nothing goes wrong if e.g. a maintenance script is run by a different user account.

My naive approach was to do e.g.

$cache_key = 'lots_of_stuff';
$cached_data = $cache->load($cache_key);
if(!empty($cached_data) && isset($cached_data[$key])) {
    // cache hit
    return $cached_data[$key];
}
// cache miss: calculate $value ...
if(!is_array($cached_data)) {
    $cached_data = array();
}
$cached_data[$key] = $value;
$cache->save($cached_data, $cache_key);
return $value;

The big problem with this is that the $cached_data array tends to grow quite large, and PHP spends too long unserializing/serializing it. The easy solution is to use more than one cache key – see the sketch below. Problem mostly solved.
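Something like the following, i.e. one small cache entry per key rather than one giant array (function names hypothetical) :

function get_value($cache, $key) {
    $cache_id = 'stuff_' . md5($key); // Zend_Cache ids must be [a-zA-Z0-9_]
    $value = $cache->load($cache_id);
    if ($value !== false) {
        return $value; // small entry, so cheap to unserialize
    }
    $value = calculate_value($key); // the expensive bit
    $cache->save($value, $cache_id);
    return $value;
}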

However, if the site is performing a few thousand calculations, the speed of [de]serialisation is still going to be an issue – even if the data involved is in small packets. I’d already profiled the code with xdebug/kcachegrind and could see PHP was spending a significant amount of time performing serialisation – and then remembered a presentation I’d seen at PHPBarcelona (http://ilia.ws/files/zendcon_2010_hidden_features.pdf – see slides 14/15/16, I think) covering igbinary (https://github.com/phadej/igbinary).

Once you install the extension –

phpize
./configure
make
cp igbinary.so /usr/lib/somewhere
#add .ini file to /etc/php5/conf.d/

You’ll have access to igbinary_serialize() and igbinary_unserialize() (I think ‘make install’ failed for me, hence the manual cp etc).
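A quick round trip shows it’s a drop-in replacement (and that the output is compact binary rather than text) :

$data = array('prices' => array(1.99, 2.50), 'meta' => array('currency' => 'GBP'));

$packed   = igbinary_serialize($data);
$restored = igbinary_unserialize($packed);

var_dump($restored === $data);                       // bool(true)
var_dump(strlen($packed), strlen(serialize($data))); // igbinary is smaller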

I did a random performance test based on this and it seems to be somewhat quicker than other options (json_encode/serialize) – this was using PHP 5.3.5 on a 64bit platform. Each approach used the same data structure (a somewhat nested array); the important things to realise are that igbinary is quickest and uses less disk space.

JSON (json_encode/json_decode):

  • JSON encoded in 2.18 seconds
  • JSON decoded in 9.83 seconds
  • serialized “String” size : 13993 bytes

Native PHP :

  • PHP serialized in 2.91 seconds
  • PHP unserialized in 6.43 seconds
  • serialized “String” size : 20769 bytes

Igbinary :

  • WIN igbinary serialized in 1.60 seconds
  • WIN igbinary unserialized in 4.77 seconds
  • WIN serialized “String” size : 4467 bytes

The performance testing bit is related to this Stackoverflow comment I made on what seemed to be a related post.

Late to the performance party

Everyone else probably already knows this, but $project is/was doing two queries on the MySQL database every time the end user typed in something to search on :

  1. to get the data for a set range (SELECT x, y … LIMIT n OFFSET m, or whatever) and
  2. another to get the total count of records (SELECT count(field) ….).

This is all very well – until there is sufficiently different logic in each query that, when I deliberately set the offset in query #1 to 0 and the limit very high, the number of rows returned by the two queries doesn’t match (which leads to broken paging, for example).

Then I thought – surely everyone else doesn’t do a count query and then repeat it for the range of data they want back – there must be a better way… mustn’t there?

At which point I found:
http://forge.mysql.com/wiki/Top10SQLPerformanceTips
and
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_found-rows

See also the comment at the bottom of http://php.net/manual/en/pdostatement.rowcount.php which gives a good enough example (Search for SQL_CALC_FOUND_ROWS)
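The gist of it with PDO – the table/column names below are made up :

// One query fetches the page *and* asks MySQL to remember the full count
$stmt = $pdo->prepare(
    'SELECT SQL_CALC_FOUND_ROWS id, name
     FROM products
     WHERE name LIKE :term
     LIMIT 20 OFFSET 40'
);
$stmt->execute(array('term' => '%' . $search . '%'));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// FOUND_ROWS() returns the count as if there had been no LIMIT clause
$total = (int) $pdo->query('SELECT FOUND_ROWS()')->fetchColumn();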

A few modifications later, run unit tests… they all pass…. all good.

I also found some interesting code like :

$total = sizeof($blah);
if($total == 0) { … }
elseif ($total != 0) { …. }
elseif ($something) { // WTF? }
else { // WTF? }

(The WTF comments were added by me… and I did check that I wasn’t just stupidly tired and failing to understand what was going on.)

The joys of software maintenance.

Still looking for a PHP contractor….

At work I’m still looking for a short term PHP contractor. Perhaps I’m being unrealistic in my expectations/requirements (rate/location/duration/skills etc), but nevertheless…. As I’ve not found anyone via normal channels (twitter/phpwm user group etc) I thought I’d turn to a random recruitment agency (who I’d spoken to a week or so ago).

Yesterday I interviewed one guy – who’d been a programmer for a number of years (10+) – using Visual Foxpro (whatever that is) – presumably it’s a dead language, as he wants to move across into PHP. He has very basic PHP experience (yet claims 2 years on his CV), figured out how to do FizzBuzz and Recursion without too much help – but didn’t know anything about object orientation, separation of concerns (specifically MVC), security (obvious SQL injection) or unit testing and failed to make any comment on what is almost the worst code I could find to present to him. This isn’t necessarily a problem – I would normally be happy to train someone – however, not when I’m paying him £25/hour and I’d be lucky if he was productive within a week. (Hint: students are better than this when they’ve only been in University for two years).

Today, I therefore continued hunting, with mixed success. I had three more CVs – all asking for more money – and one looked quite good, but had a requirement that he work remotely after the first few days (well, he does live in Telford). Another, who is local, I’m interviewing tomorrow. Wanting to do some homework on him, I had a look at a couple of websites mentioned in his doctored CV – the first is clearly .Net, judging from the error message it throws when you pass a > into its search box – so either they replaced his PHP site quickly or his CV is misleading. The second has a PHP error on it – and is only (effectively) a themed WordPress site which looks like it’s slowly rotting. From these I found out his address (hint: whois $flamingodomain) and an invalid email address/domain (which archive.org seems not to do much with). Typing his name into Google, LinkedIn, Facebook etc. produces no obvious matches. So I know hardly anything about him, and for all intents and purposes he may as well not exist. Great sales job there.

From talking to the recruiters it seems it’s difficult to find decent PHP programmers – and anyone who may be decent will almost certainly not be programming PHP as their primary language (i.e. they’ll be doing web development in Java/.Net, and know PHP quite well). This seems a shame, but really only confirms what I already knew from interacting with others in the community. I’ve known for ages that I’ve effectively taken a large pay cut by running my own company, and doing PHP. It sucks that this continues to be the case. Clearly I’m a martyr or something.

So, if you happen to be a contractor looking for work, please make an effort. I’m not overly impressed so far, and may just end up stalling customers for another week/month instead.

(Oddly I wrote this post, posted it, and it vanished. What are you up to wordpress? Why do you want me to retype things in twice?)

Random PHP project progress

Random PHP development musing

Initially when we founded Pale Purple all our new PHP development used a combination of Propel, Smarty and some inhouse glue. Over time we seem to have drifted towards the Zend Framework, but I’ve never been particularly happy with Zend_Db or Zend_View. Why the Zend Framework? Well, it has loads of useful components (Cache, Form, Routing, Mail etc) and it’s near enough an industry standard from what we see – and obviously I’d rather build on the shoulders of others than spend time developing an in-house framework no one else will ever use.

For one customer, we’re currently working on the next iteration of their code base – which incorporates a number of significant changes. When we inherited the code base from the previous developers, we spent a long time patching various SQL injection holes (casting to ints), moving over to PDO’s prepared statements and trying to keep on top of the customer’s new functionality requests and support issues. There’s still a lot of horrible code to refactor and plenty of security holes (although none public-facing), but we know we’re moving in the right direction – hopefully patching and duct tape will soon be a thing of the past, as the code base develops some form of architecture and starts to look like someone has thought about design and long-term maintenance.

I’ve started to properly do test-first development – at least from a support perspective – as too often we’d find we would patch a bug, only for it to reappear again a few weeks/months later. This has been especially useful with the SOAP interface the application exposes. The tests run every 5 minutes, and we all get emailed when stuff breaks – it took all of 30 minutes to set up; then it was just a case of actually writing the unit tests themselves (the tests take minutes to write; finding/fixing any bugs they pinpoint takes somewhat longer :-/ ). I’ve also abused Simpletest’s web testing ‘stuff’ to act as an availability checker for the live site (i.e. hit a few remote URLs, and check that we don’t get error messages back and do see expected strings) – something like the sketch below.
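The availability checker amounts to little more than this (the URL and expected strings are placeholders) :

require_once 'simpletest/autorun.php';
require_once 'simpletest/web_tester.php';

class LiveSiteAvailability extends WebTestCase {
    function testHomePageResponds() {
        $this->assertTrue($this->get('https://www.example.com/'));
        $this->assertResponse(200);
        $this->assertText('Get a quote');   // an expected string
        $this->assertNoText('Fatal error'); // no PHP errors leaking out
    }
}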

The original code base had no ‘model’ layer (or MVC ‘compliance’) – files containing HTML, CSS, SQL, Javascript and PHP were the norm. We’ve added Propel to the project as the ‘model’ layer – which took a few hours; when reverse engineering the database we found a few oddities (tables without primary keys and so on). Anyway, moving the functionality from a handful of legacy objects across into the Propel ones seems to be well underway, and I for one will be glad to see the end of :

$x = new Foo(5);

Accompanied by code that does the equivalent of :

class Foo {
    public function __construct($id = false) {
        if($id != false) {
            // select * from foo where id = 5
            // populate $this; don't bother checking for the edge case where $id isn't valid
        }
        else {
            // insert into foo values ('');
            // populate $this->id; leaving all other fields as empty strings...
        }
    }

    public function setBaz($nv) { // repeat for all table fields
        $this->baz = $nv;
        global $db;
        $db->query('update foo set baz = "' . $nv . '" where id = ' . $this->id);
    }
}
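For contrast, the Propel equivalent – a sketch assuming the 1.5+ query API and a generated FooQuery class :

$foo = FooQuery::create()->findPk(5); // null if the row doesn't exist - no silent INSERT
if ($foo !== null) {
    $foo->setBaz('new value'); // generated setter; no global $db, no string-built SQL
    $foo->save();              // a single UPDATE, issued when save() is called
}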

Finally, we have a meaningful directory structure – where some things aren’t exposed in the document root. Hopefully soon there’ll be a front controller and some decent routing. At the moment a huge amount of code is just sat in the ‘public’ directory due to its nature. We hope to fix this in time, and move to using Zend_Controller – once we get Smarty integrated with it.

Propel has added some nice new features since we last used it (effectively v1.2); it was a toss-up between it and Doctrine (as obviously the ZF is moving in that direction) – but we already had knowledge/experience with Propel and it seemed the easier option.

I’m hoping that with time we’ll be able to get up to at least 60% test coverage of the code base – at that point we should be able to refactor the code far easier and with less fear. At the moment I doubt the unit tests cover more than 5-10% – although maybe it’s time I pointed xdebug or whatever at it to generate some meaningful stats.

My final task is to get some decent performance measurements out of the code base – so we can track any performance regressions. I’m fairly confident that moving to Propel will result in a speedup, as duplicate object hydrations will be eliminated thanks to its instance pool; however, having hard figures and nice graphs to point at would be ideal. So far I’ve knocked up my own script around ‘ab’ which stores some figures to a .csv file and uses ezComponents to generate a graph file. This seems a crap solution, but I can’t think of or find anything better. Any suggestions, dear Internet? Perhaps I should integrate changeset/revision IDs into my benchmarking too. Suggestions here would be exceedingly appreciated.

There, I should have ticked all necessary boxes wrt development practices now. Now to work on finding a contract PHP developer….

PHP array and object addition and string indexing

While reading someone’s code, I came across the following sort of thing:


function foo($config = array()) {
    $this->_config += $config;
    // ...
}

To which I thought WTF? How does PHP cast an array to a number to perform addition?

A few random tests later, it appears PHP performs a union of the two arrays, only adding keys that appear in the second (and not the first) array.

(as per php manual – array operators)

Namely :

$array_1 = array('a', 'b', 4 => 'c');
$array_2 = array(4 => 'e', 'f', 'g');

$array_1 += $array_2;

print_r($array_1);

will give :

Array
(
    [0] => a
    [1] => b
    [4] => c
    [5] => f
    [6] => g
)

(Note: 4=>'e' is not in the resultant array, because 4=>'c' was already present in $array_1.)

After a roundabout conversation with Moobert about this on IRC – at which point he probably put me out of my misery – he also gave me the following :

$x = 'test';
echo $x['whatever']; // outputs 't'

Which I can understand – as PHP allows per-character access to a string (based on position) – hence ‘whatever’ gets cast to zero, and we get the first character back.

I know I can cast objects to arrays, and vice versa.

Seemingly objects don’t follow any sort of ‘union’ rule, however – adding two objects results in 2. Not sure how PHP converts an object into 1… but there we are. I sort of expected to get an object back (assuming the inputs were the same type) with a union of properties, but no.
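For what it’s worth, under PHP 5 (current at the time of writing) each object converts to int(1) with a notice, which is where the 2 comes from :

$a = new stdClass();
$b = new stdClass();

// PHP 5 emits notices that the objects can't be converted to a number,
// treats each one as 1 and carries on (PHP 8 makes this a fatal TypeError)
$c = $a + $b;
var_dump($c); // int(2)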