SpamAssassin ruleset to try and catch India based web development spam

At work we keep receiving emails from sales-droids in India who are trying to persuade us to outsource PHP/Android/Java/whatever development to them.

Here’s my first attempt at a SpamAssassin rule to neutralise it. In my case, I copied it into a something.cf file in /etc/spamassassin/mail; running SpamAssassin over a suitably loaded email then results in:

Content preview:  Dear Sir / Madam, I just wanted to check if you had received
   my last mails sent. Haven't heard back from you, just wondering are you interested
   in our services? Let me know if you are interested then we can discuss this
   further. [...] 

Content analysis details:   (6.9 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail provider
                            (globalseolinksourcing[at]gmail.com)
 0.0 DKIM_ADSP_CUSTOM_MED   No valid author signature, adsp_override is
                            CUSTOM_MED
 1.7 DEAR_SOMETHING         BODY: Contains 'Dear (something)'
 0.0 HTML_MESSAGE           BODY: HTML included in message
 5.0 LOCAL_INDIA_HITS       Web dev spam from India
 1.2 NML_ADSP_CUSTOM_MED    ADSP custom_med hit, and not from a mailing list

The spamassassin rule

The intention is that the rule only fires if the email mentions India and at least 4 of the other phrases (delhi, marketing, php and others).

# india based spam

body  __INDIA_01 /india/i
body  __INDIA_02 /delhi/i
body  __INDIA_03 /web services/i
body  __INDIA_04 /php/i
body  __INDIA_05 /java/i
body  __INDIA_06 /marketing/i
body  __INDIA_07 /website design/i
body  __INDIA_08 /dear sir/i

meta LOCAL_INDIA_HITS (__INDIA_01 && ((__INDIA_02 + __INDIA_03 + __INDIA_04 + __INDIA_05 + __INDIA_06 + __INDIA_07 + __INDIA_08) >= 4))
describe LOCAL_INDIA_HITS Web dev spam from India
score LOCAL_INDIA_HITS 5.0
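
To sanity-check the counting logic without touching a live mail server, here is a rough shell sketch that mimics the meta rule with grep. The sample message is made up for illustration, and grep only approximates how SpamAssassin matches body text, so treat it as a way to reason about the threshold rather than a substitute for a real test.

```shell
# Hypothetical sample message body, loosely modelled on the preview above.
msg='Dear Sir / Madam, we are a web development company based in Delhi, India.
We offer PHP, Java and website design services.'

# matches PHRASE: succeeds if the message contains PHRASE (case-insensitive).
matches() { printf '%s\n' "$msg" | grep -qiF -- "$1"; }

# Count how many of the secondary phrases appear (the __INDIA_02..08 sub-rules).
hits=0
for phrase in 'delhi' 'web services' 'php' 'java' 'marketing' 'website design' 'dear sir'; do
    if matches "$phrase"; then hits=$((hits + 1)); fi
done

# The meta rule: "india" must match, plus at least 4 secondary phrases.
if matches 'india' && [ "$hits" -ge 4 ]; then
    verdict='LOCAL_INDIA_HITS would fire'
else
    verdict='LOCAL_INDIA_HITS would not fire'
fi
echo "$hits secondary hits: $verdict"
```

For the real thing, spamassassin --lint should flag syntax errors in the .cf file, and spamassassin -t < some-saved-email prints a report like the one shown earlier.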

Nexus 7 … after one week

I’ve had my Nexus 7 tablet thing for about a week now, here are some of my initial findings :

  1. I find myself turning the wifi/gps/bluetooth off to save battery life all the time – but using a widget makes this easy (under Settings -> Wifi you can tell it to put the Wifi to sleep when it’s in “standby”).
  2. I found the concept of having launch screens and the “Apps”/“Widget” sections a bit weird. After installing things, I expected them to show up on my home screen – but this wasn’t the case. Android stashes them
  3. It makes a good e-book reader 🙂 The £15 of “free” credit Google provided has been well received, and the bundled book reader works well for me.
  4. Live wallpapers – these look great – Paperland is nice to look at!
  5. Google Plus is really nice on the tablet form factor, especially when you’re on the “what’s hot” tab. Facebook’s app is just the normal smartphone app, but expanded – which looks rubbish. Therefore, Facebook--.
  6. It appears I’ve scratched the glass already; time to buy a jacket/cover thing for it :-/
  7. You need/want to use the MTP protocol to transfer files onto it – trying to do this over Wifi (e.g. with AirDroid, which is very good) is too slow to be practical.
  8. It takes a long time to charge over USB (~10 hours); it’s a lot quicker direct from the mains (an hour or two).

 

Facebook likes….

The BBC has this article today about the value of ‘Facebook likes’; see http://www.bbc.co.uk/news/technology-18813237

I’m not overly surprised, as :

  1. I’ve been getting a number of spam emails with “Buy 1000 likes for $49” or whatever, so no doubt there are people being paid to click ‘like’ in the same way as there are people being paid to post comment spam on blogs and so on.
  2. A customer of mine ran some promotions a year ago (“Click Like and get a chance to win a free ____”). They now have 1200 Likes on their page, but it’s not led to anything. Everyone who clicked like was after the prize – and has near zero interest in buying anything from the customer in question (i.e. getting a like from someone who was viewing “moneysavingexpert.com” isn’t likely to lead to a customer who wants to pay for a holiday villa rental).

So, David’s 2p:

  • Facebook remains good for engaging with customers
  • Buying likes, or running competitions to acquire ‘likes’ isn’t worth the effort.

However, I suspect if I visited a new website/shop and saw it had N thousand likes, I’d be far more inclined to buy from it than a website with only a handful of likes.

Hmm.

rsyslog selective logging with multiple postfix instances

Scenario: one Linux box runs multiple Postfix instances. By default they all log to /var/log/mail.log, which makes it difficult to see what’s going on without using grep and so on. The server already uses rsyslog, and each Postfix instance is configured with its own syslog_name.

i.e. /etc/postfix-blah/main.cf contains “syslog_name = postfix-blah”

rsyslog allows you to specify filters / expressions on what is logged where. This can be done on either the program name (:programname) which corresponds to postfix’s syslog_name, or the contents of the log message (:msg) itself.

So, the easy solution is :

  • Edit /etc/rsyslog.d/postfix-domains.conf and add in
  • :programname, contains, "postfix-blah" -/var/log/mail-blah.log
  • Restart rsyslog (/etc/init.d/rsyslog restart).
  • Watch Ubuntu moan about not using the ‘service’ command.

The leading : is important in the rsyslog rule. 

And obviously the ‘-’ before the file path is useful for performance, as it means a sync isn’t called after each write.

So, it’s just a case of populating your /etc/rsyslog.d/postfix-domains.conf file with multiple lines looking like the above, but obviously different for each domain.
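
If there are more than a handful of instances, the lines can be generated rather than typed. The following is a minimal sketch, assuming every syslog_name follows the postfix-something pattern; the instance names other than postfix-blah are invented for illustration.

```shell
# Hypothetical instance names; substitute the syslog_name values you actually use.
instances='postfix-blah postfix-foo postfix-bar'

rules=''
for name in $instances; do
    # Strip the "postfix-" prefix to build the log file name, e.g. mail-blah.log.
    suffix=${name#postfix-}
    rules="${rules}:programname, contains, \"${name}\" -/var/log/mail-${suffix}.log
"
done

# Print the rules; redirect this into /etc/rsyslog.d/postfix-domains.conf once happy.
printf '%s' "$rules"
```

Note that matching messages will probably still end up in /var/log/mail.log as well. If that is unwanted, a discard action on the line after each rule (& ~ in the older syntax, & stop in newer rsyslog versions) should prevent it, though I’d test that on your own version first.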

Fixing REMOTE_ADDR when behind a proxy/varnish server

I had an annoyance where a Varnish proxy sat in front of a LAMP server, so the LAMP server thought all requests came from the proxy: $_SERVER['REMOTE_ADDR'] was set to the IP address of the Varnish proxy rather than the client’s real IP address.

Obviously, Varnish adds the X-Forwarded-For HTTP header (which PHP exposes as $_SERVER['HTTP_X_FORWARDED_FOR']) when a connection comes through it, so my initial thought was to just overwrite PHP’s $_SERVER['REMOTE_ADDR'] setting. That’s a bit of a hack and annoying, as I’d need to fix all sites, or have some sort of global prepend file (which is horrible).

I then discovered something which sorts the problem out  – RPAF

  • apt-get install libapache2-mod-rpaf
  • Edit /etc/apache2/mods-enabled/rpaf.conf and ensure your proxy server’s IP address is listed on the RPAFproxy_ips line (e.g. RPAFproxy_ips 127.0.0.1 89.16.176.x).
  • Restart Apache, and you’ll then find that the $_SERVER['REMOTE_ADDR'] value will be correct.

 

 

fsck paranoid?

Some random hints :

  1. Ensure the final field / column in /etc/fstab is non-zero for other filesystems you have mounted; if it’s 0 then fsck will never run on them.
  2. fsck -Cccy /dev/blah1 does a non-destructive read-write test. Works well on SSDs 🙂
Example from /etc/fstab:
/dev/md0  /mount/point ext3 defaults 0 2

When looking at the various boxes we have in our office, I found one server had the following (run dumpe2fs /dev/whatever1):

  • Mount count:              62
  • Maximum mount count:      39
  • Last checked:             Wed Jul  9 16:09:17 2008
  • Next check after:         Mon Jan  5 15:09:17 2009
Today is 8th June 2012. Ooops.

Interestingly when I did run fsck on it, there were no errors. Is perhaps the default ext3 setting of checking every 20-30 mounts too paranoid?  It’s certainly very painful running fsck on large ‘rotating’ volumes – waiting over an hour for a server to come up is not fun.
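
To spot boxes in the same state, the two mount-count fields can be compared with a few lines of shell. This is a rough sketch that works on the sample values above; on a real machine you would feed it the output of dumpe2fs -h /dev/whatever1 instead of the hard-coded text.

```shell
# Sample lines as printed by dumpe2fs, taken from the server described above.
sample='Mount count:              62
Maximum mount count:      39'

# field LABEL: print the value after the colon for the given field label.
field() { printf '%s\n' "$sample" | awk -F': *' -v key="$1" '$1 == key { print $2 }'; }

mounts=$(field 'Mount count')
max=$(field 'Maximum mount count')

# A maximum of 0 or -1 means count-based checking is switched off.
if [ "$max" -gt 0 ] && [ "$mounts" -gt "$max" ]; then
    status="overdue: $mounts mounts, check expected after $max"
else
    status="ok"
fi
echo "$status"
```

If the defaults really are too paranoid for a given volume, tune2fs can change them: -c sets the maximum mount count and -i sets the time interval between checks.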

 

Sponge – Shell command

Today, my sed kung-foo seemed to be lacking, so I ended up having to split the sed command over a zillion lines…

Normally I’d do something like :

sed 's/foo/bar/g' tmp.txt > tmp2.txt
sed 's/fo2/blah/g' tmp2.txt > tmp3.txt

But this obviously gets painful after a while. A different approach is to use sponge (from the moreutils package), where we can do:

sed 's/foo/bar/g' tmp.txt | sponge tmp.txt
sed 's/fo2/blah/g' tmp.txt | sponge tmp.txt

Here ‘sponge’ soaks up all of standard input and only opens the output file once there’s no more to read. This gets around the obvious problem that :

sed 's/foo/bar/g' tmp.txt > tmp.txt

doesn’t work because the shell opens (and overwrites) tmp.txt  before sed’s had a chance to do anything.
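
To make the behaviour concrete, here is a crude stand-in for sponge written as a shell function. The name soak is made up, and this is only an illustration of the idea (buffer everything first, write afterwards), not how sponge itself is implemented.

```shell
# soak: hypothetical stand-in for sponge, for illustration only.
# It buffers all of stdin in a temp file, and only then writes the target.
soak() {
    soak_tmp=$(mktemp) || return 1
    cat > "$soak_tmp"          # read stdin to EOF before touching "$1"
    cat "$soak_tmp" > "$1"     # only now is the target opened and truncated
    rm -f "$soak_tmp"
}

# Try it on a scratch file, editing it in place twice.
demo=$(mktemp)
printf 'foo fo2\n' > "$demo"
sed 's/foo/bar/g' "$demo" | soak "$demo"
sed 's/fo2/blah/g' "$demo" | soak "$demo"
result=$(cat "$demo")
rm -f "$demo"
echo "$result"
```

Because the target file is not opened until sed has finished reading it, both substitutions survive and the file ends up containing "bar blah".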