I’ve gone ‘back’ to doing hands-on development, with small-a agile, small-k kanban and small-p post-its. One of the projects I’m on is a fairly large Drupal project. While PHP and Drupal are not known for their raw performance, they do shine in delivering value quickly. Out of the box they are very easy to deploy, and if you know where to find the right modules and script installations, delivering a site rich in functionality that also looks good can be done quickly. So, what about performance? With Varnish, Pressflow and a bit of thinking we can at least make our site blisteringly fast for visitors who are not logged in.
I hadn’t expected the speedup would be this dramatic – my laptop is now serving between 1000 requests per second (most time spent in the first cache miss from drupal) and 9000 requests per second (no cache miss), where it otherwise would fail on just 1000 requests with 100 concurrent ‘users’ (also running on the same machine).
In sharing the gotchas, our configuration and some useful sources I want to give something back for the countless blog and forum posts that helped me set things up.
(pertaining to Ubuntu 10.10 and Debian Lenny, Varnish 2.1.3 and Drupal 6.x)
Out of the box Varnish comes with an empty configuration file and will cache nothing. The idea seems to be that you can verify that your site works as it did before, and then add your own VCL (Varnish Configuration Language) in small steps, verifying at each step that it still works.
For Drupal 6 you need to install pressflow and the varnish module, and make sure they work. I haven’t tried this with Drupal 7 yet.
The drupal varnish module currently needs a patch to work properly: patch: http://drupal.org/files/issues/808314.patch issue: http://drupal.org/node/808314
Debian Lenny’s version of Varnish is outdated, as Varnish is relatively young and seems to be actively developed. I got Varnish from the project’s own apt repository; the instructions at http://www.varnish-cache.org/installation/debian were very useful.
Test! A couple of times I thought it was working, but nothing was actually cached. My weapons of choice were ab (Apache Bench), which can fire off a large number of requests, also in parallel; curl -I, which shows the headers that come back from the server; and some custom VCL instructions that add an extra HTTP header to show cache HIT or MISS. This is all about attention to detail.
Install on your own machine first, then on a server. Doing it at least twice forces you to better understand how it works.
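The header check from the testing gotcha above can be sketched as a small shell helper. This is only an illustration: the function name cache_status and the header name X-Varnish-Cache are assumptions, so substitute whatever header your own VCL sets.

```shell
#!/bin/sh
# Print the value of a cache-status header found in HTTP response
# headers read from stdin.
# ASSUMPTION: the header is called X-Varnish-Cache; pass another name
# as the first argument if your VCL uses a different one.
cache_status() {
    # Strip the carriage returns HTTP puts at the end of each header
    # line, then match the header name case-insensitively.
    tr -d '\r' | awk -F': ' -v name="${1:-X-Varnish-Cache}" \
        'tolower($1) == tolower(name) { print $2 }'
}

# Usage (assumes Varnish is listening on localhost:6081; adjust to
# your setup):
#   curl -sI http://localhost:6081/ | cache_status
```

Run the curl line twice in a row: if caching works, the second request for the same anonymous page should report a hit.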
Benchmark with ab to get a baseline, see what difference it makes
Install Varnish, not on port 80 at first (the default is 6081). Only swap it with your regular webserver once you are sure enough that it works well.
Configure everything, benchmark again
Test the functionality of your site. We will do things such as removing cookies; in my case that broke logins for registered users when I first tried it – VCL is a (domain specific) programming language and its power deserves to be treated with the same respect as your PHP code.
Celebrate! For anonymous users your site will now be a couple of orders of magnitude faster. In our case it also means that some parts that expose very slow backend systems now actually work, as opposed to giving errors all the time and bringing our server to a crawl (work to make the backend systems perform is in the pipeline – this is a stopgap solution of course).
Benchmark to establish a baseline
ab -c 100 -n 1000 <frontwebpage>
which gives you 100 parallel requests for a total of 1000. It might be that your server can not handle 100 parallel (-c) requests at first, so if your computer seems to hang (like mine did…), lower that at first.
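To compare runs before and after, it helps to pull out just the one number you care about. A minimal sketch, assuming ab's usual report format with a "Requests per second:" line; the function name requests_per_second is made up for this example.

```shell
#!/bin/sh
# Extract the mean requests-per-second figure from an ab report read
# from stdin. Relies on the report line that looks like:
#   Requests per second:    1234.56 [#/sec] (mean)
requests_per_second() {
    awk '/^Requests per second:/ { print $4 }'
}

# Usage (http://localhost/ is a placeholder for your front page; ab
# wants a URL with a path, so keep the trailing slash):
#   ab -c 100 -n 1000 http://localhost/ | requests_per_second
```

Write the baseline figure down somewhere, so you have something to compare against once Varnish is in front.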
Install varnish – the version in Ubuntu 10.10 is recent enough. On debian Lenny I used http://www.varnish-cache.org/installation/debian
Install Varnish module (I use drush for that), go to admin/settings/varnish. Settings:
flush page cache on cron -> off
check that port / hostname of varnish control terminal matches (is set to varnish default port 6082 on localhost)
Varnish cache clearing -> drupal default
add the authentication token as documented here: http://www.varnish-cache.org/trac/wiki/CLI The token can be found in /etc/varnish/secret. Then check the status test at the bottom: the Varnish control terminal should be accessible. If not, see what ports Varnish has open with sudo netstat -ap | grep varnish. The ports Varnish runs on are set in /etc/default/varnish. This is what I needed the patch for. Other things you may want to verify: http://drupal.org/node/941788
Develop your own VCL. I took the one from Four Kitchens as a starting point: https://wiki.fourkitchens.com/display/PF/Configure+Varnish+for+Pressflow and added an HTTP header, which I found elsewhere, to indicate cache HIT or MISS. I put my default.vcl on github https://gist.github.com/744643 for your inspiration.
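If the status test fails, it is worth checking which control terminal address Varnish was actually started with. A rough sketch, assuming a Debian-style /etc/default/varnish where DAEMON_OPTS passes the address to varnishd via -T; the function name varnish_admin_address is hypothetical.

```shell
#!/bin/sh
# Print the address given to varnishd's -T (management interface)
# option in a Debian-style defaults file.
# ASSUMPTION: the options live in DAEMON_OPTS and use the -T flag;
# commented-out alternatives in the file are skipped.
varnish_admin_address() {
    grep -v '^[[:space:]]*#' "${1:-/etc/default/varnish}" |
        sed -n 's/.*-T[[:space:]]*\([^[:space:]"\\]*\).*/\1/p'
}

# Usage:
#   varnish_admin_address                  # reads /etc/default/varnish
#   varnish_admin_address /path/to/file    # or any other defaults file
```

The value it prints should match the hostname and port entered in the Varnish module settings.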
Eventually, also change: /etc/apache2/ports.conf to: NameVirtualHost *:8000 and Listen 127.0.0.1:8000. Then edit all your virtual hosts configuration files, if they contain a port number.
http://blog.justinlintz.com/2010/08/configuring-varnish/ (also has comparison with varnish)
Apache, or any other webserver behind Varnish, will see 127.0.0.1 as the IP address of every visitor. Install mod_rpaf (apt-get install libapache2-mod-rpaf) so that Apache logs the correct IP.
I’m working on a puppet recipe to automate most of this stuff; maybe I’ll publish it when I’ve done some more installations like this. So far it’s been running well on my laptop and on a test server, but it still needs to see some heavy use. The fact that both drupal.org and varnish-cache.org (both heavy traffic sites) now use a similar setup and work smoothly does give me confidence though…
Next up is improving performance for logged-in users; there, the most valuable options are a lot less clear to me.
I hope this helps you in getting a better performing drupal site!