Archive for February, 2011

Server login considered harmful

Thursday, February 3rd, 2011

Why limit continuous improvement to planning or programming? Inspired by the DevOps movement and a new breed of server configuration management tools, Stephan Eggermont (playing Dev) and yours truly (playing Ops) set out to see if automating the hell out of a server build would benefit us. We learnt a few things. One of them: each time we logged in to the server was a system administration smell. Something in our approach to automation had failed. Time to ask why a couple of times…

We constructed a web and mail server for an early stage startup. Stephan wrote the application for it together with Diego Lont, and they were looking for a smart way to deploy it. I had looked into automating systems administration tasks with scripting tools such as puppet and chef before, had tried puppet a year before and failed to earn back the time I put in.

This time our goal was to be able to build a new server quickly, because the startup did not have the funds for more expensive disaster recovery options. We also wanted to not think separately about writing documentation: if left as a a separate task, documentation is often just left.

Stephan was somewhat new to linux systems administration and wanted to learn, I have done linux administration on the side for almost ten years now, but many things I still don’t really understand (e-mail, firewalls), even though I administer a couple of servers that seem to be running ok.

We hoped that documenting ourselves in the form of scripts would help us understand, and that writing these scripts in a Domain Specific Language designed to describe servers (puppet in this case, but it could have easily been another tool) would force us to be precise in describing our understanding.

So why did we need to log in?

Our web server would not start, but puppet did not pass the output of the web server (lighttpd) on to us.

Why did our web server not start?

We made errors writing its configuration file.

Why did we make errors in writing the configuration file?

We used lighttpd, a web server I hadn’t used in a couple of years. Its configuration is not that hard, but you have to use exactly the right lines. Especially with secure sockets (a bit tricky on most servers) this turned out to be harder than we thought.

Why…

We don’t have automated tests yet for the configuration, Stephan and Diego’s software has automated tests on the inside, but the mail and web server stuff does not have automated tests. I have a virtual machine that is, with the puppet scripts, roughly a copy of the production machine, but learning puppet and figuring out how to automate tests for the environment was too big a step.

There are many more whys. Sometimes we were in a hurry, so we didn’t sit together to think through the next step, do it, and reflect on it. Sometimes we still did things directly on the server, because we could not figure out how to quickly do it in puppet. The first steps in automation required a lot more patience than I expected. The object oriented database used does not come with a ubuntu package, but needed to be installed with a couple of shell scripts that required manual intervention. We did not have the courage or understanding to rewrite the scripts as (automated) puppet recipes or an APT package.

It didn’t occur to us at first that logging in was a smell. We just started to notice after a while, that whenever we logged in to the production server, there was something we had overlooked. We broke something, or could not see whether something was running properly or not. Sometimes we took action to prevent having to log in the next time, sometimes we thought it was too much work for now, and accepted that our automation is not yet ‘perfect’.

If you made it this far, congratulations! You may want to hear more. If so, join us this Sunday at 15:00 In the configuration management devroom during FOSDEM 2011 in Brussels (no registration required). We’ll be talking about configuration management for developers. Learn from our mistakes, so you can make different ones ;)

Thanks to Stephan for coming up with the first title. And Marc Evers for shaping the second one.