Well, now that we have a high availability and low latency design for our RESTful services, where can we find the weakest link in maintaining our service level agreements? The answer often lies in the crufty, hidden, and poorly tested “glue code” lying between (and below) the systems themselves. This code rears its head when a critical notification fails to fire, or a cluster fails to scale automatically as designed. How do we get this code cleaned up, out in the open, and rock solid? The answer, of course, is to apply agile principles to our infrastructure.
The field of system programming desperately needs an update into the Agile Age! And why not? Popular cloud management APIs provide complete command and control from test-friendly languages such as Java, and many efforts are under way to make cloud APIs standardized and open source. So let’s delve into some details about how we can transform shell script spaghetti into something more workable!
Automated Build and Deployment
System programs should follow the same automated build and deployment process as the rest of the application portfolio.
Automated Unit Test and Mocking
I long to see the day that system programs have automated unit test coverage metrics just like the rest of the civilized world. Let’s test that failover logic in a nice safe place where it can’t ruin lives! Think you can’t mock all failure scenarios? Think again. With pure Java APIs available from many cloud providers (not to mention other test-friendly languages), we can impersonate any type of response from the cloud services. Friends don’t let friends code without testing.
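To make the mocking idea concrete, here is a minimal sketch in Python using the standard library's `unittest.mock`. Everything cloud-specific in it is an assumption made for illustration: `ensure_healthy`, `InstanceUnavailable`, and the client methods `check_health` and `promote` are hypothetical names, not any provider's real API.

```python
from unittest import mock


class InstanceUnavailable(Exception):
    """Hypothetical error raised when the cloud reports an unhealthy instance."""


def ensure_healthy(client, primary_id, standby_id):
    """Return the ID of the instance that should serve traffic.

    `client` is any object exposing check_health(instance_id) and
    promote(instance_id); both method names are assumptions.
    """
    try:
        client.check_health(primary_id)
        return primary_id
    except InstanceUnavailable:
        # Primary is down: promote the standby and route traffic to it.
        client.promote(standby_id)
        return standby_id


def test_failover_promotes_standby():
    # Impersonate a cloud service whose health check fails.
    client = mock.Mock()
    client.check_health.side_effect = InstanceUnavailable("primary down")
    assert ensure_healthy(client, "primary-1", "standby-1") == "standby-1"
    client.promote.assert_called_once_with("standby-1")


def test_healthy_primary_is_left_alone():
    # A plain Mock returns normally, so the health check "passes".
    client = mock.Mock()
    assert ensure_healthy(client, "primary-1", "standby-1") == "primary-1"
    client.promote.assert_not_called()
```

Because the failover logic only talks to the client it is handed, the failure scenario runs in milliseconds on a laptop and never touches a real instance.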
Change Management Tied to Code Commits
Let’s all admit it: at one point or another we have all been tempted to cheat and “tweak” things in production. Often it doesn’t seem possible that a particular change could cause a problem. However, in some system programming environments “tweaking” has become a way of life because people assume there is no way to replicate the production environment. In a cloud environment this is simply not the case. All cloud instances follow a standardized set of options and configurations controlled from a testable language. If we create a Java program or Python script to build and maintain the production environment, how trivial is it to re-create the same environment? In that case, tying every system program code commit directly to a Jira issue becomes possible, and so does complete end-to-end management of environment change. All changes are code changes. Wow, what a concept!
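One way to picture "all changes are code changes" is a script that holds the desired environment as data and works out what has to change. The sketch below is an illustration under stated assumptions, not a real provisioning tool: `plan_changes` and the configuration keys (`size`, `count`) are hypothetical, and a real script would pass each planned action on to the provider's API.

```python
def plan_changes(desired, actual):
    """Compare desired and actual environments and list the actions needed.

    Both arguments map a (hypothetical) instance group name to its config.
    Returns a sorted list of (action, name) tuples.
    """
    actions = []
    for name, config in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != config:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            # Anything running that is not in the committed spec is drift.
            actions.append(("delete", name))
    return sorted(actions)


# The desired state lives in version control; every edit is a commit.
desired = {
    "web": {"size": "small", "count": 2},
    "db": {"size": "large", "count": 1},
}
# The actual state would normally be read back from the cloud API.
actual = {
    "web": {"size": "small", "count": 1},
    "cache": {"size": "small", "count": 1},
}

plan = plan_changes(desired, actual)
```

Since the plan is a pure function of two dictionaries, it can be unit tested like any other code, and a hand-made production "tweak" shows up as an unexpected action the next time the script runs.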
By building testable components we create layers of abstraction which hide and manage complexity. We can now confidently build intelligent instances which dynamically configure themselves, hand in hand with the software which runs on them. Testability makes this not only feasible, but in fact imperative as a competitive advantage.
Organic Cloud Systems
So what happens when we standardize an Agile process for system programs in the same way we do for application programs? Repeatable results. We also take much of the burden of busywork off infrastructure folks so they can focus on engineering the next generation of great systems rather than doing so much firefighting at 3 a.m. This won’t solve all of those problems, but most repeatable IT problems have automatable solutions. Using cloud APIs, we can most likely automate anything that we can put in a run sheet. We then have organic cloud systems which automatically configure themselves and adapt to environmental changes and failure scenarios without sacrificing reliability.
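As a rough sketch of what "automating a run sheet" could look like, the Python below treats each run sheet line as a named step and stops at the first failure, the way an operator would. The function `run_sheet` and the step names are hypothetical; in practice each step would wrap a call to a cloud API.

```python
def run_sheet(steps):
    """Run (name, action) steps in order, stopping at the first failure.

    Returns a list of (name, status) pairs, where status is "ok" or
    "failed: <reason>". Steps after a failure are not attempted.
    """
    results = []
    for name, action in steps:
        try:
            action()
            results.append((name, "ok"))
        except Exception as exc:
            results.append((name, f"failed: {exc}"))
            break
    return results


def drain_traffic():
    pass  # placeholder for a real API call


def restart_service():
    raise RuntimeError("service did not come back")


def restore_traffic():
    pass  # placeholder for a real API call


results = run_sheet([
    ("drain traffic", drain_traffic),
    ("restart service", restart_service),
    ("restore traffic", restore_traffic),
])
```

The returned list doubles as an audit trail: it records which steps ran, which one failed, and why, instead of relying on someone's memory of a 3 a.m. session.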