IaC – The key tradeoffs when implementing Infrastructure as Code

What is IaC? 

As the name suggests, IaC, or Infrastructure as Code, is the practice of managing infrastructure through plain text files (code). This management consists mainly of description and implementation. First, you need to be able to describe all infrastructure details in code. Second, your tools and platforms need to be able to implement configuration changes based on those text files.

While there are different approaches and tradeoffs, the core concept is super simple. Instead of mucking about with the infrastructure directly, you write or modify the text files that describe it, and then an automated tool takes over. Magic happens, and soon you’re left with an updated configuration based on those files.
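To make the "magic" a little less magical, here is a minimal Python sketch of what such a tool does at its core: it reads a text description, compares it with what currently exists, and works out which actions are needed. The resource names (`web`, `cache`, `legacy`), the `size` and `count` fields, and the `plan` function are all made up for illustration; real tools are far more involved.

```python
import json

# Hypothetical desired state, as it might live in a version-controlled text file.
DESIRED_TEXT = """
{
  "web": {"size": "small", "count": 3},
  "cache": {"size": "medium", "count": 1}
}
"""


def plan(current: dict, desired: dict) -> list[tuple[str, str]]:
    """Compare current and desired state and list the actions needed."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical current state, as it might be reported by the platform.
current_state = {
    "web": {"size": "small", "count": 2},
    "legacy": {"size": "large", "count": 1},
}

desired_state = json.loads(DESIRED_TEXT)
for action, name in plan(current_state, desired_state):
    print(action, name)
```

With the sample data above, the plan is to update `web`, create `cache`, and delete `legacy`. Once the two states match, the plan is empty.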

IaC is regarded as a core principle behind a number of cloud-based solutions and patterns. One example is GitOps, which is essentially IaC built on the Git version control system. Serverless also leans heavily on the ability to describe everything in text or JSON form.

Key Infrastructure as Code tradeoffs

With IaC, you will often come across a few patterns. There is no one true way to implement it; there are only different IaC approaches and tradeoffs. Keep that in mind when you start thinking about which approach fits your organization.

[Image: plate of cake with chocolate frosting. Caption: Protip: once you present your cake it becomes immutable or unpresentable (a.k.a. eaten). Photo by Anna Tukhfatullina Food Photographer/Stylist on Pexels.com]

Mutable vs Immutable infrastructure

In layman’s terms: once you deploy something, do you change it or replace it? You will most likely end up with a combination of both, but it’s important to know the difference. Each fits specific use cases, and each has use cases where forcing it makes the system nearly impossible to maintain.

Mutable infrastructure

Most traditional IT sysadmin workflows are designed around mutable infrastructure. You have your infrastructure, and when you want to change something, you modify a file or redeploy code. Its main benefit is that it’s simple to do by hand and usually simple to automate. But you need tools and discipline to keep all the changes consistent and documented. It’s quite common to end up with 20-year-old snowflakes that depend on multiple hacks to keep running.

Immutable infrastructure

A more modern approach uses immutable infrastructure. You don’t modify resources; you throw the old instance away and create a new one. In case you’re thinking “this can’t possibly work!”: this is how Docker works, this is how Kubernetes works, and a large chunk of the cloud as well. Take a look at docker-compose, which is a textual way to describe how you want to run your Docker containers.

This approach tends to give more reproducibility and control over individual deployments. You can reproduce each configuration of the system, which makes it much easier to roll back when something goes wrong. It works wonders for some systems, especially stateless containerized web apps. For others, it can be very difficult to implement.
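The replace-instead-of-modify idea, and why it makes rollbacks cheap, can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Instance` and `Deployment` classes and the `shop:1.0` image names are hypothetical and have nothing to do with how Docker or Kubernetes are actually implemented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    """An immutable description of one deployed instance."""
    image: str
    replicas: int


class Deployment:
    """Keeps every released Instance so an earlier one can be restored."""

    def __init__(self, initial: Instance) -> None:
        self.history = [initial]

    @property
    def live(self) -> Instance:
        return self.history[-1]

    def release(self, **changes) -> Instance:
        # Build a new instance instead of modifying the live one.
        new = replace(self.live, **changes)
        self.history.append(new)
        return new

    def rollback(self) -> Instance:
        # Drop the newest instance; the previous one becomes live again.
        if len(self.history) > 1:
            self.history.pop()
        return self.live


deployment = Deployment(Instance(image="shop:1.0", replicas=2))
deployment.release(image="shop:1.1")
print(deployment.live)
deployment.rollback()
print(deployment.live)
```

Because each `Instance` is frozen, the old version cannot drift while the new one is live, so rolling back is just a matter of pointing at the previous entry.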

Some databases don’t work well with this approach. You might need to export data from the old instance and reimport it into the new one, or restarting might take a prohibitively long time. Some applications might have special needs, such as resources that are not easily allocated and released. It might also affect your development flow: tinkering with a running system is much easier than designing a change, rolling it out, and waiting for the new instances to pop up.

Declarative vs Imperative approach to IaC

Imperative approach

Again, the traditional way of implementing a change is to tell the computer what to do, step by step. This is called the imperative approach. Its main advantage is that you can easily see everything that will happen to the resource. Its main disadvantage is that you have to know what needs to be done to reach the desired change, so you have to describe both the change and how to get there.
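As a rough illustration, an imperative change might look like the Python sketch below. The `server` dictionary is a hypothetical in-memory stand-in for a real machine, and the step functions and the `nginx` package name are invented for the example. The point is that the author spells out every step and has to get the order right.

```python
# Hypothetical in-memory stand-in for a server; a real script would call
# a package manager, edit files, and restart services.
server = {"packages": set(), "config": {}, "running": False}


def install_package(state: dict, name: str) -> None:
    state["packages"].add(name)


def write_config(state: dict, key: str, value: str) -> None:
    state["config"][key] = value


def restart_service(state: dict) -> None:
    # The script author has to know that a restart needs the package first.
    if "nginx" not in state["packages"]:
        raise RuntimeError("cannot restart: nginx is not installed")
    state["running"] = True


# The change is described as an explicit, ordered list of steps.
steps = [
    (install_package, ("nginx",)),
    (write_config, ("listen_port", "8080")),
    (restart_service, ()),
]

for step, args in steps:
    print("running", step.__name__, *args)
    step(server, *args)
```

Everything that happens is visible in `steps`, which is the upside. The downside is equally visible: put `restart_service` first and the run fails.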

Declarative approach

In the declarative approach, you’re the king who issues decrees. Your minions (servers) have to worry about implementing the new state or die trying. Its main advantage is that you don’t have to know how to get to the end state, you just have to describe what the end state is. 

This works like magic if you mostly change only a few specific things. But the more unique needs you have, the harder it gets to implement. In the end, somebody has to implement the logic that carries out the declaration. With most platforms, you get a bunch of stuff out of the box. But your services might have special needs that the platform does not support. This means you either have to get someone to implement everything that is missing or, if your platform is proprietary, depend on the mercy of your provider to get the features you need.
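A small Python sketch can show both sides of this. You only declare the end state, but every kind of setting needs a handler that somebody wrote. The `HANDLERS` table, the `apply_declared` function, and the `replicas`, `image`, and `gpu` keys are hypothetical and not taken from any real platform.

```python
# Hypothetical handlers: each knows how to bring one kind of setting
# to its declared value. A real platform ships many of these.
def reconcile_replicas(current: dict, declared: int) -> None:
    current["replicas"] = declared


def reconcile_image(current: dict, declared: str) -> None:
    current["image"] = declared


HANDLERS = {
    "replicas": reconcile_replicas,
    "image": reconcile_image,
}


def apply_declared(current: dict, declared: dict) -> list[str]:
    """Converge `current` towards `declared`; return the keys that changed."""
    changed = []
    for key, value in declared.items():
        handler = HANDLERS.get(key)
        if handler is None:
            # Nobody has implemented this part of the platform yet.
            raise NotImplementedError(f"no handler for {key!r}")
        if current.get(key) != value:
            handler(current, value)
            changed.append(key)
    return changed


live = {"image": "shop:1.0", "replicas": 2}
print(apply_declared(live, {"image": "shop:1.0", "replicas": 4}))  # → ['replicas']
print(apply_declared(live, {"image": "shop:1.0", "replicas": 4}))  # → []
```

Applying the same declaration twice changes nothing the second time, which is the pleasant part. Declaring something like `{"gpu": 1}` raises `NotImplementedError` in this sketch, which is the part where someone has to go and write the missing handler.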

While this approach sounds very nice, you have to think of the added overhead. Everything not already supported is bound to take longer to implement than with an imperative approach. This is why it might make sense to not bother automating a few special snowflakes. The declarative approach will keep you from needing the knowledge to change infrastructure, but it will not reduce the knowledge required to maintain and expand the automation platform. It will work now, but if you get rid of all the people who know how it works in the background, you’ll have a fun time during the next major outage or when the platform stops deploying properly.

Silver bullet

There are a few major selling points for IaC. In some cases they make sense; in others they might end up hurting the project or even costing more than traditional system administration. Keep them in mind, because while they do help in some cases, they are not universal truths. Also, some solutions on the market might benefit the seller considerably more than the end user.

Operations costs vs maintenance costs (+ adding new features)

IaC will make your “standard” instances easy to deploy and trivialize the most common configuration and maintenance tasks, assuming that the change is something you planned for and that your platform supports it. Otherwise, you will have to implement it yourself. Depending on your previous implementation choices, this will be easy or very hard.

The automation or IaC platform of your choice will become a programming project for your organization. Like any software, it requires ongoing maintenance, or over time it will devolve into unmaintainable, moody spaghetti. You need a dedicated staff of DevOps engineers with both sysadmin and programming backgrounds to maintain such a platform at large scale. Do not underestimate this complexity when weighing IaC against traditional administration for your organization.

Confidently deploying changes and reducing the potential for human error

Automation, in general, will reduce the chance that someone mistypes something or misses a step. Just keep in mind that human error is still there; it has only moved to another layer of abstraction. Automation will prevent you from misconfiguring by hand, but you can still make mistakes when writing an instance or change specification. And you can have instabilities and bugs in the automation platform itself.
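One common way to catch mistakes at that new layer is to check a specification before any tool acts on it. The Python sketch below is a minimal, hypothetical example: the `size` and `count` fields, the allowed size names, and the `validate_spec` function are assumptions made for illustration, not part of any particular IaC tool.

```python
ALLOWED_SIZES = {"small", "medium", "large"}  # hypothetical size names


def validate_spec(spec: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    size = spec.get("size")
    if size not in ALLOWED_SIZES:
        problems.append(f"unknown size: {size!r}")
    count = spec.get("count")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(count, int) or isinstance(count, bool) or count < 1:
        problems.append(f"count must be a positive integer, got {count!r}")
    return problems


print(validate_spec({"size": "small", "count": 3}))
print(validate_spec({"size": "smal", "count": 0}))
```

The first spec passes with no problems, while the second one is rejected for both the mistyped size and the zero count. A check like this only catches the mistakes you thought of in advance, which is rather the point of this section.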

Reduced need for operations specialists

[Image: IaC code monkey. Caption: An ape debugging production DB]

More automation, fewer code monkeys mindlessly typing on their keyboards, right? Well, in some ways, yes. But planning to automate administration just to cut down the sysadmin/DevOps team is the wrong way to think about this.

What you end up with are administrators who have more time to think, more time to listen, and more time to help make correct decisions. This can mean that they expand the scope of their work, spend time understanding their stakeholders, and have more time on hand to dedicate to outages and incidents in the infrastructure.

Also, don’t forget that all administration work requires occasional unexpected tweaks and maintenance tasks. A major risk of diluting the expertise of the current team is that you end up with an obscure issue that no employee is able to solve. A long service outage is very likely to cost you considerably more than any short-term savings you get from letting experienced people go.
