What is GitOps?
GitOps is version control and continuous delivery for your infrastructure. Sounds simple, but those fancy words hide a lot of complexity.
Version control is the simple one: you have text files, every change to them creates a new version, and you can always inspect an old version or revert to it.
The “continuous delivery” part is slightly more complex. First, it assumes that your infrastructure is described in a text format. This concept is usually referred to as Infrastructure as Code, or IaC. These text files are declarative, meaning that they declare what the final state should look like. A tool takes those files and makes the appropriate changes. The tools in question already know how to get from the current state to the desired state. Continuous delivery then becomes simply running those tools with the newest version of all the IaC description files.
GitOps is the logical next step for a DevOps engineer coming from a programming background. It comes from the need to trace infrastructure changes the same way you trace changes to actual code. It makes sense because deciding what the infrastructure should look like is separate from thinking about how to get it there. And the fact that “getting there” can be automated is just the cherry on top.
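To make the "current state versus desired state" idea concrete, here is a minimal sketch in Python. It is not how any particular IaC tool works internally; the server names, the fields, and the `plan` function are all made up for illustration. It only shows the general shape: compare what the files say with what exists, and derive the actions.

```python
# Hypothetical desired state, as it might be parsed from IaC files in Git.
desired = {
    "web-1": {"cpu": 2, "ram_gb": 4},
    "web-2": {"cpu": 2, "ram_gb": 4},
}

# Hypothetical current state, as the platform might report it.
current = {
    "web-1": {"cpu": 1, "ram_gb": 4},
    "db-old": {"cpu": 4, "ram_gb": 16},
}


def plan(desired, current):
    """Return the actions needed to move from current to desired."""
    actions = []
    # Described in the files but not running yet: create it.
    for name in sorted(desired.keys() - current.keys()):
        actions.append(("create", name))
    # Present in both but with different settings: update it.
    for name in sorted(desired.keys() & current.keys()):
        if desired[name] != current[name]:
            actions.append(("update", name))
    # Running but no longer described in the files: delete it.
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


print(plan(desired, current))
```

Running the same function again after the actions were applied would return an empty list, which is the whole point of the declarative approach: you describe the destination, not the route.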
Key benefits of GitOps
If we contrast GitOps with traditional old-school system administration, we will find a lot of benefits and a lot of tradeoffs. But many of those come from a subset of GitOps that can be implemented without implementing the whole thing. So if your organization needs only one particular perk, you could save time and resources by integrating only the piece that solves your particular problem.
Below is the list of commonly cited benefits of GitOps. Just keep in mind that in case you need one of them, there are probably simpler ways of implementing just that part.
Ability to minutely describe the whole system
It’s very common, especially for legacy systems, that only a few people know how they were originally set up. A properly set up server tends to be quite reliable, and you can end up with a service running for 10 years without being touched by anyone. This is probably not your main production system. It’s some godforsaken service set up years ago and used solely by one nice old lady in accounting.
If you’re lucky, the system is documented and the documentation was kept up to date. In a more mature organization, chances are that there are many such little services hidden all over the datacenter. Describing all of them is quite an effort, but the alternative is literally having your business depend on a single person to maintain parts of its functionality.
Machine- and human-readable infrastructure descriptions are quite powerful. They allow for much more data analysis and serve as the basis for further automation. Many IT departments would benefit from setting up auto-documentation of their systems. It might be that this becomes the first stepping stone in your GitOps journey. And depending on your needs, complete system documentation might be all your organization needs.
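As a rough illustration of why a machine-readable description is useful, here is a small Python sketch. The inventory entries, field names, and both functions are hypothetical. The same data produces human-readable documentation and answers a simple analysis question, such as which services nobody owns.

```python
# Hypothetical inventory; in practice this might be loaded from files in Git.
inventory = [
    {"name": "main-web", "host": "srv-01", "os": "Debian 12", "owner": "platform team"},
    {"name": "accounting-reports", "host": "srv-17", "os": "Debian 9", "owner": "unknown"},
]


def render_docs(services):
    """Turn the inventory into one documentation line per service."""
    lines = []
    for s in sorted(services, key=lambda s: s["name"]):
        lines.append(f"{s['name']}: runs on {s['host']} ({s['os']}), owner: {s['owner']}")
    return "\n".join(lines)


def without_owner(services):
    """List the services that nobody is known to be responsible for."""
    return [s["name"] for s in services if s["owner"] == "unknown"]


print(render_docs(inventory))
print("No known owner:", without_owner(inventory))
```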
Confidently deploying changes
Deploying changes to the infrastructure should be as easy as pie. The actual end goal is that the user only thinks about needs and constraints. Everything else is taken care of by the platform. Of course, your platform has to be able to implement your desired change, but you only have to “teach it” once. Granted, setting up the automation may take longer than making the change once or twice by hand, but after that the automation makes it faster and hands-off. As with all nice things in life, there is a tradeoff between doing the work and maintaining the automation.
Ultimately, your developers can take responsibility for managing their own systems. In order to change infrastructure, you need to know what you need and what to change to get it. If the automated system takes over the how and the where, considerably more people are able to implement changes [at least in theory 🙂 workplace terms and traditions apply].
Automating changes in many instances
To roll out changes to a whole datacenter, or to a hundred cloud VMs, you have to have automation. Sure, you could make your whole team repeat the same few steps on every instance. But you end up both wasting your experts on menial labor and making them hate their jobs a bit more.
For changes involving more than a few servers, automation is the difference between an expert working for a few hours and a team of experts taking a whole week.
Scaling infrastructure without major scaling of Ops/DevOps team
Sysadmins tend to get distracted by the constant crashes and outages that pop up in the data center. At a certain scale, such distractions are inevitable. With automation and some other fancy IT buzzwords, they can become much more manageable. Heck, the systems can even become self-healing to some degree.
There is only so much your team can do. System administration is, in general, a field of very slow, steady improvements. This is mainly due to the effort and research required to achieve high reliability. Obscure, badly documented, flaky software plays a part too, but mostly it is about achieving high reliability.
The more you automate, the more your team will be able to work on new stuff and on firefighting. The things that they do most often, preferably the ones with a lot of waiting, should be picked first.
Reducing the potential for human error
Traditional administration requires quite a bit of typing. A wrong command at the wrong terminal can do quite a bit of damage. I will be the first to admit that yes, indeed, I check what I type three times, and NO, it never happened to me. The magic smoke left my servers only a few times while I was tinkering around them!
Anyhow, where was I? Ah yes, to work is to make mistakes. The more you work, the more mistakes you make. Automation and config validation save you from stupid mistakes. The big catastrophic ones will still be there, but at least the small stupid ones won’t. And let’s face it, it’s the day-to-day that grinds us down and not the big things.
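Config validation can be as small as a function that refuses obviously wrong input before anything touches a server. The sketch below is only an illustration: the `validate` function, the field names, and the list of allowed sizes are assumptions, not part of any real tool.

```python
# Hypothetical list of allowed (cores, RAM in GB) combinations.
ALLOWED_SIZES = {(2, 4), (4, 8), (8, 16)}


def validate(server):
    """Return a list of problems found in one server description."""
    errors = []
    if not server.get("name"):
        errors.append("missing name")
    if (server.get("cpu"), server.get("ram_gb")) not in ALLOWED_SIZES:
        errors.append("non-standard size")
    return errors


print(validate({"name": "web-1", "cpu": 2, "ram_gb": 4}))  # no problems
print(validate({"name": "", "cpu": 3, "ram_gb": 5}))       # two problems
```

A check like this typically runs automatically on every proposed change, so the typo gets caught in review instead of in production.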
Infrastructure Standardization
While I’m a big proponent of custom-tailored solutions that exactly fit the needs, everything else should be identical. For everything not “special”: one OS, a few core/RAM/disk combos, and standardized networking. No matter what else you do, having standardized boxes makes life easier to organize (counts for life as well).
Silver Bullet
Now for the fun part that will attempt to remove selling points from benefits 😀
Automation
Automation is a simple tradeoff of investing resources now in order to save resources later. If you are big enough or expect to grow, automating the whole or a part of your setup makes sense.
But keep in mind that I’m talking about automation, not GitOps. All automation is good, as long as you’re automating the things you actually use. Some other, non-Git-based approach might be better for you. GitOps is Infrastructure as Code + version control + auto triggering. Your organization might not need or use the last two.
Infrastructure Visibility
Again, this is a self-standing block. There are only two reasons for not having internal infrastructure visibility:
- you don’t trust the people who are administrating it
- you (they) are lazy and don’t bother doing it.
In the first case, you’re not trusting people who have full access to your infrastructure. So you either have a big problem or trust issues. In the second case, make them (in a nice way).
Saving Costs
Everything and everyone claims to save costs. Do the math to figure out whether it can save costs in your case. Keep in mind that licensing, setup, and maintenance are also part of the cost.
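"Doing the math" can start as a back-of-the-envelope break-even calculation. The Python sketch below is a simplification, and every number in it is invented for the example. It assumes a one-time setup cost, a fixed monthly running cost (licensing plus maintenance), and a fixed monthly saving.

```python
import math


def months_to_break_even(setup_cost, monthly_running_cost, monthly_savings):
    """Return the number of months until the investment pays off, or None if it never does."""
    net_monthly = monthly_savings - monthly_running_cost
    if net_monthly <= 0:
        # Running it costs at least as much as it saves.
        return None
    return math.ceil(setup_cost / net_monthly)


# Invented numbers: 12000 to set up, 500 per month to run, 2000 per month saved.
print(months_to_break_even(12000, 500, 2000))
# Invented numbers where running costs eat all the savings.
print(months_to_break_even(12000, 2000, 1500))
```

If the first result is longer than you expect the system to live, or the function returns None, the savings claim does not hold in your case.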
Risk and Human Error
For risk, do the good old risk analysis. Mitigating and reducing the risks that matter to your organization is better than jumping at every risk you see. As for human error, you’re merely shifting it to the implemented platform. It’s a fun tradeoff: a single error messing up one change versus a single error messing up all future changes until it is fixed.