NoOps is a trend that tries to push dedicated operations people out of the organization. Its name comes from “NO Operations”, with operations being a synonym for system administration. To better understand what is going on, let’s first take a look at what a regular sysadmin does.
What is Ops?
Operations, or system administration, is the generic term for the work of the computer experts who set up and maintain all of the hardware and most of the software in an office or company. They often have a deep understanding of the systems they maintain. Just as often, hidden, undocumented quirks pop up, and someone has to fix them.
Stereotypical look at the system administrator
While operators differ in skills and personality, one thing will always be true for every system administrator: they care deeply about keeping all systems in working condition. This can create tension when admins perceive that pushing code could harm the stability of the system. And this lens of “how it affects the systems” has led to some stereotypical, even humorous depictions of sysadmins. To get a feel for it, check out the day in the life of a sysadmin short story.
Such stereotypes are not always incorrect, as admins tend to have a lot of leverage and access, partly because they tend to be the people with all the login data and partly because of their knowledge of the systems in question. The crazy administrator theme is explored in The Bastard Operator From Hell short-story series.
Complications inherent in the profession
Jokes aside, sysadmins often end up setting up production-ready systems or modifying existing systems for reliability, performance, stability… And they especially shine when something goes wrong. Hardware failure, bugs, misconfiguration, even cosmic rays can stop or corrupt a system. And very likely your administrator will be the person who has to fix or mitigate the issue. At times they will have nothing to do; at other times they will spend days in a datacenter debugging some obscure incompatibility or figuring out which of the 24 RAM slots failed.
Anyhow, the gist is that operators are actually important for organizations. They might not always be necessary, but you should think of them as a reserve that is always ready to jump in when something goes wrong. And with always-on systems, it’s a question of “when it goes wrong” and not of “if it goes wrong”. For further reading, check this amazing post on what Ops does from charity.wtf.
The promise of NoOps
There are a few approaches to NoOps. We will first compare them, mostly by their selling points, and later run our silver bullet analysis on top of them. In no particular order they are:
- Literally No Ops: no one is taking care of operations, you rent cloud services and everything is happily running in a cloud somewhere
- No Dedicated Ops: developers are also in charge of system administration
- DevOps is the new Ops: your DevOps engineers are taking over all operations (or vice versa)
- Yes Ops but it’s all automated: your sysadmins never touch or administrate a system by hand; they work only through automation.
Literally No Ops
This is the ultimate NoOps goal. It is probably unachievable, but people still strive towards it. You have no operations staff and nobody in your company takes care of operations. Everybody works directly on the goals of the company and nothing ever goes offline because you’re paying a cloud service to take care of everything. You pay to not care, and you do not have to find and retain the talent required to administrate systems.
No Dedicated Ops
No one is a dedicated system administrator; your developers and staff take over these roles. This works great in the early days of a small startup, but it’s unlikely to keep working as the company’s operational load increases. It is similar to comparing the cost structure of a startup with the overhead of a mature company. There is a difference between making it work and making it work for millions of users on a global scale. Misjudging the scale of that difference may cost you a lot of time and resources.
DevOps is the new Ops
Assuming you have DevOps engineers with an extensive sysadmin background, this can work, except that they won’t be able to wear both hats at the same time. DevOps requires close work with developers in order to understand and satisfy their needs. A sysadmin, especially during an outage, requires a dark room with even darker coffee, plus a few days of peace and quiet to wrestle with the console and the documentation while fending off all the “Is it fixed yet?” calls from people who only remember that sysadmins exist when something goes down.
Yes Ops but it’s all automated
This one is technically possible, although the time crunch during outages might force your team to apply a quick-and-dirty fix before implementing a proper, lasting solution through automation. We have written a few articles on GitOps, IaC, and IaaS that focus on creating such a system.
Silver Bullets promised by NoOps
Now we will point out the potential issues with the NoOps benefits described above. For some, they may be actual benefits and well worth the cost. For others, the tradeoffs may just be too costly. Only you know your situation, so don’t take a selling point for granted just because it benefited somebody else.
NoOps will save costs
Everything and everyone claims that it saves cost. Do the math to figure out if it can save cost in your case. With no one oiling the machine, it may start slowing down and even breaking apart. Also, keep in mind that a few major outages can destroy the trust in your service. Lack of trust means that users will start looking for alternatives.
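To make “do the math” a bit more concrete, here is a minimal sketch of such a comparison. Everything in it is an assumption made up for illustration: the `Scenario` fields, the helper name `expected_yearly_cost`, and every number. Plug in your own figures; with different inputs the result can easily flip the other way.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Yearly cost inputs for one way of running operations (all values hypothetical)."""
    name: str
    staff_cost: float          # salaries of the people doing ops work
    service_cost: float        # cloud and managed-service fees
    outages_per_year: float    # expected number of major outages
    hours_per_outage: float    # expected time to recover from one outage
    cost_per_hour_down: float  # lost revenue and goodwill per hour of downtime


def expected_yearly_cost(s: Scenario) -> float:
    """Fixed yearly costs plus the expected yearly cost of downtime."""
    downtime_cost = s.outages_per_year * s.hours_per_outage * s.cost_per_hour_down
    return s.staff_cost + s.service_cost + downtime_cost


# Made-up example: same outage frequency, but recovery takes longer
# when nobody on the team knows the systems in depth.
with_ops = Scenario("dedicated ops", 180_000, 60_000, 2, 1, 10_000)
no_ops = Scenario("NoOps", 0, 150_000, 2, 8, 10_000)

for s in (with_ops, no_ops):
    print(f"{s.name}: {expected_yearly_cost(s):,.0f} per year")
```

The point of the sketch is only that downtime belongs in the equation next to salaries and service fees. A comparison that looks at salaries alone will always make NoOps look cheaper.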
No need to hire and retain sysadmins
While this is true to some degree, you can only outsource so much. At some point, you will still need to retain personnel capable of administrating all the outsourcing, licenses, and communications required to replace on-premise administrators. Such a setup can also leave you with provider lock-in: providers can be costly to leave, and you remain vulnerable to their health and abilities.
The cloud will take care of it
Yes, it will, but you will have to take care of provisioning the cloud properly. At a large scale, you can end up paying a six-figure salary to someone who does nothing but optimize your cloud infrastructure costs.
Contrary to its name, serverless still has a server somewhere deep down. It can work like a charm in certain cases, but its constraints often make it impossible to use as a platform. Check this article to learn more.
Development and administration can’t be that different
There is a misconception that all “computer people” are similar and interchangeable. As with most human disciplines, there are no actual barriers to learning any of the skills required. The question is whether they want to learn those skills and whether they want to do that kind of job.
With programming, you get specific tasks and write code to achieve them within given constraints. Changes get near-instantaneous feedback. Your sense of accomplishment gets tied to the difficulty and intellectual challenge of completing the task.
With administration, the setting up of systems is similar; but the end goal has to do with keeping said systems working properly. Most people take “everything is fine” for granted, but it actually requires continuous work. When something goes wrong, you have to fix it and you put a lot of effort into understanding and debugging problems. You might feel proud at the end of such an ordeal, but for most others, you just returned the system to “Default online state”. Not to mention that most people assume that the person fixing the problem is also the person responsible for the problem happening in the first place. Only the paranoid administrator fully understands that entropy works in mysterious ways.
If this article reads like a joke to you, that is because your perception matches reality. I had a hard time writing seriously about a topic that I perceive as a joke. If you’re concerned that your sysadmins are just dead weight, try to find them something to do while they wait for the next outage to hit. But don’t fire them: you will end up with a small short-term cost saving and a large added cost when something goes wrong in your datacenter.