Are DevOps and SRE the Same? (Part 1)
Today, DevOps, SRE (Site Reliability Engineering), Agile, CI/CD and a variety of other software engineering practices are all ways of running the business with more agility. Their elements are not easily separable from one another; they form a continuum across the whole ecosystem. Still, some ideas distinguish these practices, and those differences are the subject of this post.
You have probably seen this tension between your Dev and Ops teams: “We want to push anything, any time, without any issues” versus “We don’t want to change anything in the system since it works”. Let’s check where these philosophies overlap and where they differ, starting with DevOps.
DevOps is usually described as a loose set of practices, guidelines and ‘culture’ meant to break down silos between IT dev, Ops, networking, security and so on. The acronym CALMS (Culture, Automation, Lean, Measurement, and Sharing) is useful for remembering the key points of a DevOps framework. Let’s look at the pillars of DevOps and how they stack up against SRE.
1) Shared Planning: Any DevOps framework works on the principle of shared planning. Shared planning relies on Agile planning and portfolio management tools and processes, which help to quickly plan, manage, and track work across the entire team. Typical examples are the product backlog, the sprint backlog, and task boards, which are used to track the flow of work during an iteration.
2) Shared Codebase: Another principle of DevOps is the shared codebase. DevOps toolchains typically support two types of source control: distributed (like Git) or centralised (like Perforce). In both, teams can check in files, organize them in folders, branches, and repositories, and manage branches and other code development operations from those repos.
3) Silos and Collaboration: No more silos. Do away with the old-fashioned structure of separating Ops and Dev teams. That separation essentially means siloed knowledge and no collaboration, which is detrimental to IT and to the whole business.
4) Measurement: Measurement is crucial for overall business success, i.e. each change should be measured to understand whether it succeeded or failed. In each environment, this means identifying the root cause (RCA) through system measurements, verifying what is changing the situation, and creating a platform for conversations that Devs and Ops can agree upon. In practice this means getting on a common call, or being on-call together, to collaborate and agree on the incident and its RCA.
5) Tooling: Tooling is important in DevOps for managing change correctly. In the current IT landscape, change management relies on highly specific tools. But the tooling can be the same for DevOps and SRE: for example, containers and microservices are used for scaling, and CI/CD for gradual change. IaC (Infrastructure as Code) aims to automate everything, from deployment and configuration to migration.
6) Change Management: The next pillar is that gradual change is best, i.e. changes should be small and frequent. In some environments the change committee may meet monthly to plan and make changes in Prod, while in others it may meet daily. Any change can break things, so the trick is to split your changes into smaller chunks or batches. The idea is to build a pipeline of low-risk changes, whether they come from product, design, or infra. This strategy, coupled with automated testing of changes and a rollback mechanism, leads to successful change management and extends the concepts of CI and CD.
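As a rough illustration of small batches combined with an automated check and a rollback, here is a minimal Python sketch. Everything in it is hypothetical: `deploy_in_batches` and the `apply`, `rollback` and `healthy` callbacks are stand-ins for whatever a real pipeline would provide, not the API of any particular CI/CD tool.

```python
from typing import Callable, List


def deploy_in_batches(
    changes: List[str],
    apply: Callable[[str], None],      # hypothetical: applies one small change
    rollback: Callable[[str], None],   # hypothetical: undoes one change
    healthy: Callable[[], bool],       # hypothetical: automated post-change check
) -> List[str]:
    """Apply changes one at a time; on a failed check, roll back and stop.

    Returns the list of changes that were applied and kept.
    """
    kept: List[str] = []
    for change in changes:
        apply(change)
        if not healthy():
            # Only the last small change needs to be undone,
            # which is the point of keeping batches small.
            rollback(change)
            break
        kept.append(change)
    return kept
```

Because each change is checked on its own, a failure points at exactly one small change, and the rollback touches only that change rather than a month's worth of work.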
7) System Failures: The other pillar is that the system can fail at any point in time; such failures are accidental and are accepted in DevOps. For example, a load of more than 5 million users on “Black Friday” can bring down the system. So it is likely that more systematic probes are used to monitor for such incidents, and that fail-safe options like auto-scaling are in place before that happens. In cloud environments it is quite common to scale IT resources up or down based on load, but things can always go south. The idea is to accept that and have the culture to fix it ASAP.
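To make the auto-scaling idea concrete, here is a minimal Python sketch of a load-based scaling decision. It is an assumption-laden illustration, not how any specific cloud provider computes it: the function name `desired_instances`, the per-instance capacity of 250,000 users, and the minimum and maximum bounds are all made up for the example.

```python
def desired_instances(
    active_users: int,
    users_per_instance: int,   # assumed capacity of one instance
    min_instances: int = 2,    # assumed floor, so the service never scales to zero
    max_instances: int = 50,   # assumed ceiling, to cap cost
) -> int:
    """Return how many instances to run for the current load."""
    if users_per_instance <= 0:
        raise ValueError("users_per_instance must be positive")
    # Integer ceiling division: round up so a partial instance's
    # worth of users still gets an instance.
    needed = -(-active_users // users_per_instance)
    # Clamp the result between the floor and the ceiling.
    return max(min_instances, min(needed, max_instances))


# Black Friday example: 5 million users, 250,000 users per instance (assumed).
print(desired_instances(5_000_000, 250_000))  # 20
```

The ceiling matters as much as the scaling itself: once demand exceeds `max_instances`, the system can still fail, which is why the culture of accepting and fixing failures remains necessary.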
8) Culture: DevOps strongly emphasises organisational culture, rather than tooling, as the key to success in adopting this new way of working. Rigid boundaries between “Dev” and “Ops” (sometimes called programmers and operators) tend to be counterproductive. This is especially true when Ops is classified as a cost center, which leaves people unable to appreciate the work of both Dev and Ops.
9) Automation: Automation at any layer is productive and sets a benchmark for improvement. Over time, a Dev team may automate all that it can for a service, leaving behind the things that can’t be automated or that are to be automated by the Ops team. Regardless of who does the automation, it is productive to automate whatever can be automated and to reduce manual intervention in Ops/Prod.
To be continued.