Canary Deployments

Canaries were once used in British coal mines to detect toxic gases before they could harm the miners. Canary analysis serves a similar purpose in software deployment. A DevOps engineer uses canary analysis to determine whether a release coming out of a CI/CD pipeline will cause problems for the business. Canary deployment is a technique that reduces the risk of introducing a software update in production by rolling the change out slowly to a small subset of users before making it available to everyone. It is widely used to lower the risk of moving changes into production and to reduce the need for additional infrastructure. Organisations that use canary deployment to test updates and new releases in a live production environment do not expose the rest of their users to a release until they are satisfied with it.

When Is Canary Deployment Useful?

When a company sells its product online, it cannot afford even a moment of application failure, because even brief downtime can mean a large loss of revenue and irate customers. At the same time, the business needs to update its software and services constantly to deliver the best customer experience. Every change brings its own challenges and risks. Canary deployment is used to reduce the risk involved in updating the application.

Process Of Canary Deployment 

A canary deployment takes place in three steps:

Planning and Creating

The first step is to create the new canary infrastructure, where the latest update is deployed. A small share of traffic is then sent to the canary instance while the large majority of users continue to use the baseline instance.
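The traffic split described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the function name `route_request`, the user ID format and the 5% share are all assumptions made for the example. In practice the split is usually done by a load balancer or service mesh rather than by application code.

```python
import hashlib


def route_request(user_id: str, canary_percent: float) -> str:
    """Return "canary" or "baseline" for a user (hypothetical helper).

    The choice is deterministic, so the same user always lands on the
    same instance for a given canary_percent.
    """
    # Hash the user ID into a stable bucket in the range 0..9999.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    # Users whose bucket falls below the threshold go to the canary.
    return "canary" if bucket < canary_percent * 100 else "baseline"


if __name__ == "__main__":
    # Made-up user IDs, used only to show the approximate split.
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route_request(u, 5.0) == "canary" for u in users) / len(users)
    print(f"canary share: {share:.1%}")
```

Hashing the user ID, rather than picking randomly on every request, keeps each user on one version for the whole session, and users who are already on the canary stay on it when the percentage is raised.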

Analysing

Once traffic has been diverted to the canary instance, the team begins collecting data and metrics from network traffic monitors, along with results from synthetic transaction monitors. The team analyses these and compares them with the same data from the baseline version to determine whether the canary is performing as expected.

Rolling

Once the analysis is complete, the team decides whether to go ahead with the release and make the update available to all users, or to roll back to the previous version.
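As a rough sketch of how the analysis and rollout decision might be automated, the Python snippet below compares canary metrics with baseline metrics and returns a verdict. The metric names, the thresholds (a one percentage point error-rate difference, a 1.2x latency ratio, a minimum of 500 requests) and the three verdicts are assumptions chosen for illustration. Real thresholds depend on the service.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Aggregated metrics for one instance over the analysis window."""
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(
    baseline: Metrics,
    canary: Metrics,
    min_requests: int = 500,          # assumed minimum sample size
    max_error_delta: float = 0.01,    # assumed tolerance: +1 percentage point
    max_latency_ratio: float = 1.2,   # assumed tolerance: 20% slower
) -> str:
    """Return "continue", "rollback" or "promote"."""
    # Too little canary traffic to judge: keep collecting data.
    if canary.requests < min_requests:
        return "continue"
    # Roll back if the canary fails noticeably more often than the baseline.
    if canary.error_rate - baseline.error_rate > max_error_delta:
        return "rollback"
    # Roll back if the canary is noticeably slower than the baseline.
    if (baseline.p95_latency_ms > 0
            and canary.p95_latency_ms / baseline.p95_latency_ms > max_latency_ratio):
        return "rollback"
    return "promote"


if __name__ == "__main__":
    baseline = Metrics(requests=100_000, errors=200, p95_latency_ms=180.0)
    canary = Metrics(requests=5_000, errors=12, p95_latency_ms=190.0)
    print(canary_verdict(baseline, canary))
```

Comparing rates and ratios rather than raw counts matters here, because the canary serves far less traffic than the baseline.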

Benefits of Canary Deployment

Canary deployment exposes potential problems to only a small number of users, which reduces the impact of errors in production.

  • Downtime is reduced. If the update or new version proves faulty after smoke testing, sanity testing, and capacity testing, it can easily be rolled back.
  • If there is an error, traffic can simply be routed back to the baseline version, so the situation returns to normal quickly. The team can then find the cause, correct it, and reintroduce the update.
  • Because canary deployment works with a small number of users, only a small amount of extra infrastructure is required. A blue-green strategy, by contrast, requires a complete new production environment for the deployment. Canary deployment needs additional infrastructure only for the first stages of the deployment, to check that the update is good.
  • Canary deployment gives the business flexibility to experiment with new features. Because testing has minimal impact on the user experience and on the organisation's infrastructure, developers can innovate with more confidence.
  • Canary deployment requires the new instance to be compared with the previous version, and comparing two versions that run at very different scales can introduce inaccuracies. To address this, the canary can be split into two parts and A/B tested to determine the stability of the release.
  • Canary deployment is effective regardless of the size of the deployment. A canary deployment can take from minutes to hours to complete, which makes it suitable for frequent, fast updates. The short deployment cycle benefits the organisation because release time is reduced and customers receive a high-value product quickly.

Canary deployment suits updates that are needed quickly. It also works well for large distributed systems. If an organisation is highly distributed, with clients across the globe, canary deployment offers the flexibility to vary updates by region on the basis of each region's risk assessment.

Limitations Of Implementing Canary Deployments

  • Canary deployment can take time and, without automation, is prone to errors. Many companies carry out the analysis phase in a way that is not integrated with the rest of the pipeline: an engineer has to collect and monitor data and logs from the canary version and analyse them manually. This is time-consuming and cannot scale to match the rapid deployments of a CI/CD process. If the analysis is inaccurate, the wrong decision may be made about whether to release an update.
  • On-premises applications can be difficult to update. If an application is installed on users' personal devices, it can be a challenge for a business to perform a canary deployment. One way around this is to set up automatic updates for users.
  • It can be tricky to implement. Managing different versions of the application with canary deployment is easy, but managing databases is harder. When a new version changes how the application interacts with the database, or changes the database itself, the deployment process becomes complex.
  • To perform a canary deployment in that situation, the database schema has to be changed so that it supports more than one version of the application. This allows the old and new versions to run simultaneously. Once the new database schema is in place, the new version can be deployed and traffic switched to it.

Why Use an Iterative Canarying Approach?

Another approach, in line with SRE best practices, combines feature flags with iterative canary rollouts. The new release is broken into parts, and each part is flagged according to the feature it implements. An increasing number of users are then given access to the new features. This avoids making big changes for big groups of users, which reduces the chance of big problems and makes the process more reliable.
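A minimal sketch of per-feature flagging with a growing user share might look like the following Python. The feature names, the percentages and the `is_enabled` helper are hypothetical. Real systems usually keep this configuration in a feature-flag service rather than in a dictionary in code.

```python
import hashlib

# Hypothetical rollout table: feature name -> percentage of users who see it.
ROLLOUT = {
    "new_checkout": 10,
    "search_v2": 50,
}


def is_enabled(feature: str, user_id: str, rollout: dict = ROLLOUT) -> bool:
    """Return True if this user should see the flagged feature."""
    # Unknown features default to "off".
    percent = rollout.get(feature, 0)
    # Salting the hash with the feature name gives each feature its own
    # cohort, so the same users are not always the first to get every change.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, is_enabled("new_checkout", user), is_enabled("search_v2", user))
```

Raising a feature's percentage in the table widens its audience without removing anyone who already had access, which matches the idea of giving an increasing number of users the new feature.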

Advantages Of Iterative Canarying Release

Canarying and iteration can be applied to traditional releases of large applications. The code being deployed needs to be checked, and anything that could compromise the system should be placed behind a flag. The team also needs to identify and tag groups of users. In the end, instead of one big release, you are essentially doing numerous small releases, with more and more user groups receiving the features each time. Flagging features and creating user groups adds overhead and takes additional time with each release. Even with this extra work, the approach has benefits:

  • Services released this way are more reliable. Changes in production commonly cause some outages and incidents. When deployment is done in small iterative steps, only a small set of services is affected, and canarying means only a small set of users is affected. By preventing big outages, you improve the reliability of the service for the wider customer base.
  • The iterative method gives you more opportunities to get feedback from users, so you can determine which improvements are needed before the changes are committed to the entire application.
  • Reducing the scope of each release and spacing releases out gives the operations team the opportunity to ensure they have enough support in place for each feature, which makes operations more manageable.

This approach has challenges as well. It is ideal only for programs whose code does not have to be actively changed in order to flag features and modules. Otherwise, you may have to adopt new practices while building your code: developers have to invest time and effort in building new coding habits and in developing the mindset that everything should be modular and iterative. Switching patterns can slow development at first, but it leads to better releases in the long run.

Safely Balancing Canarying and Iteration

This method of deployment has two important parameters: the number of users who have access to the new features, and the set of new features released in the next iteration. The goal is to expand both with each release until all users have access to all features. The leap made in each iteration should not be so big that it jeopardises the reliability of the change, which means finding a balance.

Build a roadmap of how the production, development and operations teams will share information. The plan should outline which features are included in each iteration and how many customers receive them. This timeline should not be set in stone, however: leave room for adjustments, because the speed of the rollout can change depending on how each iteration performs.
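One way to picture such a roadmap is as a list of stages, each naming the features it includes and the share of users who receive them. The Python below is a sketch under assumptions: the `Stage` type, the feature names, the percentages, and the rule of stepping back one stage when an iteration is unhealthy are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    features: Tuple[str, ...]   # features switched on in this iteration
    user_percent: int           # share of customers who receive them


# Hypothetical plan: widen the audience first, then add a feature, then go to 100%.
PLAN: List[Stage] = [
    Stage(("search_v2",), 5),
    Stage(("search_v2",), 25),
    Stage(("search_v2", "new_checkout"), 25),
    Stage(("search_v2", "new_checkout"), 100),
]


def is_expanding(plan: List[Stage]) -> bool:
    """Check that no stage shrinks the audience or drops a feature."""
    for prev, cur in zip(plan, plan[1:]):
        if cur.user_percent < prev.user_percent:
            return False
        if not set(prev.features) <= set(cur.features):
            return False
    return True


def next_stage_index(current: int, healthy: bool, plan: List[Stage] = PLAN) -> int:
    """Advance when the current iteration is healthy, otherwise step back one stage."""
    if not healthy:
        return max(current - 1, 0)
    return min(current + 1, len(plan) - 1)


if __name__ == "__main__":
    print(is_expanding(PLAN))
    print(PLAN[next_stage_index(1, healthy=True)])
```

Keeping the plan as data rather than as fixed dates makes it easy to adjust: stages can be inserted, reordered or repeated as each iteration's results come in.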

Phased Rollout Approach

It is important to monitor data alongside the flagging of features. You should be able to see the performance of each feature individually. Whether a feature is rolled out further should be decided on the basis of that feature's performance as well as the overall health of the system. Monitoring tools can be used to parse this information from the system's output.

This enables you to identify which features are unreliable and causing issues, and which are stable and can be built upon. Once an iteration is judged safe, you can release it to its designated user groups. The setup is modular and gives the application a layer of protection, because each feature can be decided on individually according to the impact it will have on the entire system. This does not mean the approach is free of challenges.
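To make per-feature monitoring concrete, here is a small Python sketch that groups request outcomes by feature and lists the features whose error rate exceeds a threshold. The event format (a feature name paired with a success flag) and the 5% threshold are assumptions made for the example. A real monitoring tool would extract these events from logs or metrics.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed event shape: (feature name, whether the request succeeded).
Event = Tuple[str, bool]


def error_rate_by_feature(events: Iterable[Event]) -> Dict[str, float]:
    """Compute the error rate of each feature separately."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for feature, ok in events:
        totals[feature] += 1
        if not ok:
            failures[feature] += 1
    return {feature: failures[feature] / totals[feature] for feature in totals}


def unstable_features(events: Iterable[Event], threshold: float = 0.05) -> List[str]:
    """Return the features whose error rate is above the (assumed) threshold."""
    rates = error_rate_by_feature(events)
    return sorted(f for f, rate in rates.items() if rate > threshold)


if __name__ == "__main__":
    # Made-up sample data for two hypothetical features.
    sample = ([("search_v2", True)] * 98 + [("search_v2", False)] * 2
              + [("new_checkout", True)] * 8 + [("new_checkout", False)] * 2)
    print(error_rate_by_feature(sample))
    print(unstable_features(sample))
```

Features that appear in the unstable list are candidates for being switched off or held at their current audience, while the rest can move on to the next iteration.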
