3 Keys to an Effective Kubernetes Disaster Recovery Plan

The security of Kubernetes workloads is being put to the test. In Europe, IT teams have been dealing with simultaneous spikes in cyberattacks and extreme weather events, making it extremely difficult for them to keep data out of the wrong hands or even maintain uptime. Last summer, for instance, security researchers found that Kubernetes clusters were being attacked via misconfigured Argo Workflows instances. The vulnerability meant attackers could access sensitive information such as code and credentials or even access an open Argo dashboard and submit their own workflows. Meanwhile, in February, the UK and northern Europe dealt with Eunice, their worst storm in 30 years, which brought with it a record number of power outages.

Unfortunately, this is the new reality for organizations and their IT teams, and it’s all the more difficult because of the permanent remote work situation. Of course, working from home has been a godsend for many employees, and it’s proven to be a productivity booster. But it certainly creates extra technical complexities for IT teams managing service outages or downtime incidents. And considering 90% of containerized deployments are now happening on Kubernetes (including some of the most business-critical applications globally), even a minor outage could cause colossal financial and reputational damage for businesses.

For these reasons, having a plan to respond to downtime incidents quickly has become non-negotiable. Here are three key traits of an effective Kubernetes disaster recovery strategy.

1. Having a Clear Restore Location for Backed-Up Data

Businesses need a restore plan in place before moving ahead with a backup. To ensure the seamless and speedy recovery of their Kubernetes clusters, organizations need to be clear from the outset about where their backups will be restored in the event of downtime. This task is much more challenging than it sounds, given the complexity of Kubernetes components.

The goal, however, is simple. Enterprises need the ability to quickly restore and migrate all application components wherever they want them and restore subsets of these applications when they need to. In an environment where the cost of downtime is multiplying (now roughly $250,000 per hour), any measure that improves both the recovery time objective and the recovery point objective is vital.
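To make the two objectives concrete, here is a minimal sketch of how a team might check a single outage against its targets. The function name, the timestamps and the one-hour and two-hour targets are all hypothetical values chosen for illustration; they do not come from any particular tool or from the study cited in this article.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup, outage_start, service_restored, rpo, rto):
    """Return (rpo_met, rto_met) for a single outage.

    RPO is judged by how much data could be lost: the gap between the
    last good backup and the start of the outage.
    RTO is judged by how long the service stayed down.
    """
    data_loss_window = outage_start - last_backup
    recovery_time = service_restored - outage_start
    return data_loss_window <= rpo, recovery_time <= rto


# Hypothetical outage: last backup 45 minutes earlier, service back after 3 hours.
outage = datetime(2022, 2, 18, 9, 0)
result = meets_objectives(
    last_backup=outage - timedelta(minutes=45),
    outage_start=outage,
    service_restored=outage + timedelta(hours=3),
    rpo=timedelta(hours=1),   # assumed target: lose at most 1 hour of data
    rto=timedelta(hours=2),   # assumed target: be back within 2 hours
)
print(result)  # (True, False): the RPO was met, the RTO was missed
```

In this sketch the backup cadence was good enough, but the restore took too long, which is the kind of gap a clear restore plan is meant to close.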

A recent Wanclouds study found that nearly two-thirds of businesses experienced data loss last year. This finding shows just how urgently the issue needs fixing. According to the report, 31% of U.S. and UK businesses that lost data experienced downtime or the unavailability of cloud services for up to 10 hours. Meanwhile, nearly a fifth (17%) said they were offline for 10 to 15 hours. These businesses potentially forfeited millions in lost revenue and damages.
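As a rough back-of-the-envelope illustration, the outage durations above can be combined with the roughly $250,000-per-hour figure cited earlier. The sketch below assumes cost scales linearly with hours offline, which is a simplification; real costs vary widely by business and by time of day, and the function name is hypothetical.

```python
COST_PER_HOUR_USD = 250_000  # rough hourly downtime cost cited in this article


def downtime_cost(hours, cost_per_hour=COST_PER_HOUR_USD):
    """Estimate outage cost, assuming cost grows linearly with duration."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * cost_per_hour


# Durations taken from the ranges reported in the study.
for hours in (10, 15):
    print(f"{hours} hours offline: ${downtime_cost(hours):,}")
# 10 hours offline: $2,500,000
# 15 hours offline: $3,750,000
```

Under that linear assumption, a 10-hour outage comes to about $2.5 million and a 15-hour outage to about $3.75 million.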

2. Deploying a Seamless Cloud-Native Approach

Every disaster recovery plan’s goal is to create a safety net that keeps a business’s applications, infrastructure and ultimately the business itself running in the event of an unexpected outage. But as the risk of downtime has increased in recent years, so has the realization that a traditional DR plan is riddled with too many inefficiencies for the modern IT landscape, especially when it comes to backing up Kubernetes applications.

Traditional disaster recovery is not built for containers. In truth, it’s far too complex, expensive and unpredictable to be relied upon. Legacy approaches work by creating a parallel production setup that might not even be required in every case. They also back up only specific resources and objects, resulting in long recovery times during disaster situations. Moreover, they don’t allow for application mobility, in which an application moves with all its constructs and blueprints, such as network setup, security policies, configurations and data, across cloud regions or sometimes even clouds. The ability to capture an application as a whole is, of course, crucial for K8s, since it is application-centric.

All this means that any IT team that deploys a traditional DR plan for its Kubernetes environment is putting its organization at greater risk of data loss or corruption. Instead, teams need a cloud-native backup strategy that allows them to recover from situations such as application misconfigurations or malicious attacks like ransomware. Cloud-native DR and backup solutions are designed to handle the vast number of components found in large clusters, and they need to recognize the relationships between applications and data. To address these issues, many companies are turning to cloud-based disaster recovery-as-a-service (DRaaS), given its simplicity, its flexibility and the way it reduces the financial investment they need to make. Analysts predict that the global market for DRaaS will grow by 35% over the next five years.

Other cloud companies are addressing Kubernetes data resiliency by offering software that protects containers as reliance on hybrid and multi-cloud environments grows. For instance, Red Hat added data resilience capabilities for Kubernetes with the release of Red Hat OpenShift Container Storage 4.6. It offers customers the ability to extend their existing data protection solutions and infrastructure to enhance data resilience for cloud-native workloads across hybrid and multi-cloud environments.

3. Layering Security into Your DR Plan

Businesses and government agencies across Europe are under siege by cyberattackers. Officials are increasingly apprehensive about the threat Russian ransomware gangs pose to their countries’ critical infrastructure as EU leaders continue to stiffen sanctions. For example, one such attack, which targeted the U.S. satellite communications company Viasat, was felt across central and eastern Europe as it triggered satellite service outages.

Keeping track of permissions and credentials is a task in itself and, as we know, a significant security undertaking. To put it frankly, organizations’ workloads are more vulnerable than ever.

Kubernetes clusters, in particular, are often abused in compromises that exploit their misconfigurations. They also tend to be multi-tenant, with developer teams regularly being added and removed from systems, which makes securing them even more complex.

That is why there’s an urgent need for enterprises to factor security into their K8s management. The good news is that Kubernetes already has built-in security features like network policies that protect internal application components and data services. The bad news is that they sometimes stop backup solutions from working outside Kubernetes clusters. A cloud-based disaster recovery solution solves this problem, and the even better news is that some are even adding ransomware detection capabilities as an additional security layer.

Another good resource is the Cybersecurity and Infrastructure Security Agency (CISA) security guidelines for Kubernetes, which highlight the need for proactive breach prevention measures such as Kubernetes pod security, network separation and hardening, and authentication and authorization.

IT teams across Europe realize how critical it is to have a simple and effective Kubernetes disaster recovery plan. As they count on K8s to run their most critical business applications, they know that an effective Kubernetes DR strategy could be the iron gate that shields their entire organization and their customers from a crushing downtime incident.


Faiz Khan

Prior to founding Wanclouds, Faiz was an executive at Cisco, where he held multiple technology leadership roles. His latest assignment was leading the Global Cloud automation and orchestration organization. Prior to that, he built the Global Datacenter and cloud practice and was the GM for the Emerging Markets Technology Practices Organization. Faiz has an MBA in Computer Information Systems from Colorado State University.
