Companies, even large ones, haven’t considered disaster recovery plans outside of their primary cloud providers own infrastructure as regularly as they should. In March of this year, Amazon Web Services (AWS) had a massive failure which directly impacted some of the world’s largest brands, taking them offline for several hours. In this case, it was not a malicious attack but the end result was the same— an outage.
When the organization’s leadership questioned their IT departments on how this outage could happen, most received an answer that was somehow acceptable: It was AWS. Amazon failed, not us. However, that answer should not be acceptable.
AWS implies they are invulnerable, but the people running IT departments are running it for a reason. They are meant to be skeptics, and it is their job to build redundancies that protect the system against any one point of failure. Some of those companies use AWS disaster recovery services, but if the data center and all the technology required to turn those fail-safes on crashes, then you’re down. This is why we need to treat the problem with the same logic that we use for any other system. Today it is easier than ever to create a resilient DoS resistant architecture that not only takes traditional malicious activity into account but also critical business failures. The solution isn’t purely technical either, it needs to be based upon sound business principles using readily available technology.
In the past enterprise disaster recovery architecture revolved around having a fully operational secondary location. If we wanted true resiliency that was the only option. Today although that can still be one of the foundation pillars to your approach it doesn’t have to be the only answer. You need to be more circumspect about what your requirements are and choose the right solution for each environment/problem. For example:
- A) You can still build it either in your own data center or in a cloud (match the performance requirements to a business value equation).
- B) Several ‘Backups-as-a-Service’ will offer more than just storage in the cloud. They offer resources for rent (servers to run your corporate environments in case of an outage). If your business can sustain an environment going down just long enough to turn it back on (several hours), this can be a very cost-effective solution.
- C) For non-critical items, rely on the cloud provider you currently use to provide near-time failure protection.
The Bottom Line
Regardless of which approach you take, even if everything works flawlessly, you still need to address the ‘brownout’ phenomenon or the time it takes for services to be restored at the primary or to a secondary location. It is even more important to automatically send people to a different location if performance is impaired. Several people have heard of GSLB, and while many use it today, it is not part of their comprehensive DoS approach. But it should be. If your goal with your DDoS mitigation solution is to ensure an uninterrupted service in addition to meeting your approved performance SLA; then dynamic GSLB or infrastructure based performance load balancing has to be an integral part of any design.
We can deploy this technology purely defensively, as we have traditionally done with all DoS investments or we change the paradigm and deploy the technology to help us exceed expectations. This allows us to give each individual user the best experience possible. Radware’s dynamic performance-based route optimization solution (GSLB) allows us to offer a unique customer experience to each and every user regardless of where they are coming from, how they access the environment or what they are trying to do. This same technology allows us to reroute users in the event of a DoS event that takes down an entire site be it from malicious behavior, hardware failure or simple human error. This functionality can be procured as a product or a service as it is environment/cloud agnostic and relatively simple to deploy. It is not labor intensive and may be the least expensive part of an enterprise DOS architecture.
What we can conclude is that any company that blames the cloud provider for a down site in the future should be asked the hard questions because solving this problem is easier today than ever before.
Read “Radware’s 2018 Web Application Security Report” to learn more.
Daniel Lakier is VP-ADC globally for Radware. Daniel has been in the greater technology industry for over 20 years. During that time he has worked in multiple verticals including the energy, manufacturing and healthcare sectors. Daniel enjoys new challenges and as such has enjoyed several different roles in his Career from hands on engineering to architecture and Sales.
At heart Daniel is a teacher and a student. He is forever learning and truly has passion for sharing his knowledge. Most recently Daniel left his role as President and CTO of a leading technology integrator where he had spent the better part of 8 years to join the Radware organization.
When Daniel isn’t at the office he enjoys working on the farm and chasing his wonderful daughters.
*** This is a Security Bloggers Network syndicated blog from Radware Blog authored by Daniel Lakier. Read the original post at: https://blog.radware.com/security/2018/10/disaster-recovery-data-center-or-host-infrastructure-reroute/